Thanks Alessandro,
that explains the situation clearly.
I also agree that sending all the metadata as GET parameters can be problematic.
Cheers
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
> Hmm, the point is that right now ManifoldCF has no extractors.
> The repository connectors extract the binary directly, and there is no
> "Extractor Processor" yet.
> However, a pipeline processor architecture has recently been proposed
> (https://issues.apache.org/jira/browse/CONNECTORS-959),
> so extraction could fit there.
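To make the pipeline idea concrete, here is a minimal Python sketch of an extraction stage sitting between a repository connector and an output connector. Everything in it (`Document`, `xml_text_extractor`, `metadata_tagger`, `run_pipeline`) is hypothetical and is neither the ManifoldCF API nor the design proposed in CONNECTORS-959. The extractor only handles XML through the standard library; a real one would more likely delegate to a parser such as Apache Tika.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical document flowing from a repository connector to an output."""
    doc_id: str
    binary: bytes = b""
    text: str = ""
    metadata: Dict[str, List[str]] = field(default_factory=dict)


# A stage receives a document and returns the (possibly modified) document.
Stage = Callable[[Document], Document]


def xml_text_extractor(doc: Document) -> Document:
    """Hypothetical extractor stage: concatenates the text nodes of an XML binary."""
    root = ET.fromstring(doc.binary)
    doc.text = "".join(root.itertext())
    doc.binary = b""  # downstream stages no longer need the raw bytes
    return doc


def metadata_tagger(doc: Document) -> Document:
    """Hypothetical stage that records which step produced the text."""
    doc.metadata.setdefault("extracted_by", []).append("xml_text_extractor")
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order, as a pipeline architecture would."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    doc = Document("doc-1", binary=b"<root><a>hello</a> <b>world</b></root>")
    result = run_pipeline(doc, [xml_text_extractor, metadata_tagger])
    print(result.text)  # hello world
```

With extraction done in such a stage, the output connector would receive text rather than binaries, which is the separation discussed in this thread.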
>
> Cheers
>
>
> 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
>
>> Since the Solr extracting request handler takes the binary and extracts
>> the text, what is the point of not using a Manifold extractor and
>> sending the text and binaries to Solr?
>> The end result is the same: Solr indexes text and stores text.
>> So if Manifold supports text extraction, it seems to me that this is
>> the place where it should be done.
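The alternative Matteo suggests, extracting the text on the ManifoldCF side and sending plain text plus metadata, could look roughly like the following sketch. It builds (but does not send) a JSON update request for Solr's `/update` handler. The core URL, the `content` field name and the helper `build_json_update` are assumptions for illustration; the real field names depend on the Solr schema.

```python
import json
import urllib.request
from typing import Dict, List


def build_json_update(solr_core_url: str, doc_id: str, text: str,
                      metadata: Dict[str, List[str]]) -> urllib.request.Request:
    """Build (but do not send) a JSON update carrying already-extracted text.

    Metadata goes in the request body, so its size is not bounded by URL limits.
    'content' is an assumed field name; adapt it to the actual schema.
    """
    doc: Dict[str, object] = dict(metadata)
    doc["id"] = doc_id      # set last so metadata cannot overwrite the id
    doc["content"] = text
    body = json.dumps([doc]).encode("utf-8")  # /update accepts a JSON array of docs
    return urllib.request.Request(
        solr_core_url.rstrip("/") + "/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running Solr instance, `urllib.request.urlopen(req)` would send the request; a commit (or `commitWithin`) is still needed before the document becomes searchable.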
>>
>> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>>
>>> Hi Matteo
>>>
>>> Manifold already handles the extraction, but the only way to send binary
>>> content and document metadata to Solr is through the /update/extract
>>> handler, where the metadata is sent as query parameters and the binary
>>> content is sent in the request body, allowing Solr to use Tika to obtain
>>> the raw content to be stored in Solr.
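The `/update/extract` approach described here can be sketched as follows: each metadata value becomes a `literal.<field>` query parameter (a Solr Cell parameter) and the binary goes in the POST body. The helper names, the core URL and the 8192-byte limit in `fits_in_request_line` are assumptions; the real limit depends on how the servlet container is configured. The length check illustrates why sending all the metadata as query parameters can become problematic.

```python
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumed maximum request-line size; real limits depend on the servlet
# container configuration (hypothetical value for illustration).
ASSUMED_MAX_REQUEST_LINE = 8192


def build_extract_request(solr_core_url: str, doc_id: str, binary: bytes,
                          content_type: str,
                          metadata: Dict[str, List[str]]) -> urllib.request.Request:
    """Build (but do not send) a POST to /update/extract.

    Metadata travels as literal.<field> query parameters; the raw binary
    travels in the request body so Solr can run Tika on it.
    """
    params = [("literal.id", doc_id)]
    for name, values in metadata.items():
        for value in values:  # repeat the parameter for multi-valued fields
            params.append(("literal." + name, value))
    url = (solr_core_url.rstrip("/") + "/update/extract?"
           + urllib.parse.urlencode(params))
    return urllib.request.Request(url, data=binary,
                                  headers={"Content-Type": content_type},
                                  method="POST")


def fits_in_request_line(req: urllib.request.Request,
                         limit: int = ASSUMED_MAX_REQUEST_LINE) -> bool:
    """Conservative check using the full URL as an upper bound on the request line."""
    request_line = f"{req.get_method()} {req.full_url} HTTP/1.1"
    return len(request_line.encode("utf-8")) <= limit
```

A metadata field holding about 10 KB of text makes `fits_in_request_line` return `False`, while the binary in the body is not subject to this constraint.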
>>>
>>> Regards
>>>
>>>
>>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
>>>
>>>> Hi, during my first indexing run I noticed that Manifold uses the Solr
>>>> extracting request handler to extract the content of an XML file.
>>>> For performance reasons it would be better if Manifold handled the
>>>> extraction and let Solr act purely as the search engine.
>>>> Is this because of the connector design, the framework design, or has it
>>>> just not been done yet?
>>>>
>>>
>>
>>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England