The pipeline code itself is now "complete" in trunk.  Zaizi said they'd
contribute a Tika extractor transformation connector - and if they don't
get around to that in a month or so, I may take a crack at it myself.

As for changing the Solr connector so that it doesn't go to the extracting
update handler, it would be great if:
(1) Someone created a ticket for this, and
(2) A patch was provided that maintains backwards compatibility with
previous versions of the connector (so a checkbox would probably need to go
into the UI somewhere).  Do either of you want to start this process?
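
For anyone picking this up, here is a rough sketch (Python, purely
illustrative, not ManifoldCF code) of the two request shapes under
discussion. The function name, the use_extract_handler flag (standing in
for the UI checkbox) and the "content" field name are all made up for the
example; the literal.* parameter prefix is how the extracting handler
accepts metadata in the query string. Nothing is actually sent.

```python
import json
from urllib.parse import urlencode


def build_solr_request(base_url, doc_id, metadata, payload,
                       use_extract_handler=True):
    """Return (url, headers, body) for indexing one document.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    if use_extract_handler:
        # Current behaviour: metadata goes in the query string as
        # literal.<field> parameters, the raw binary goes in the body,
        # and Solr runs Tika on it.
        params = [("literal.id", doc_id)]
        for name, value in sorted(metadata.items()):
            params.append(("literal." + name, value))
        url = base_url + "/update/extract?" + urlencode(params)
        return url, {"Content-Type": "application/octet-stream"}, payload

    # Proposed behaviour: the text was already extracted upstream (for
    # example by a Tika transformation connector), so text and metadata
    # travel together in the body. Assumes the target Solr accepts JSON
    # on /update and that the schema has a "content" field.
    doc = {"id": doc_id, "content": payload}
    doc.update(metadata)
    body = json.dumps([doc]).encode("utf-8")
    return base_url + "/update", {"Content-Type": "application/json"}, body
```

With the flag left at its default the request looks the same as it does
today, which is the backwards-compatibility point in (2).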

Thanks!
Karl



On Mon, Jun 16, 2014 at 12:37 PM, Karl Wright <[email protected]> wrote:

> Hi guys,
>
> You folks may not have looked at 1.7 yet, but it has a full pipeline, and
> is expected to have a Tika extractor as a transformation connector.
>
> Karl
>
>
>
> On Mon, Jun 16, 2014 at 11:14 AM, Matteo Grolla <[email protected]>
> wrote:
>
>> Thanks Alessandro,
>>         that explains the situation clearly.
>> And I agree that sending all the metadata as GET parameters can be
>> problematic.
>>
>> Cheers
>>
>> --
>> Matteo Grolla
>> Sourcesense - making sense of Open Source
>> http://www.sourcesense.com
>>
>> On 16 Jun 2014, at 17:09, Alessandro Benedetti wrote:
>>
>> > Hmm, the point is that right now ManifoldCF has no extractors.
>> > The repository connectors pull the binary directly, and there is no
>> > "Extractor Processor" yet.
>> > But a pipeline processor architecture has recently been proposed (
>> > https://issues.apache.org/jira/browse/CONNECTORS-959),
>> > so an extractor could fit there.
>> >
>> > Cheers
>> >
>> >
>> > 2014-06-16 15:59 GMT+01:00 Matteo Grolla <[email protected]>:
>> >
>> >> Since the Solr extracting request handler takes the binary and
>> >> extracts the text, what is the point of not using the Manifold
>> >> extractor and sending the text and binaries to Solr?
>> >> I mean, the end result is the same: Solr indexes the text and stores
>> >> the text. So if Manifold supports text extraction, it seems to me
>> >> this is the place where it should be done.
>> >>
>> >> --
>> >> Matteo Grolla
>> >> Sourcesense - making sense of Open Source
>> >> http://www.sourcesense.com
>> >>
>> >> On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:
>> >>
>> >>> Hi Matteo
>> >>>
>> >>> Manifold already handles the extraction, but the only way to send
>> >>> binary content and document metadata to Solr is using the
>> >>> update/extract handler, where the metadata is sent as query
>> >>> parameters and the binary content is sent in the body of the
>> >>> request, allowing Solr to use Tika to obtain the raw content to be
>> >>> stored in Solr.
>> >>>
>> >>> Regards
>> >>>
>> >>>
>> >>> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <[email protected]> wrote:
>> >>>
>> >>>> Hi,
>> >>>> During my first indexing run I noticed that Manifold uses the Solr
>> >>>> extracting request handler to extract the content of an XML file.
>> >>>> For performance reasons it would be better if Manifold handled the
>> >>>> extraction, leaving Solr to act purely as the search engine.
>> >>>> Is this because of the connector design, the framework design, or
>> >>>> is it simply still to be done?
>> >>>>
>> >>>> --
>> >>>> Matteo Grolla
>> >>>> Sourcesense - making sense of Open Source
>> >>>> http://www.sourcesense.com
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>>
>> >>> ------------------------------
>> >>> This message should be regarded as confidential. If you have received
>> >>> this email in error please notify the sender and destroy it
>> >>> immediately. Statements of intent shall only become binding when
>> >>> confirmed in hard copy by an authorised signatory.
>> >>>
>> >>> Zaizi Ltd is registered in England and Wales with the registration
>> >>> number 6440931. The Registered Office is Brook House, 229 Shepherds
>> >>> Bush Road, London W6 7AN.
>> >>
>> >>
>> >
>> >
>> > --
>> > --------------------------
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>> >
>> > "Tyger, tyger burning bright
>> > In the forests of the night,
>> > What immortal hand or eye
>> > Could frame thy fearful symmetry?"
>> >
>> > William Blake - Songs of Experience -1794 England
>>
>>
>
