Re: Repository document stream empty after Tika Transformation

chalitha udara Perera Fri, 17 Jul 2015 06:14:04 -0700

Hi Karl,

I'm using the 2.1 release, with only the Solr output connector configured.
If you check the input stream size (document.getBinaryLength()) after the
Tika connector has run, it is zero.


Thanks,
Chalitha

On Fri, Jul 17, 2015 at 6:08 PM, Karl Wright <[email protected]> wrote:

> The document stream contains what tika extracts.  If it can't extract
> anything then you will have an empty stream.
>
> It is also possible that, if the stream is split, you are tripping over a
> bug that was fixed some time ago.  What MCF version is this, and do you
> have more than one output?
>
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: chalitha udara Perera
> Sent: 7/17/2015 7:25 AM
> To: [email protected]
> Subject: Repository document stream empty after Tika Transformation
>
> Hi All,
>
> I'm writing a transformation connector to extract low-level features from
> images. At first I used that connector without the Tika extractor and it
> worked fine. But when I use it together with the Tika connector (placed
> after Tika), it fails to extract features. After debugging, I found that
> the stream is empty after the Tika transformation.
> Inside the Tika connector, a new in-memory or file-backed output stream
> is created, but the original input stream is never copied to it. The
> connector should reset the binary stream after using it to get metadata,
> so that the original input stream stays available from connector to
> connector.
>
> I have attached a simple fix, based on copying and resetting the stream,
> that worked for me.
>
> Thanks,
> Chalitha
>
> --
> J.M Chalitha Udara Perera
>
> *Department of Computer Science and Engineering,*
> *University of Moratuwa,*
> *Sri Lanka*
>



-- 
J.M Chalitha Udara Perera

*Department of Computer Science and Engineering,*
*University of Moratuwa,*
*Sri Lanka*
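
The copy-and-reset idea described in the quoted message can be sketched in
plain Java. This is not the attached patch and it is not the Tika connector's
code: the class and method names (StreamReplaySketch, buffer, consume) are
hypothetical, and it assumes the document is small enough to hold in memory.
It only illustrates that buffering the bytes once lets an extractor read the
content to the end while a later connector still receives a fresh stream
with a non-zero length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of ManifoldCF or Tika.
final class StreamReplaySketch {

    /** Reads the whole stream into memory so the content can be replayed. */
    public static byte[] buffer(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for a metadata extractor: reads to end of stream, returns the byte count. */
    public static long consume(InputStream in) {
        try {
            long total = 0;
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, 4, 5};

        // Buffer once, before anything consumes the source stream.
        byte[] copy = buffer(new ByteArrayInputStream(original));

        // The extractor gets its own stream over the copy and exhausts it.
        long seen = consume(new ByteArrayInputStream(copy));

        // The next connector gets a fresh stream over the same bytes.
        ByteArrayInputStream downstream = new ByteArrayInputStream(copy);

        System.out.println("extractor read " + seen + " bytes");
        System.out.println("downstream has " + downstream.available() + " bytes");
    }
}
```

For large documents a temporary file would be a safer backing store than a
byte array, in line with the "in memory or file stream" choice mentioned in
the quoted message.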
