Hi Chalitha,

The only documents I see here are documents that Tika cannot extract content from, namely JPGs etc.

Karl

On Fri, Jul 17, 2015 at 12:09 PM, chalitha udara Perera <[email protected]> wrote:

> Hi Karl,
>
> Here I have attached the result from File System -> Tika Transform -> Null
> Output.
> Please find the attachment.
>
> Thank you,
> Chalitha
>
> On Fri, Jul 17, 2015 at 6:41 PM, Karl Wright <[email protected]> wrote:
>
>> I don't see this here.
>>
>> I set up the following:
>> - file system repository connection
>> - null output connection
>> - tika extractor
>> - a job using all three
>>
>> Running the job and looking at the simple history, I see null output
>> connection ingestion records that have proper document sizes.
>>
>> Can you repeat the same setup there, and tell me what you get?
>>
>> Thanks,
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: chalitha udara Perera
>> Sent: 7/17/2015 8:46 AM
>> To: Karl Wright
>> Cc: [email protected]
>> Subject: Re: Repository document stream empty after Tika Transformation
>>
>> Hi Karl,
>>
>> I'm using the 2.1 release, and I am using only the Solr output connector.
>> If you look at the input stream size (document.getBinaryLength()) after
>> the Tika connector, it is zero.
>>
>> Thanks,
>> Chalitha
>>
>> On Fri, Jul 17, 2015 at 6:08 PM, Karl Wright <[email protected]> wrote:
>>
>>> The document stream contains what Tika extracts. If it can't extract
>>> anything then you will have an empty stream.
>>>
>>> It is also possible that if the stream is split, you are tripping over a
>>> bug that was fixed some time ago. What MCF version is this, and do you
>>> have more than one output?
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: chalitha udara Perera
>>> Sent: 7/17/2015 7:25 AM
>>> To: [email protected]
>>> Subject: Repository document stream empty after Tika Transformation
>>>
>>> Hi All,
>>>
>>> I'm writing a transformation connector to extract low-level features
>>> from images. First I used that connector without the Tika extractor and
>>> it worked fine. But when I used it with the Tika connector (after Tika),
>>> it fails to extract features. After debugging I found out that the
>>> stream is empty after the Tika transformation.
>>>
>>> Inside the Tika connector, a new in-memory or file-backed output stream
>>> is created, but the original input stream is never copied to it. The
>>> connector should reset the binary stream after using it to get metadata,
>>> so that the original input stream is passed on from connector to
>>> connector.
>>>
>>> Here I have attached a simple solution of stream copy and reset that
>>> worked for me.
>>>
>>> Thanks,
>>> Chalitha
>>>
>>> --
>>> J.M Chalitha Udara Perera
>>>
>>> *Department of Computer Science and Engineering,*
>>> *University of Moratuwa,*
>>> *Sri Lanka*
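
For readers of the archive: the "stream copy and reset" idea described in the original message can be sketched roughly as below. This is only an illustration in plain Java, not the attached patch and not ManifoldCF's or Tika's actual code. The class name StreamBuffering and the helpers bufferStream and countBytes are hypothetical names chosen for this example, and countBytes merely stands in for any step (such as metadata extraction) that reads a stream to its end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: buffer a one-shot stream so that several
// pipeline stages can each read the full content.
final class StreamBuffering {

    // Copies the whole input stream into memory so it can be replayed.
    static byte[] bufferStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Stand-in for a stage that consumes the stream to its end;
    // returns how many bytes it was able to read.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "image bytes".getBytes(StandardCharsets.UTF_8);
        InputStream original = new ByteArrayInputStream(data);

        // Buffer once, before any stage touches the stream.
        byte[] buffered = bufferStream(original);

        // The original stream is now exhausted: a later stage sees nothing.
        System.out.println("exhausted stream: " + countBytes(original));

        // Each stage gets a fresh stream over the same bytes instead.
        System.out.println("first stage:  " + countBytes(new ByteArrayInputStream(buffered)));
        System.out.println("second stage: " + countBytes(new ByteArrayInputStream(buffered)));
    }
}
```

Buffering in memory only suits small documents. For large files the same approach would spool to a temporary file and reopen it for each stage, in line with the "in-memory or file" output mentioned in the message.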
