Jukka Zitting
Mon, 25 Jan 2010 11:49:42 -0800
Hi,

On Fri, Jan 22, 2010 at 7:59 PM, Baldwin, David <david_bald...@bmc.com> wrote:
> I want to make sure that I am really running in streaming mode. I am doing
> all tests with 1 thread for a basic baseline memory usage for different
> documents, then I will work on multiple threads which should be close to
> n multiples, I would expect.
>
> Can you tell me if streaming mode is more than just using the InputStream to
> Tika?

Yes, you'll also want to stream the parse output. You can do this either by processing the SAX events directly as they arrive, or by using the ParsingReader class (or the new Tika.parse() methods in Tika 0.5 and higher).

The problem with your code is that it buffers the entire text content of the document you're parsing into a single String.

BR,

Jukka Zitting
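
To illustrate the point about streaming the output, here is a minimal sketch of consuming extracted text in fixed-size chunks from a Reader instead of collecting it into one String. The class and method names (StreamingTextConsumer, countChars) are made up for this example, and a StringReader stands in for the parser so the snippet runs without Tika on the classpath. In real use the Reader would come from something like new Tika().parse(inputStream) or new ParsingReader(inputStream).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of Tika: shows chunked consumption of a Reader.
class StreamingTextConsumer {

    // Reads the text in chunks of at most bufferSize characters (bufferSize
    // must be positive) and returns the total number of characters seen.
    // Only one chunk is held in memory at a time, so memory use does not
    // grow with the size of the document.
    static long countChars(Reader reader, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        long total = 0;
        int n;
        while ((n = reader.read(buffer)) != -1) {
            // Process buffer[0..n) here (index it, write it out, etc.)
            // instead of appending it to a growing String.
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Reader returned by the parser (an assumption
        // made only to keep this sketch self-contained).
        try (Reader reader = new StringReader("extracted text stands in here")) {
            System.out.println(countChars(reader, 8));
        }
    }
}
```

The same loop works unchanged with any Reader, so swapping the StringReader for a parser-backed Reader should keep the baseline memory usage roughly independent of document size, as long as each chunk is handled and then discarded.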