tika-user  

Re: Memory Usage/needs for file sizes/types

Jukka Zitting
Mon, 25 Jan 2010 11:49:42 -0800

Hi,

On Fri, Jan 22, 2010 at 7:59 PM, Baldwin, David <david_bald...@bmc.com> wrote:
> I want to make sure that I am really running in streaming mode.  I am doing
> all tests with 1 thread for a basic baseline memory usage for different
> documents, then I will work on multiple threads which should be close to n
> multiples, I would expect.
>
> Can you tell me if streaming mode is more than just using the InputStream to
> Tika?

Yes, you'll also want to stream the parse output. You can do this
either by processing the SAX events directly as they arrive, or by
using the ParsingReader class (or the new Tika.parse() methods in
Tika 0.5 and higher).

The problem with your code is that it buffers the entire text content
of the document you're parsing in a single String, so memory use grows
with the size of the document even though the input is streamed.
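
A minimal sketch of the first option, using nothing beyond the JDK: the
handler below consumes SAX character events as they arrive and keeps
only a running word count, never the text itself. The StreamingWordCount
and WordCounter names are made up for illustration, and the JDK's own
SAX parser stands in for a Tika parser so that the example runs without
Tika on the classpath. With Tika, a handler along these lines would be
passed to the parser in place of one that collects the text into a
String.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of Tika.
class StreamingWordCount {

    /** Counts words from SAX character events without storing the text. */
    static class WordCounter extends DefaultHandler {
        long words = 0;
        private boolean inWord = false;

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split text across several calls, so the
            // "inside a word" state is carried from one call to the next.
            for (int i = start; i < start + length; i++) {
                boolean whitespace = Character.isWhitespace(ch[i]);
                if (!whitespace && !inWord) {
                    words++;
                }
                inWord = !whitespace;
            }
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            inWord = false; // treat element boundaries as word breaks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            inWord = false; // treat element boundaries as word breaks
        }
    }

    /** Parses the stream and returns the word count; memory use stays flat. */
    static long countWords(InputStream in) throws Exception {
        WordCounter counter = new WordCounter();
        // Stand-in for a Tika parser: the JDK SAX parser drives the handler.
        SAXParserFactory.newInstance().newSAXParser().parse(in, counter);
        return counter.words;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>one two</p><p>three</p></doc>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countWords(in)); // prints 3
    }
}
```

The point of the design is that the handler's state is a counter and a
flag, so its memory footprint does not depend on how large the document
is. Anything that appends every characters() call to a StringBuilder or
String gives up that property.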

BR,

Jukka Zitting