: Setting writer.setMaxFieldLength(5000) (default is 10000)
: seems to eliminate the risk for an OutOfMemoryError,

that's because it now gives up after parsing 5000 tokens.
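
For reference, here's a bare-bones sketch of the setup as i understand it
from your description (Lucene 2.x-era API; the index path, analyzer, and
file name are just placeholders i made up):

    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class MaxFieldLengthSketch {
      public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/tmp/test-index", new StandardAnalyzer(), true);
        // indexing of any one field stops after the first 5000 tokens;
        // tokens past that point are ignored
        writer.setMaxFieldLength(5000);

        Document doc = new Document();
        doc.add(new Field("content",
            new InputStreamReader(new FileInputStream("big.txt"),
                                  "ISO-8859-1")));
        writer.addDocument(doc);
        writer.close();
      }
    }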

: To me, it appears that simply calling
:    new Field("content", new InputStreamReader(in, "ISO-8859-1"))
: on a plain text file causes Lucene to buffer it *all*.

Looking at this purely from an outside-in perspective: how could that
be true?  If it were, then why would calling setMaxFieldLength(5000) solve
your problem -- limiting the number of tokens wouldn't matter if the
problem occurred because Lucene was buffering the entire Reader.
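
One cheap way to sanity check that assumption would be to wrap your Reader
in a counting wrapper and see how many characters Lucene has actually
pulled off of it before things blow up.  A rough sketch (the class name is
mine):

    import java.io.FilterReader;
    import java.io.IOException;
    import java.io.Reader;

    // counts how many chars have been pulled through the Reader, so you
    // can see whether Lucene really reads the whole file up front
    public class CountingReader extends FilterReader {
      private long count = 0;

      public CountingReader(Reader in) { super(in); }

      public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) count += n;
        return n;
      }

      public int read() throws IOException {
        int c = super.read();
        if (c >= 0) count++;
        return c;
      }

      public long getCount() { return count; }
    }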


It definitely seems like there is some room for improvement here ... it
sounds almost like maybe there is a [HAND WAVEY AIR QUOTES] memory/object
leakish [/HAND WAVEY AIR QUOTES] situation where even after a Token is
read off the TokenStream, the Token isn't being GCed.

Per: perhaps you could open a Jira issue and attach a unit test
demonstrating the problem?  Maybe something with an artificial Reader
that just churns out a repeating sequence of characters forever?
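
Something along these lines maybe (totally untested sketch, written as a
standalone main() rather than a proper test case; class names are made up,
and setMaxFieldLength is bumped way up so the token cap doesn't kick in
first):

    import java.io.Reader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // a Reader that never runs dry: it just repeats "lorem ipsum " forever
    class EndlessReader extends Reader {
      private final char[] words = "lorem ipsum ".toCharArray();
      private int pos = 0;

      public int read(char[] cbuf, int off, int len) {
        for (int i = 0; i < len; i++) {
          cbuf[off + i] = words[pos];
          pos = (pos + 1) % words.length;
        }
        return len;  // never returns -1
      }

      public void close() {}
    }

    public class EndlessReaderDemo {
      public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/tmp/endless-index", new StandardAnalyzer(),
                            true);
        // lift the token cap so it can't mask the memory behavior
        writer.setMaxFieldLength(Integer.MAX_VALUE);

        Document doc = new Document();
        doc.add(new Field("content", new EndlessReader()));
        // watch memory usage while this runs -- with an endless Reader it
        // will keep going until it hits OOM or you kill it
        writer.addDocument(doc);
        writer.close();
      }
    }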




-Hoss

