[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748103#action_12748103 ]
Tim Smith commented on LUCENE-1859:
-----------------------------------

bq. Death by a thousand cuts. This is one cut.

By this logic, nothing new could ever be added. What brought this to my attention was the new TokenStream API (one cut, rather big, but I like the new API, so I'm happy with the blood loss (makes me dizzy and happy)). The new TokenStream API holds onto these char[] buffers much longer (if not forever), so memory grows unbounded unless there is some facility to truncate or null out the char[].

bq. I wouldn't even add the note to the documentation.

I don't believe there is ever a valid argument against adding documentation. If someone can shoot themselves in the foot with the gun you gave them, at least tell them not to point the gun at their foot with the safety off.

bq. The only reason to do this is to keep average memory usage down for the hell of it.

Keeping average memory usage down prevents those wonderful OutOfMemoryErrors (which are difficult at best to recover from).

> TermAttributeImpl's buffer will never "shrink" if it grows too big
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1859
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1859
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>            Priority: Minor
>
> This was also an issue with Token previously.
> If a TermAttributeImpl is populated with a very long buffer, it will never be able to reclaim this memory.
> Obviously, it can be argued that Tokenizers should never emit "large" tokens; however, it seems that TermAttributeImpl should have a reasonable static MAX_BUFFER_SIZE such that if the term buffer grows bigger than this, it will shrink back down to this size once the next token smaller than MAX_BUFFER_SIZE is set.
> I don't think I have actually encountered issues with this yet; however, with multiple indexing threads you could end up with a char[Integer.MAX_VALUE] per thread (in the very worst case).
> Perhaps growTermBuffer should have the logic to shrink if the buffer is currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE.
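To make the proposal concrete, here is a minimal sketch of what shrink-on-set logic could look like. This is not the actual Lucene 2.9 code: the class name ShrinkingTermBuffer, the MAX_BUFFER_SIZE value, and the newSize helper are all illustrative assumptions; only the general technique (discard an oversized array once a token under the cap arrives, so the GC can reclaim it) reflects the proposal above.

{code:java}
// Illustrative sketch only -- names and sizes are assumptions, not Lucene API.
public class ShrinkingTermBuffer {

    // Hypothetical cap: a buffer that grew past this is replaced (and thus
    // becomes garbage-collectable) once the next small token is set.
    private static final int MAX_BUFFER_SIZE = 16 * 1024; // 16K chars

    private char[] termBuffer = new char[16];
    private int termLength;

    /** Copies the given token into the internal buffer. */
    public void setTermBuffer(char[] buffer, int offset, int length) {
        if (termBuffer.length > MAX_BUFFER_SIZE && length <= MAX_BUFFER_SIZE) {
            // Shrink: current buffer overshot the cap but the new token fits
            // under it, so allocate a right-sized buffer instead of reusing
            // the oversized one.
            termBuffer = new char[newSize(length)];
        } else if (termBuffer.length < length) {
            // Grow: the usual path when the token doesn't fit.
            termBuffer = new char[newSize(length)];
        }
        System.arraycopy(buffer, offset, termBuffer, 0, length);
        termLength = length;
    }

    // Round the requested size up to the next power of two, so repeated
    // small growths don't reallocate on every token.
    private static int newSize(int minSize) {
        int size = 16;
        while (size < minSize) {
            size <<= 1;
        }
        return size;
    }

    public char[] termBuffer() { return termBuffer; }
    public int termLength() { return termLength; }

    public static void main(String[] args) {
        ShrinkingTermBuffer buf = new ShrinkingTermBuffer();
        char[] huge = new char[1 << 20]; // one pathological 1M-char "token"
        buf.setTermBuffer(huge, 0, huge.length);
        System.out.println("after huge token:  " + buf.termBuffer().length); // 1048576
        char[] small = "normal".toCharArray();
        buf.setTermBuffer(small, 0, small.length);
        System.out.println("after small token: " + buf.termBuffer().length); // 16, memory reclaimed
    }
}
{code}

The key design point is that the shrink happens lazily on the next set, not eagerly after the big token, so the oversized buffer is still valid while consumers of the current token hold it.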