[
https://issues.apache.org/jira/browse/LUCENE-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907036#comment-14907036
]
Michael McCandless commented on LUCENE-6814:
--------------------------------------------
bq. Is 4.10 dead now?
It's not that it's dead; it's more that there would need to be a really,
really bad bug to warrant another 4.10.x release at this point. We are already
well into 5.x, and just released the 5.3.1 bug-fix release today:
http://lucene.apache.org/#24-september-2015-apache-lucene-531-and-apache-solr-531-available
bq. I'll probably have to push for the PatternTokenizer fork to get it into 1.x?
Yes, maybe ... or patch your ES locally with the change?
bq. and PatternTokenizer ends up retaining around 2gb
Hmm, how many {{PatternTokenizer}} instances do you have? With 24 bulk
indexing threads you should (I think?) have at most 24 instances, and at 4 MB
max each that should be ~100 MB.
bq. Does everyone just like huge heaps (or just not use PatternAnalyzer)?
I suspect {{PatternTokenizer}} is not commonly used ...
There is a TODO in the code to fix this class to not hold an entire copy of the
document ...
bq. I think this ends up growing to the max sized field.
OK, I'll note that in CHANGES when I commit, and fix this issue's title.
> PatternTokenizer should free heap after it's done
> -------------------------------------------------
>
> Key: LUCENE-6814
> URL: https://issues.apache.org/jira/browse/LUCENE-6814
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: Trunk, 5.4
>
> Attachments: LUCENE-6814.patch, LUCENE-6814.patch
>
>
> Caught by Alex Chow in this Elasticsearch issue:
> https://github.com/elastic/elasticsearch/issues/13721
> Today, PatternTokenizer reuses a single StringBuilder, but it doesn't free
> its heap usage after tokenizing is done. We can either stop reusing, or ask
> it to {{.trimToSize}} when we are done ...
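
To illustrate the second option from the description above, here is a minimal
sketch of a reused StringBuilder that hands back its backing array once
tokenizing is done. The class and method names ({{ReusableBufferSketch}},
{{fill}}, {{release}}) are hypothetical and are not the code in the attached
patch; only {{StringBuilder.setLength}}, {{trimToSize}} and {{capacity}} are
real JDK calls. Note that the JDK only promises that {{trimToSize}} "may"
shrink the buffer, so the exact capacity afterwards is not guaranteed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a tokenizer that buffers a whole field in one
// reused StringBuilder. This is an illustration, not Lucene's PatternTokenizer.
final class ReusableBufferSketch {
    private final StringBuilder buffer = new StringBuilder();

    /** Reads the entire input into the reused buffer and returns its length. */
    int fill(Reader input) {
        buffer.setLength(0); // reuse: drops the old content, keeps the old capacity
        char[] chunk = new char[8192];
        try {
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.length();
    }

    /** Called when tokenizing is done: empty the buffer and ask it to shrink. */
    void release() {
        buffer.setLength(0);
        buffer.trimToSize(); // without this, the capacity stays at the largest field seen
    }

    int capacity() {
        return buffer.capacity();
    }

    public static void main(String[] args) {
        char[] big = new char[1 << 20];
        Arrays.fill(big, 'x');

        ReusableBufferSketch sketch = new ReusableBufferSketch();
        sketch.fill(new StringReader(new String(big)));
        System.out.println("capacity after large field: " + sketch.capacity());

        sketch.release();
        System.out.println("capacity after release: " + sketch.capacity());
    }
}
```

The other option mentioned above, to stop reusing, would simply allocate a new
StringBuilder per field and let the garbage collector reclaim it, trading a
bit of allocation churn for not pinning the largest field's worth of heap.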
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)