Derek,

Create your own JIRA issue and link it to the one you mention below. Then
this issue can be tracked through to a commit if it goes that far.

Thx!

Upayavira

On Tue, Jun 9, 2015, at 08:14 AM, Derek Wood wrote:
> I found a bug in the LangDetect implementation of language detection, where
> the maxTotalChars property isn't doing what its description says it does:
> Solr relies solely on the LangDetect library's append() method, which checks
> only the length of each string being appended, not the total length of the
> text accumulated so far [1].
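
(Inline note: a minimal Java sketch of the behaviour described above, not the
attached patch and not Solr or LangDetect code. PerCallLimitBuffer and
appendWithTotalLimit are hypothetical names; the buffer only imitates an
append() that caps each call separately, and the helper shows one possible
caller-side way to enforce a cap on the total.)

```java
public class AppendLimitSketch {

    // Hypothetical stand-in for a detector whose append() applies its
    // length limit to each call on its own.
    static final class PerCallLimitBuffer {
        private final int maxTextLength;
        private final StringBuilder text = new StringBuilder();

        PerCallLimitBuffer(int maxTextLength) {
            this.maxTextLength = maxTextLength;
        }

        void append(String s) {
            // Only the length of this one argument is checked; text that
            // earlier calls already appended is not counted.
            for (int i = 0; i < s.length() && i < maxTextLength; i++) {
                text.append(s.charAt(i));
            }
        }

        int length() {
            return text.length();
        }
    }

    // Assumed workaround: the caller tracks the remaining character budget
    // and truncates each field before handing it to append().
    static void appendWithTotalLimit(PerCallLimitBuffer buf, String[] fields, int maxTotalChars) {
        int remaining = maxTotalChars;
        for (String field : fields) {
            if (remaining <= 0) {
                break;
            }
            String chunk = field.length() > remaining ? field.substring(0, remaining) : field;
            buf.append(chunk);
            remaining -= chunk.length();
        }
    }

    public static void main(String[] args) {
        String[] fields = {"abcdefgh", "abcdefgh", "abcdefgh"};

        PerCallLimitBuffer naive = new PerCallLimitBuffer(10);
        for (String field : fields) {
            naive.append(field);
        }
        System.out.println("per-call limit only: " + naive.length());

        PerCallLimitBuffer capped = new PerCallLimitBuffer(10);
        appendWithTotalLimit(capped, fields, 10);
        System.out.println("with total budget:   " + capped.length());
    }
}
```

With three 8-character fields and a limit of 10, the per-call check lets all
24 characters through, while the budgeted helper stops at 10.
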
> 
> I've got a patch (attached) that solves this issue. It hoists a few of the
> utility methods out of the Tika implementation and reuses them in the
> LangDetect one. However, I stumbled upon SOLR-3881 [2], where those methods
> (concatFields and getExpectedSize specifically) were taken out of the parent
> class for reasons that aren't clear from the comments.
> 
> Could I get some historical context on the issue and feedback on my patch?
> Thanks
> 
> [1]
> https://github.com/shuyo/language-detection/blob/master/src/com/cybozu/labs/langdetect/Detector.java#L170
> [2] https://issues.apache.org/jira/browse/SOLR-3881
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> Email had 1 attachment:
> + langdetect-fix.patch
>   8k (text/plain)
