Derek,

Create your own JIRA issue and link it to the one you mention below. That way the issue can be tracked through to a commit if it goes that far.

Thx!

Upayavira

On Tue, Jun 9, 2015, at 08:14 AM, Derek Wood wrote:
> I found a bug in the LangDetect implementation of language detection,
> where the maxTotalChars property isn't doing what its description says
> it does: Solr uses only the append() method of the LangDetect library,
> which checks the string length of the text being appended and not the
> length of the entire accumulated contents [1].
>
> I've got a patch (attached) that solves this issue. It hoists out a few
> of the utility methods in the Tika implementation and reuses them in the
> LangDetect one. However, I stumbled upon SOLR-3881 [2], where the methods
> (concatFields and getExpectedSize specifically) were taken out of the
> parent class for reasons that are unclear from the comments.
>
> Could I get some historical context on the issue and feedback on my
> patch?
>
> Thanks
>
> [1] https://github.com/shuyo/language-detection/blob/master/src/com/cybozu/labs/langdetect/Detector.java#L170
> [2] https://issues.apache.org/jira/browse/SOLR-3881
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> Email had 1 attachment:
> + langdetect-fix.patch
>   8k (text/plain)
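
To make the reported behavior concrete, here is a minimal sketch of the difference between a per-call limit and a total limit. It does not use LangDetect or Solr code: `AppendLimitSketch`, `PerCallLimitBuffer`, and `TotalLimitBuffer` are hypothetical names invented for illustration, and the per-call variant only mirrors the behavior as described in the report above, not the library's actual source.

```java
// Hypothetical sketch: none of these classes come from LangDetect or Solr.
class AppendLimitSketch {

    /** Applies the limit to each append() call separately (the reported behavior). */
    static final class PerCallLimitBuffer {
        private final int maxChars;
        private final StringBuilder text = new StringBuilder();

        PerCallLimitBuffer(int maxChars) {
            this.maxChars = maxChars;
        }

        void append(String s) {
            // Only the length of this one argument is compared with the limit.
            text.append(s, 0, Math.min(s.length(), maxChars));
        }

        int length() {
            return text.length();
        }
    }

    /** Applies the limit to the text accumulated across all append() calls. */
    static final class TotalLimitBuffer {
        private final int maxChars;
        private final StringBuilder text = new StringBuilder();

        TotalLimitBuffer(int maxChars) {
            this.maxChars = maxChars;
        }

        void append(String s) {
            int remaining = maxChars - text.length();
            if (remaining <= 0) {
                return; // The total budget is already used up.
            }
            text.append(s, 0, Math.min(s.length(), remaining));
        }

        int length() {
            return text.length();
        }
    }

    public static void main(String[] args) {
        String field = "aaaaaaaa"; // 8 characters, shorter than the limit of 10
        PerCallLimitBuffer perCall = new PerCallLimitBuffer(10);
        TotalLimitBuffer total = new TotalLimitBuffer(10);
        for (int i = 0; i < 3; i++) {
            perCall.append(field);
            total.append(field);
        }
        System.out.println("per-call limit: " + perCall.length()); // 24
        System.out.println("total limit:    " + total.length());   // 10
    }
}
```

With three 8-character fields and a limit of 10, the per-call variant never truncates anything, so the buffer grows past the limit, while the total variant stops at 10 characters.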