Re: encoding is longer than the max length 32766

Andrew Mehler Tue, 01 Jul 2014 12:23:27 -0700

For not_analyzed fields, is there a way of getting the old behavior back?
From what I can tell, you need to specify a tokenizer to be able to use a
token filter.
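A minimal sketch of one possible workaround, assuming Elasticsearch 1.x-style `string` mappings. The built-in `keyword` tokenizer emits the whole field value as a single token, much like a not_analyzed field, so a `length` token filter can be attached to it. Over-long values are then dropped instead of failing the request. The analyzer, filter, type and field names (`keyword_max_len`, `max_term_len`, `my_type`, `my_field`) are hypothetical. The `max` of 8191 characters is also an assumption: the `length` filter counts characters, while Lucene's 32766 limit is in UTF-8 bytes, so the sketch conservatively allows up to 4 bytes per character.

```python
import json

# Lucene's per-term limit is 32766 UTF-8 bytes, but the `length` filter counts
# characters. Allowing up to 4 bytes per character: 32766 // 4 == 8191.
MAX_TERM_CHARS = 32766 // 4

# Hypothetical names: "max_term_len", "keyword_max_len", "my_type", "my_field".
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Removes any token longer than MAX_TERM_CHARS characters.
                "max_term_len": {
                    "type": "length",
                    "min": 0,
                    "max": MAX_TERM_CHARS,
                }
            },
            "analyzer": {
                # The keyword tokenizer turns the entire value into one token,
                # so this behaves roughly like a not_analyzed field plus a filter.
                "keyword_max_len": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["max_term_len"],
                }
            },
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "my_field": {"type": "string", "analyzer": "keyword_max_len"}
            }
        }
    },
}

# Print the body as JSON, e.g. for use in an index-creation request.
print(json.dumps(index_body, indent=2))
```

For fields that really are mapped as not_analyzed, the `ignore_above` mapping parameter may be a simpler option. Check the reference docs for your version to confirm.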


On Tuesday, June 3, 2014 12:18:37 PM UTC-4, Karel Minařík wrote:
>
> This is actually a change in Lucene: previously, the over-long term was
> silently dropped, but now it raises an exception. See Lucene ticket
> https://issues.apache.org/jira/browse/LUCENE-5710
>
> You might want to add a `length` filter to your analyzer, see
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html#analysis-length-tokenfilter
>
> All in all, it hints at some strange data, because such an "immense" term
> probably shouldn't be in the index in the first place.
>
> Karel
>
> On Thursday, May 29, 2014 10:47:37 PM UTC+2, Jeff Dupont wrote:
>>
>> We’re running into a peculiar issue when updating indexes with document
>> content:
>>
>>
>> "document contains at least one immense term in (whose utf8 encoding is
>> longer than the max length 32766), all of which were skipped. please
>> correct the analyzer to not produce such terms"
>>
>>
>> I’m hoping that there’s a simple fix or setting that can resolve this.
>>
>
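The error message distinguishes bytes from characters, and a small stand-alone sketch makes that concrete. It is not Lucene code. It only mimics the check the message describes, treating a term as "immense" when its UTF-8 encoding is longer than 32766 bytes. The helper names `is_immense` and `drop_immense` are hypothetical.

```python
# Per-term cap quoted in the error message, measured in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def is_immense(term: str) -> bool:
    """True if the term's UTF-8 encoding is longer than the cap."""
    return len(term.encode("utf-8")) > MAX_TERM_BYTES


def drop_immense(terms):
    """Silently skip over-long terms.

    This resembles the old outcome (term dropped) rather than the
    post-LUCENE-5710 behaviour (an exception is raised).
    """
    return [t for t in terms if not is_immense(t)]


ascii_term = "a" * 20000      # 20000 characters -> 20000 bytes
cjk_term = "\u4e2d" * 20000   # 20000 characters -> 60000 bytes (3 bytes each)

print(is_immense(ascii_term), is_immense(cjk_term))  # False True
print(len(drop_immense(["short", ascii_term, cjk_term])))  # 2
```

The CJK case shows why a character limit of 32766 would not be enough on its own: both terms have the same character count, but only one fits under the byte limit.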

