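The byte prefix in the error message decodes to “lorem ipsum dolor sit amet, co”, 
so the input is ordinary text containing spaces. But the alphaOnlySort field type 
(in Solr’s example schemas anyway) uses KeywordTokenizer, which emits the entire 
input as a single token, so the whole 68144-byte value becomes one term, well 
over the 32766-byte maximum.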

As Erick implied, you probably should not be using this field type for this kind 
of data. Perhaps the analyzer used by this dynamic field should change?

Alternatively, you could:

a) truncate long values so that a prefix makes it through the indexing process, 
e.g. by adding TruncateTokenFilterFactory[1] to alphaOnlySort’s analyzer, or by 
adding TruncateFieldUpdateProcessorFactory[2] to your update request processor 
chain; or

b) entirely eliminate overly long values, e.g. using LengthFilterFactory[3].
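
As a rough sketch of what these options might look like in configuration (untested 
against your schema): the field type names, the chain name, and the length values 
below are hypothetical, and whatever filters your existing alphaOnlySort analyzer 
has are left out. These lengths are counted in characters rather than UTF-8 bytes, 
so 10000 is an assumed conservative value that stays under 32766 bytes even at 
three bytes per character.

```xml
<!-- schema, option (a): keep only a prefix of the single keyword token -->
<fieldType name="alphaOnlySortTruncated" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- your existing alphaOnlySort filters would go here -->
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="10000"/>
  </analyzer>
</fieldType>

<!-- schema, option (b): drop tokens longer than max entirely -->
<fieldType name="alphaOnlySortLengthLimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- your existing alphaOnlySort filters would go here -->
    <filter class="solr.LengthFilterFactory" min="1" max="10000"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml, option (a) variant: truncate the value before it is indexed -->
<updateRequestProcessorChain name="truncate-tsing">
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldRegex">.*_tsing</str>
    <int name="maxLength">10000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note that the update processor changes the incoming document itself, so the stored 
value is truncated too, whereas the analyzer filters only affect the indexed term. 
A named chain like this also has to be selected, e.g. with the update.chain request 
parameter or by making it the default chain.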

[1] https://lucene.apache.org/core/7_3_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilterFactory.html
[2] https://lucene.apache.org/solr/7_3_0/solr-core/org/apache/solr/update/processor/TruncateFieldUpdateProcessorFactory.html
[3] https://lucene.apache.org/solr/guide/7_3/filter-descriptions.html#length-filter

--
Steve
www.lucidworks.com

> On May 1, 2018, at 11:28 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> You're sending it a huge term. My guess is you're sending something
> like base64-encoded data or perhaps just a single unbroken string in
> your field.
> 
> Examine your document, it should jump out at you.
> 
> Best,
> Erick
> 
> On Tue, May 1, 2018 at 7:40 AM, THADC <timothy.clotworthy.j...@gmail.com> 
> wrote:
>> Hello,
>> 
>> We are migrating from solr 4.7 to 7.3. When I encounter a data item that
>> matches a custom dynamic field from our 4.7 schema:
>> 
>> *<dynamicField name="*_tsing"  type="alphaOnlySort"    indexed="true"
>> stored="true" multiValued="false"/>*
>> 
>> , I get the following exception:
>> 
>> *Exception writing document id FULL_36265 to the index; possible analysis
>> error: Document contains at least one immense term in
>> field="gridFacts_tsing" (whose UTF8 encoding is longer than the max length
>> 32766), all of which were skipped.  Please correct the analyzer to not
>> produce such terms.  The prefix of the first immense term is: '[108, 111,
>> 114, 101, 109, 32, 105, 112, 115, 117, 109, 32, 100, 111, 108, 111, 114, 32,
>> 115, 105, 116, 32, 97, 109, 101, 116, 44, 32, 99, 111]...', original
>> message: bytes can be at most 32766 in length; got 68144.*
>> 
>> Any ideas are greatly appreciated. Thank you.
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
