[
https://issues.apache.org/jira/browse/SOLR-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cao Manh Dat updated SOLR-8495:
-------------------------------
Attachment: SOLR-8495.patch
Here is the initial patch for this issue. It is based on idea #1 from
[~steve_rowe].
This patch introduces a new {{ParseLongStringFieldUpdateProcessorFactory}}, which
performs the following check:
{code}
if (valSize > 32000) {
return new LongStringField(stringVal);
}
{code}
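For illustration only, the check above could sit in a helper like the one below. This is a minimal sketch, not the code from the patch: the class name {{LongStringCheckSketch}}, the method {{wrapIfLong}}, and the nested stand-in {{LongStringField}} are hypothetical, and it assumes that {{valSize}} is the UTF-8 byte length of the value (the snippet above does not say how it is computed).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LongStringCheckSketch {

    /** Hypothetical stand-in for the patch's LongStringField marker type. */
    static final class LongStringField {
        final String value;
        LongStringField(String value) { this.value = value; }
    }

    // Threshold copied from the snippet above; it stays below the
    // 32766-byte term limit quoted in the issue description.
    static final int MAX_STRING_BYTES = 32000;

    /** Returns the value unchanged, or wrapped in LongStringField if it is too large. */
    static Object wrapIfLong(String stringVal) {
        // Assumption: size is measured in UTF-8 bytes, not in Java chars.
        int valSize = stringVal.getBytes(StandardCharsets.UTF_8).length;
        if (valSize > MAX_STRING_BYTES) {
            return new LongStringField(stringVal);
        }
        return stringVal;
    }

    public static void main(String[] args) {
        char[] big = new char[40000];
        Arrays.fill(big, 'x');
        System.out.println(wrapIfLong("hello").getClass().getSimpleName());         // String
        System.out.println(wrapIfLong(new String(big)).getClass().getSimpleName()); // LongStringField
    }
}
```

Measuring bytes rather than characters matters for non-ASCII text, where a value with fewer than 32000 characters can still exceed the byte limit.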
We can then add a new type mapping to {{AddSchemaFieldsUpdateProcessorFactory}}:
{code}
<lst name="typeMapping">
<str name="valueClass">org.apache.solr.update.processor.LongStringField</str>
<str name="fieldType">lstring</str>
</lst>
{code}
There are some problems with this approach:
- We must define the chunk size (the size of the pieces a large string is split
into) in the schema file (for {{ChunkTokenizerFactory}}), not in solrconfig.
- In the multi-valued case, what should we do when the first value is > 32KB and
the second value is < 32KB? With this patch, the first value is mapped to
LongStringField while the second value remains a String, so
{{AddSchemaFieldsUpdateProcessor#mapValueClassesToFieldType}} will create a
field based on {{defaultFieldType}} (should we modify the method?).
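One possible answer to the multi-valued question is to promote the whole field to the long-string type as soon as any value is a LongStringField and the rest are plain Strings. The sketch below only illustrates that idea and is not what the patch or {{mapValueClassesToFieldType}} currently does: {{MultiValueTypeSketch}}, {{resolveFieldType}}, and the nested {{LongStringField}} are hypothetical names, and the {{lstring}} type name is taken from the mapping above.

```java
import java.util.Arrays;
import java.util.List;

public class MultiValueTypeSketch {

    /** Hypothetical stand-in for the patch's LongStringField marker type. */
    static final class LongStringField {
        final String value;
        LongStringField(String value) { this.value = value; }
    }

    /**
     * Picks a single field type name for all values of one field.
     * Assumption: a mix of String and LongStringField values is promoted
     * to "lstring"; every other combination falls back to the default.
     */
    static String resolveFieldType(List<?> values, String defaultFieldType) {
        boolean allStringLike = !values.isEmpty();
        boolean anyLong = false;
        for (Object v : values) {
            if (v instanceof LongStringField) {
                anyLong = true;
            } else if (!(v instanceof String)) {
                allStringLike = false;
            }
        }
        return (allStringLike && anyLong) ? "lstring" : defaultFieldType;
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.asList(new LongStringField("very long value"), "short");
        System.out.println(resolveFieldType(mixed, "strings"));                     // lstring
        System.out.println(resolveFieldType(Arrays.asList("a", "b"), "strings"));   // strings
    }
}
```

With this rule the short second value is indexed with the same type as the long first value, so the field type no longer depends on the order of the values.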
> Schemaless mode cannot index large text fields
> ----------------------------------------------
>
> Key: SOLR-8495
> URL: https://issues.apache.org/jira/browse/SOLR-8495
> Project: Solr
> Issue Type: Bug
> Components: Data-driven Schema, Schema and Analysis
> Affects Versions: 4.10.4, 5.3.1, 5.4
> Reporter: Shalin Shekhar Mangar
> Labels: difficulty-easy, impact-medium
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-8495.patch
>
>
> The schemaless mode by default indexes all string fields into an indexed
> StrField which is limited to 32KB text. Anything larger than that leads to an
> exception during analysis.
> {code}
> Caused by: java.lang.IllegalArgumentException: Document contains at least one
> immense term in field="text" (whose UTF8 encoding is longer than the max
> length 32766)
> {code}