Nikolas Osvalds created SOLR-16589:
--------------------------------------

             Summary: Large fields with large="true" can be truncated in v9+
                 Key: SOLR-16589
                 URL: https://issues.apache.org/jira/browse/SOLR-16589
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: search
    Affects Versions: 9.0, 9.1, 9.2
            Reporter: Nikolas Osvalds


### Description

## Issue

For fields using `large="true"`, large field values (which is what the option is
intended for) can be truncated in v9+ of Lucene.

Example fieldtype definition:
```
<fieldtype name="string_large" class="solr.TextField" multiValued="false"
           indexed="false" stored="true" omitNorms="true" large="true" />
```

## Cause
This looks like a bug introduced along with
[LUCENE-8805](https://issues.apache.org/jira/browse/LUCENE-8805) /
https://github.com/apache/lucene/issues/9849:


[SolrDocumentFetcher.java#L508-L511](https://github.com/apache/solr/blob/bc2d9623f7960f83636eb8416b11dd4e91ab4b22/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L508-L511)

```
public void stringField(FieldInfo fieldInfo, String value) throws IOException {
  Objects.requireNonNull(value, "String value should not be null");
  bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
  bytesRef.length = value.length(); // char count, not the UTF-8 byte count
```

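This affects only the handling of "large" fields.

`value.length()` counts UTF-16 code units, while `bytesRef.bytes` holds the
UTF-8 encoding. For any value containing non-ASCII characters the UTF-8 byte
length is greater than `value.length()`, so the trailing bytes are cut off,
hence the truncation.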
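
The mismatch can be reproduced without Solr or Lucene on the classpath. The following is a minimal sketch, not the Solr code itself: the class `Utf8LengthDemo` and its methods are hypothetical names, and a plain `(byte[], length)` pair stands in for the `BytesRef`.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    /** Decodes only the first {@code length} bytes, as a reader of a (bytes, length) pair would. */
    static String decode(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    /** Mirrors the reported behaviour: the length is taken from the char count. */
    static String roundTripWithCharCount(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return decode(bytes, value.length());
    }

    public static void main(String[] args) {
        // "naive cafe" with i-diaeresis and e-acute; escapes keep the source ASCII-only.
        String value = "na\u00efve caf\u00e9";
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);

        System.out.println(value.length()); // 10 UTF-16 code units
        System.out.println(utf8.length);    // 12 UTF-8 bytes

        // Only the first 10 of the 12 bytes are decoded, so the final character is lost.
        System.out.println(roundTripWithCharCount(value));
    }
}
```

Pure ASCII values round-trip unchanged, which is probably why the truncation is easy to miss in tests that use ASCII-only data.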

## Fix

The fix would be to take the length from the encoded byte array instead:

`bytesRef.length = bytesRef.bytes.length;`
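
As a sanity check of the proposed assignment, here is a small standalone sketch. It assumes a hypothetical `Slice` class as a stand-in for the `bytes` and `length` members of `BytesRef`; `LargeFieldLengthFix`, `encode` and `decode` are illustrative names, not Solr APIs. It checks that values with 1-, 2-, 3- and 4-byte code points round-trip losslessly once the length is the byte count.

```java
import java.nio.charset.StandardCharsets;

public class LargeFieldLengthFix {
    /** Hypothetical stand-in for the (bytes, length) pair held by a BytesRef. */
    static final class Slice {
        byte[] bytes;
        int length;
    }

    /** Fills the slice the way the proposed fix does. */
    static Slice encode(String value) {
        Slice slice = new Slice();
        slice.bytes = value.getBytes(StandardCharsets.UTF_8);
        slice.length = slice.bytes.length; // byte count, not value.length()
        return slice;
    }

    /** Decodes exactly the bytes the slice claims to hold. */
    static String decode(Slice slice) {
        return new String(slice.bytes, 0, slice.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {
            "plain ascii",         // 1 byte per char
            "caf\u00e9",           // contains a 2-byte code point
            "\u65e5\u672c\u8a9e",  // 3-byte code points
            "\ud83d\ude00 emoji"   // 4-byte code point (surrogate pair)
        };
        for (String s : samples) {
            Slice slice = encode(s);
            System.out.println(s.length() + " chars, " + slice.length
                + " bytes, lossless=" + decode(slice).equals(s));
        }
    }
}
```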

 


