clintropolis opened a new pull request, #15083:
URL: https://github.com/apache/druid/pull/15083

   ### Description
   Fixes a case missed in #14422, where paths containing only a single type 
of scalar value plus empty arrays were incorrectly treated as single-type 
scalar columns, causing serialization to fail with errors of the form:
   
   ```
   org.apache.druid.error.DruidException: Value not found in string dictionary
   
        at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:460)
        at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:450)
        at org.apache.druid.error.DruidException.defensive(DruidException.java:176)
        at org.apache.druid.segment.nested.DictionaryIdLookup.lookupString(DictionaryIdLookup.java:130)
        at org.apache.druid.segment.nested.ScalarStringFieldColumnWriter.lookupGlobalId(ScalarStringFieldColumnWriter.java:58)
        at org.apache.druid.segment.nested.ScalarStringFieldColumnWriter.lookupGlobalId(ScalarStringFieldColumnWriter.java:33)
        at org.apache.druid.segment.nested.GlobalDictionaryEncodedFieldColumnWriter.addValue(GlobalDictionaryEncodedFieldColumnWriter.java:154)
        at org.apache.druid.segment.nested.NestedDataColumnSerializer$1.processArrayField(NestedDataColumnSerializer.java:132)
   ```
   
   Note that the exception is thrown from `processArrayField`, but the 
writer is a `ScalarStringFieldColumnWriter`. This could happen with any 
scalar type if the field encountered only empty arrays and no other array 
values. The problem occurred because the 'type byte' of `FieldTypeInfo` is used 
to detect "single type" fields to optimize, but the check did not account for 
the empty array flag. Empty arrays, much like nulls, don't really have a type, 
so they do not set one at indexing time, to avoid polluting the type byte with 
an artificially chosen type. The intention was that at persist time we check 
for the flag: if the field would otherwise be a single scalar type, we add the 
array type of that scalar type, promoting the field to mixed type and 
preserving the array dictionary; if the single type is already a typed array 
type, the field remains a single type.
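
   The intended persist-time behavior can be sketched roughly as follows. 
This is a simplified illustration, not the actual Druid code: 
`FieldTypeSketch`, its `Type` enum, and all of its method names are 
hypothetical, and it tracks types in an `EnumSet` instead of the real 
`FieldTypeInfo` type byte.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the field type tracking described above.
// It is NOT Druid's FieldTypeInfo; names and structure are assumptions.
final class FieldTypeSketch
{
  enum Type { STRING, LONG, DOUBLE, STRING_ARRAY, LONG_ARRAY, DOUBLE_ARRAY }

  private final EnumSet<Type> types = EnumSet.noneOf(Type.class);
  private boolean hasEmptyArray = false;

  // Record a typed value seen at indexing time.
  void add(Type type)
  {
    types.add(type);
  }

  // Empty arrays have no element type, so they only set a flag.
  void addEmptyArray()
  {
    hasEmptyArray = true;
  }

  static boolean isArray(Type type)
  {
    return type == Type.STRING_ARRAY || type == Type.LONG_ARRAY || type == Type.DOUBLE_ARRAY;
  }

  static Type arrayOf(Type type)
  {
    switch (type) {
      case STRING:
        return Type.STRING_ARRAY;
      case LONG:
        return Type.LONG_ARRAY;
      case DOUBLE:
        return Type.DOUBLE_ARRAY;
      default:
        return type; // already an array type
    }
  }

  // The buggy check: looks only at the recorded types and ignores the flag.
  Type singleTypeIgnoringFlag()
  {
    return types.size() == 1 ? types.iterator().next() : null;
  }

  // Persist-time view: a lone scalar type plus the empty array flag is promoted
  // to mixed type by adding the matching array type. A lone array type is left
  // alone. (The case of no recorded types at all is not modeled here.)
  Set<Type> typesAtPersist()
  {
    EnumSet<Type> result = EnumSet.copyOf(types);
    if (hasEmptyArray && result.size() == 1) {
      Type only = result.iterator().next();
      if (!isArray(only)) {
        result.add(arrayOf(only));
      }
    }
    return result;
  }

  // The corrected check: single type only if still single after promotion.
  Type singleType()
  {
    Set<Type> result = typesAtPersist();
    return result.size() == 1 ? result.iterator().next() : null;
  }
}
```

In this sketch, a field that saw only strings and empty arrays reports 
`STRING` from `singleTypeIgnoringFlag()` (the failure mode above) but `null` 
from `singleType()`, so it would be written as a mixed type field.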
   
   The column added to the test data triggered this issue in a number of 
tests, and I also added tests for the empty array flag to the field type info 
tests.
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

