clintropolis opened a new pull request, #15083:
URL: https://github.com/apache/druid/pull/15083
### Description
Fixes a case missed in #14422, where paths containing only a single type of
scalar value mixed with empty arrays would incorrectly be treated as single-type
scalar columns, resulting in a serialization failure with errors of the form:
```
org.apache.druid.error.DruidException: Value not found in string dictionary
	at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:460)
	at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:450)
	at org.apache.druid.error.DruidException.defensive(DruidException.java:176)
	at org.apache.druid.segment.nested.DictionaryIdLookup.lookupString(DictionaryIdLookup.java:130)
	at org.apache.druid.segment.nested.ScalarStringFieldColumnWriter.lookupGlobalId(ScalarStringFieldColumnWriter.java:58)
	at org.apache.druid.segment.nested.ScalarStringFieldColumnWriter.lookupGlobalId(ScalarStringFieldColumnWriter.java:33)
	at org.apache.druid.segment.nested.GlobalDictionaryEncodedFieldColumnWriter.addValue(GlobalDictionaryEncodedFieldColumnWriter.java:154)
	at org.apache.druid.segment.nested.NestedDataColumnSerializer$1.processArrayField(NestedDataColumnSerializer.java:132)
```
Note that the exception is thrown from `processArrayField`, but the writer is
a `ScalarStringFieldColumnWriter`. This could happen with any scalar type if
the field otherwise encountered only empty arrays and no other array values.
The problem occurred because the 'type byte' of `FieldTypeInfo` is used to
check for "single type" fields to optimize, but the check forgot to account
for the empty array flag. Empty arrays do not set a type at indexing time
since, like nulls, they don't really have a type, and setting one would
pollute the type byte with an artificially chosen type. The intention was that
at persist time we check for the flag and, if the field is otherwise a single
scalar type, add the array type of that scalar type, promoting the field to
mixed type and preserving the array dictionary. If the single type is already
a typed array type, the field just remains a single type.
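For readers unfamiliar with this code, the intended persist-time check can be sketched roughly as follows. This is a simplified illustration and not the actual `FieldTypeInfo` implementation: `EmptyArrayFlagSketch`, `FieldType`, `TypeSet`, and all method names are hypothetical, and an `EnumSet` stands in for the real type byte.

```java
import java.util.EnumSet;

// Hypothetical sketch of the "single type" check; none of these names are Druid APIs.
final class EmptyArrayFlagSketch
{
  // Stand-in for the types a nested field can record.
  enum FieldType
  {
    STRING, LONG, DOUBLE, STRING_ARRAY, LONG_ARRAY, DOUBLE_ARRAY;

    // Array type for a scalar type; array types map to themselves.
    FieldType asArray()
    {
      switch (this) {
        case STRING: return STRING_ARRAY;
        case LONG: return LONG_ARRAY;
        case DOUBLE: return DOUBLE_ARRAY;
        default: return this;
      }
    }
  }

  // Stand-in for the per-field type information (an EnumSet instead of a type byte).
  static final class TypeSet
  {
    private final EnumSet<FieldType> types = EnumSet.noneOf(FieldType.class);
    private boolean hasEmptyArray;

    void addType(FieldType type)
    {
      types.add(type);
    }

    // Empty arrays only set a flag; they do not record a type.
    void addEmptyArray()
    {
      hasEmptyArray = true;
    }

    // The buggy shape of the check: looks only at recorded types, ignoring the flag.
    FieldType singleTypeIgnoringFlag()
    {
      return types.size() == 1 ? types.iterator().next() : null;
    }

    // The intended check: if the flag is set and there is otherwise one type,
    // add that type's array type before deciding. Returns null for "not single type".
    // A field that saw only empty arrays has no recorded type; this sketch simply
    // reports null for it, which is a choice of the sketch, not something the PR states.
    FieldType singleType()
    {
      EnumSet<FieldType> effective = EnumSet.copyOf(types);
      if (hasEmptyArray && effective.size() == 1) {
        effective.add(effective.iterator().next().asArray());
      }
      return effective.size() == 1 ? effective.iterator().next() : null;
    }
  }

  public static void main(String[] args)
  {
    TypeSet field = new TypeSet();
    field.addType(FieldType.STRING);
    field.addEmptyArray();
    System.out.println("ignoring flag: " + field.singleTypeIgnoringFlag());
    System.out.println("with flag:     " + field.singleType());
  }
}
```

In this sketch, a field with strings and empty arrays is reported as `STRING` by the flag-blind check, which corresponds to the scalar writer being handed array values, while the flag-aware check reports it as mixed. A field with `LONG_ARRAY` values and empty arrays stays single-typed either way.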
The column added to the test data triggered this issue in a number of tests,
and I also added tests for the empty array flag to the field type info
tests.
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.