mcbrewster opened a new pull request #9803:
URL: https://github.com/apache/druid/pull/9803
Fixes #9485
### Description
Currently, columns whose values are all null are not processed by the
sampler. There should be a parameter on the IOConfig that allows the user to
specify that they want to keep columns with all null values. This feature
should be implemented in the web-console.
I am pretty new to this so my approach might be hilariously wrong, just know
I know.
The problem happens because columns whose values are all null are treated as
string dimensions; they are then caught by the canSkip() check in
StringDimensionHandler and are subsequently skipped. To fix this, I added the
JsonProperty disableNullColumnSkipping to the IOConfig and created the getter
isDisableNullColumnSkipping(). If the value is not set, it defaults to false,
which means null columns are skipped and there is no change to the current
functionality.
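As a rough illustration of the default described above, here is a minimal sketch. It is not the code in this PR: `SketchIOConfig` is a hypothetical stand-in class, and a nullable `Boolean` constructor argument is assumed to model a property that is absent from the spec (in the real IOConfig the argument would presumably carry a Jackson `@JsonProperty` annotation).

```java
// Hypothetical, simplified stand-in for an IOConfig; not Druid's actual class.
final class SketchIOConfig {
    private final boolean disableNullColumnSkipping;

    // A null argument models "property not set in the ingestion spec".
    SketchIOConfig(Boolean disableNullColumnSkipping) {
        // Unset falls back to false, so existing specs keep skipping null columns.
        this.disableNullColumnSkipping =
            disableNullColumnSkipping != null && disableNullColumnSkipping;
    }

    boolean isDisableNullColumnSkipping() {
        return disableNullColumnSkipping;
    }
}
```

With this shape, a spec that never mentions the property behaves exactly as before, and only an explicit `true` changes anything.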
The value disableNullColumnSkipping is used in two places. First, it is used
in stringDimensionMergerV6 canSkip() so that it always returns false if
disableNullColumnSkipping is set. Second, it is used in JsonReader to prevent
the flattening of an inline datasource. This is necessary because flattening
removes the null columns from the row, so they never even end up in the
incrementalIndex.
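The canSkip() change can be sketched roughly as follows. This is an assumption-laden illustration, not the merger code itself: `NullColumnSkipSketch` and its list-of-strings column representation are hypothetical, and "skippable" is assumed to mean "every value is null".

```java
import java.util.List;

// Hypothetical sketch of the skip decision; the real merger works on encoded
// dictionaries, not a plain list of values.
final class NullColumnSkipSketch {
    // Returns true when the column may be dropped during merging.
    static boolean canSkip(List<String> columnValues, boolean disableNullColumnSkipping) {
        if (disableNullColumnSkipping) {
            // Flag set: never skip, even if every value is null.
            return false;
        }
        // Assumed default behaviour: a column is skippable only if all values are null.
        for (String value : columnValues) {
            if (value != null) {
                return false;
            }
        }
        return true;
    }
}
```

The flag is checked first so that the all-null scan is bypassed entirely when the user has asked to keep null columns.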
The rest of the changes either pass the value from indexTask to JsonReader
and IndexMergerV6 or update tests to include the new value.
I also changed LoadDataView.tsx so that the web-console always uses this by
default, by setting disableNullColumnSkipping to true on the IoConfig.
This PR has:
- [x] been self-reviewed.
- [x] added unit tests or modified existing tests to cover new code paths.
- [x] been tested in a test Druid cluster.
##### Key changed/added classes in this PR
* `IndexTask`
* `JsonReader`
* `stringDimensionMergerV6`
* `IndexMergerV6`