gianm commented on PR #18944:
URL: https://github.com/apache/druid/pull/18944#issuecomment-3789629216
> @gianm another thing I discovered while investigating this patch is that Hadoop
> by default does not create all-null columns in a
> segment (`-Ddruid.indexer.task.storeEmptyColumns=false` by default). Native
> batch in the latest version does. This is the key difference that showed up in
> the segment diff (see #12279). Do you know why this is?
There's a comment in `IndexMergerV9` that says Hadoop indexing uses a
constructor that doesn't support the `storeEmptyColumns` configuration yet. In
that spot it's hard-coded to `false`. I suppose it would make more sense for
that to be hard-coded to `TaskConfig.DEFAULT_STORE_EMPTY_COLUMNS`, i.e.,
`true`. It would make even more sense for it to respect the actual
configuration, which would mean switching to some logic like the
`TaskToolboxFactory` uses:
```java
config.buildV10()
    ? indexMergerV10Factory.create()
    : indexMergerV9Factory.create(
        task.getContextValue(Tasks.STORE_EMPTY_COLUMNS_KEY, config.isStoreEmptyColumns())
    )
```
That would replace injecting the `IndexMergerV9` directly.
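
To make the intended precedence concrete, here is a minimal, self-contained sketch of "task context overrides the configured value, which otherwise applies". It does not use Druid classes: `StoreEmptyColumnsSketch`, `resolveStoreEmptyColumns`, and the `"storeEmptyColumns"` key string are hypothetical stand-ins for `task.getContextValue(Tasks.STORE_EMPTY_COLUMNS_KEY, config.isStoreEmptyColumns())`, and the real parsing rules may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mimics the lookup order, not Druid's actual implementation.
class StoreEmptyColumnsSketch {
    // Assumed stand-in for Tasks.STORE_EMPTY_COLUMNS_KEY.
    static final String STORE_EMPTY_COLUMNS_KEY = "storeEmptyColumns";

    // Stand-in for TaskConfig.DEFAULT_STORE_EMPTY_COLUMNS (true, per the comment above).
    static final boolean DEFAULT_STORE_EMPTY_COLUMNS = true;

    // Returns the task-context value if present, otherwise the configured value.
    static boolean resolveStoreEmptyColumns(Map<String, Object> taskContext, boolean configValue) {
        Object fromContext = taskContext.get(STORE_EMPTY_COLUMNS_KEY);
        if (fromContext == null) {
            return configValue;
        }
        if (fromContext instanceof Boolean) {
            return (Boolean) fromContext;
        }
        // Context values often arrive as strings from JSON; parse leniently.
        return Boolean.parseBoolean(String.valueOf(fromContext));
    }

    public static void main(String[] args) {
        Map<String, Object> emptyContext = new HashMap<>();
        Map<String, Object> overridingContext = new HashMap<>();
        overridingContext.put(STORE_EMPTY_COLUMNS_KEY, false);

        // No context entry: the configured value applies.
        System.out.println(resolveStoreEmptyColumns(emptyContext, DEFAULT_STORE_EMPTY_COLUMNS));
        // Context entry present: it overrides the configured value.
        System.out.println(resolveStoreEmptyColumns(overridingContext, DEFAULT_STORE_EMPTY_COLUMNS));
    }
}
```

With a lookup like this, the Hadoop path would pick up both a per-task override and the cluster-wide setting instead of the hard-coded `false`.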