Re: [PR] Fix Hadoop multi-value string null value handling to match native batch (druid)

via GitHub Fri, 23 Jan 2026 02:48:28 -0800


gianm commented on PR #18944:
URL: https://github.com/apache/druid/pull/18944#issuecomment-3789629216


   > @gianm another thing I discovered investigating this patch is that Hadoop by default does not create all-null columns in a segment (`-Ddruid.indexer.task.storeEmptyColumns=false` by default). Native batch in the latest version does. This is the key difference that showed up in the segment diff. #12279. Do you know why this is?
   
   There's a comment in `IndexMergerV9` that says Hadoop indexing uses a 
constructor that doesn't support the `storeEmptyColumns` configuration yet. In 
that spot it's hard-coded to `false`. I suppose it would make more sense for 
that to be hard-coded to `TaskConfig.DEFAULT_STORE_EMPTY_COLUMNS`, i.e., 
`true`. It would make even more sense for it to respect the actual 
configuration, which would mean switching to some logic like the 
`TaskToolboxFactory` uses:
   
   ```java
   config.buildV10()
               ? indexMergerV10Factory.create()
               : indexMergerV9Factory.create(
                   task.getContextValue(Tasks.STORE_EMPTY_COLUMNS_KEY, config.isStoreEmptyColumns())
               )
   ```
   
   That would replace injecting the `IndexMergerV9` directly.
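
   To illustrate the precedence described above (a per-task context value wins, then the configured value, then a default of `true`), here is a minimal standalone sketch. It is not Druid code: the class, the `resolve` method, and the literal key string are hypothetical stand-ins for `Tasks.STORE_EMPTY_COLUMNS_KEY`, `config.isStoreEmptyColumns()`, and `TaskConfig.DEFAULT_STORE_EMPTY_COLUMNS`.

   ```java
   import java.util.Map;

   // Hypothetical sketch of the precedence only; none of these names are Druid APIs.
   public class StoreEmptyColumnsSketch {
       // Assumed default, mirroring the `true` attributed to TaskConfig.DEFAULT_STORE_EMPTY_COLUMNS.
       public static final boolean DEFAULT_STORE_EMPTY_COLUMNS = true;

       // Placeholder for Tasks.STORE_EMPTY_COLUMNS_KEY; the literal string is an assumption.
       public static final String STORE_EMPTY_COLUMNS_KEY = "storeEmptyColumns";

       // Resolve the flag: task context override first, then configured value, then default.
       public static boolean resolve(Map<String, Object> taskContext, Boolean configured) {
           boolean fallback = configured != null ? configured : DEFAULT_STORE_EMPTY_COLUMNS;
           Object override = taskContext.get(STORE_EMPTY_COLUMNS_KEY);
           return override instanceof Boolean ? (Boolean) override : fallback;
       }
   }
   ```

   Under this sketch, a path that hard-codes `false` corresponds to skipping `resolve` entirely, which is why neither the default nor a user-supplied setting takes effect there.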


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

