[GitHub] [druid] AlexanderMann commented on issue #10206: partial_index_generic_merge tasks fail immediately without logs (index_parallel)

GitBox Tue, 08 Mar 2022 11:25:59 -0800


AlexanderMann commented on issue #10206:
URL: https://github.com/apache/druid/issues/10206#issuecomment-1062123720



   👋 Something I think is relevant here:
   
   **Version affected:** 0.22.1
   
   I think Druid has a 🐛 in how `verifySize` checks znode sizes when compression is enabled. [By default, Druid turns compression _on_ for its ZK integration](https://druid.apache.org/docs/latest/configuration/index.html#zookeeper-behavior). However, when it actually runs [`verifySize` in `CuratorUtils`](https://github.com/apache/druid/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fdruid%2Fcurator%2FCuratorUtils.java#L122), it _always_ checks the uncompressed size of the data.
   
   For tasks like [Hash Based Native Batch ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html#hash-based-partitioning), where you can often get tens of thousands of `partial_index_generate` tasks, the resulting `partial_index_generic_merge` task can end up with a _massive_ `partitionSpec`. The evidence we've seen suggests that even _dramatically_ tweaking the segment count or size limits in the `splitHintSpec` does _not_ really shrink some of the `generic_merge` task specs that get generated.
   
   We have a recurring issue where task submission fails about 4 hours into a task because of [`Length of raw bytes for znode ...`](https://github.com/apache/druid/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fdruid%2Fcurator%2FCuratorUtils.java#L124) errors. The spec being submitted is around 5-6 MB (which is nuts in the first place), but compressed it is only about 189K, well below the default limit Druid sets for itself.
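   
   To make the mismatch concrete, here is a minimal standalone sketch (not Druid's code). It builds a repetitive JSON payload, gzips it, and runs the same "is it under the limit?" check against both sizes. Assumptions: the 512 KiB limit stands in for the default `maxZnodeBytes`, gzip stands in for whatever compression the ZK client applies, and the payload fields, `ZnodeSizeSketch`, `gzip`, `fits`, and `repetitiveSpec` are all hypothetical names made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ZnodeSizeSketch {
    // Assumption for illustration: a 512 KiB limit standing in for the default maxZnodeBytes.
    static final int MAX_ZNODE_BYTES = 512 * 1024;

    // Gzip the payload in memory; a stand-in for the compression applied before the ZK write.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and fully flushed) by the time we get here.
        return out.toByteArray();
    }

    // The size check, parameterised on whichever size the caller chooses to pass in.
    static boolean fits(int sizeInBytes, int maxBytes) {
        return sizeInBytes <= maxBytes;
    }

    // Hypothetical, highly repetitive payload; NOT the real merge task spec schema.
    static byte[] repetitiveSpec(int entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"interval\":\"2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z\",")
              .append("\"partitionId\":").append(i).append(',')
              .append("\"host\":\"middlemanager-0.example.internal\",\"port\":8100,")
              .append("\"subTaskId\":\"partial_index_generate_example\"}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = repetitiveSpec(30_000);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:        " + raw.length
            + " fits=" + fits(raw.length, MAX_ZNODE_BYTES));
        System.out.println("compressed bytes: " + compressed.length
            + " fits=" + fits(compressed.length, MAX_ZNODE_BYTES));
    }
}
```

   Checking the raw length rejects a payload like this even though the bytes that would actually be written to ZK are a small fraction of the limit, which matches what we're seeing.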
   
   Currently we're trying setting `druid.indexer.runner.maxZnodeBytes` to around 6 MiB, since pretty much _everything_ bigger than the default compresses down massively.
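   
   For reference, a sketch of the override we're trying. Assumptions: it goes in the Overlord's `runtime.properties`, and the value is given as a plain byte count (6 MiB = 6 * 1024 * 1024 = 6291456 bytes).

```properties
# Workaround sketch: raise the znode size limit to 6 MiB (6 * 1024 * 1024 bytes).
druid.indexer.runner.maxZnodeBytes=6291456
```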


