AlexanderMann commented on issue #10206: URL: https://github.com/apache/druid/issues/10206#issuecomment-1062123720
👋 So something which I think is relevant here:

**Version impacted:** 0.22.1

I think Druid has a 🐛 with regard to compression and `znode ... verifySize`.

[By default, Druid turns compression _on_ for its integration with ZK](https://druid.apache.org/docs/latest/configuration/index.html#zookeeper-behavior). However, when it actually goes to [`verifySize` in `CuratorUtils`](https://github.com/apache/druid/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fdruid%2Fcurator%2FCuratorUtils.java#L122), it _always_ uses the uncompressed size of the data.

For tasks like [hash-based native batch ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html#hash-based-partitioning), where you can often get tens of thousands of `partial_index_generate` tasks being constructed, the resulting `partial_index_generic_merge` can end up with a _massive_ `partitionSpec`. The evidence we've seen suggests that even _dramatically_ tweaking the segment count or size limits in the `splitHintSpec` _does not_ really change some of the resulting `generic_merge` task specs.

We have a recurring issue where task submission fails about 4 hours into a task because of [`Length of raw bytes for znode ...`](https://github.com/apache/druid/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fdruid%2Fcurator%2FCuratorUtils.java#L124) errors. The spec being submitted is around 5-6 MB (which is nuts in the first place), while the compressed form is only about 189K, well below the default limit Druid sets for itself.

Currently we're trying setting `druid.indexer.runner.maxZnodeBytes` to around 6 MiB, as we're seeing that pretty much _everything_ bigger than the default compresses down massively.
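To illustrate the gap between raw and compressed size described above, here is a minimal Python sketch. It is not Druid code: the payload shape, the host name, and the helpers `build_fake_merge_spec` and `exceeds_limit` are made up for illustration, the 512 KiB limit is assumed to match the documented default of `druid.indexer.runner.maxZnodeBytes`, and gzip is used only as a stand-in for whatever compression is applied on the ZK side.

```python
import gzip
import json

# Assumption: 512 KiB, taken to be the default for druid.indexer.runner.maxZnodeBytes.
MAX_ZNODE_BYTES = 512 * 1024


def build_fake_merge_spec(num_entries: int) -> bytes:
    """Build a made-up, highly repetitive JSON payload standing in for a large task spec."""
    entries = [
        {
            "interval": "2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z",
            "host": "middlemanager.example.com",  # hypothetical host
            "port": 8091,
            "subTaskId": f"partial_index_generate_example_{i}",
            "bucketId": i % 64,
        }
        for i in range(num_entries)
    ]
    spec = {"type": "partial_index_generic_merge", "partitionLocations": entries}
    return json.dumps(spec).encode("utf-8")


def exceeds_limit(payload: bytes, limit: int = MAX_ZNODE_BYTES) -> bool:
    """Size check in the spirit of verifySize: compare a byte length to the limit."""
    return len(payload) > limit


raw = build_fake_merge_spec(20_000)
compressed = gzip.compress(raw)

print(f"raw bytes:        {len(raw):,} (over limit: {exceeds_limit(raw)})")
print(f"compressed bytes: {len(compressed):,} (over limit: {exceeds_limit(compressed)})")
```

With a payload this repetitive, the check fails on the raw bytes even though the compressed bytes fit comfortably, which is the mismatch reported here.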
