jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-547210004
 
 
   > I did not get as big a performance gain as expected. For the small tenants, the query latency only dropped by 50ms-100ms, but for the big tenant, the latency increased by 10s-30s. I think this is caused by data skew from hash partitioning on tenant_id: the biggest tenant is in an 18GB segment.
   
   Thank you for sharing! The performance gain does look small, but it's still interesting.
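A minimal sketch of why hashing on tenant_id alone produces one oversized partition. The dataset is hypothetical, the column name `user_id` is an assumption, and SHA-256 stands in for whatever hash function Druid actually uses. Every row with the same key hashes to the same bucket, so one big tenant cannot be spread out no matter how many partitions there are:

```python
import hashlib
from collections import Counter

def partition_of(key: tuple, num_partitions: int) -> int:
    """Map a partition key to a bucket with a stable hash (SHA-256 is a stand-in, not Druid's hash)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def partition_sizes(rows, key_columns, num_partitions):
    """Count rows per partition when hashing on the given key columns."""
    sizes = Counter()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        sizes[partition_of(key, num_partitions)] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]

# Hypothetical dataset: one big tenant with 10,000 rows and 99 small tenants with 50 rows each.
rows = [{"tenant_id": "big", "user_id": i} for i in range(10_000)]
rows += [{"tenant_id": f"t{t}", "user_id": i} for t in range(99) for i in range(50)]

# Hash on tenant_id only: all of the big tenant's rows share one key, hence one bucket.
sizes = partition_sizes(rows, ["tenant_id"], num_partitions=8)
print(sizes)
print("largest partition:", max(sizes), "of", len(rows), "rows")
```

The largest partition always holds at least the 10,000 rows of the big tenant, which matches the skew described above.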
   
   > Do you mean that Druid 0.17.0 will support single-dimension range partitioning in native parallel indexing tasks?
   
   I hope so. You can check the proposal in https://github.com/apache/incubator-druid/issues/8769.
   
   > Also, if there is a big tenant within a range of tenant_id values, how can we avoid skewed segment sizes with single-dimension range partitioning in the future native parallel indexing task?
   
   Hmm, sorry, I didn't mean that single-dimension range partitioning helps with skewed partitioning. I'm not sure there's a better way than adding other columns to the partition key.
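A minimal sketch of the "add other columns to the partition key" idea, using the same kind of hypothetical dataset. Here `user_id` is an assumed secondary column and SHA-256 is a stand-in hash, not Druid's. Hashing on (tenant_id, user_id) gives the big tenant many distinct keys, so its rows land in several buckets:

```python
import hashlib
from collections import Counter

def partition_of(key: tuple, num_partitions: int) -> int:
    """Map a partition key to a bucket with a stable hash (SHA-256 is a stand-in, not Druid's hash)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def partition_sizes(rows, key_columns, num_partitions):
    """Count rows per partition when hashing on the given key columns."""
    sizes = Counter()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        sizes[partition_of(key, num_partitions)] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]

# Hypothetical dataset: one big tenant with 10,000 rows and 99 small tenants with 50 rows each.
rows = [{"tenant_id": "big", "user_id": i} for i in range(10_000)]
rows += [{"tenant_id": f"t{t}", "user_id": i} for t in range(99) for i in range(50)]

# Composite key: the big tenant now maps to 10,000 distinct keys instead of one.
sizes = partition_sizes(rows, ["tenant_id", "user_id"], num_partitions=8)
big_partitions = {partition_of(("big", i), 8) for i in range(10_000)}
print(sizes)
print("big tenant spread over", len(big_partitions), "partitions")
```

The trade-off is that a single tenant's rows are no longer co-located in one partition, so a query for one tenant may have to read several partitions instead of one.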
