jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-547210004

> I did not get as big a performance gain as expected. For the small tenant, the query latency only reduces by 50ms-100ms, but for the big tenant, the latency increases by 10s-30s. I think it is caused by the data skewness with the hashed partition of tenant_id. The biggest tenant in an 18GB segment.

Thank you for sharing! The performance gain does look small, but it is still interesting.

> Do you mean that Druid 0.17.0 will support single-dimension range partitioning in native parallel indexing tasks?

I hope so. You can check the proposal in https://github.com/apache/incubator-druid/issues/8769.

> Also, if there is a big tenant in a range set of tenant_id, how to avoid segment size skewness by single-dimension range partitioning in the future native parallel indexing task?

Hmm, sorry, I didn't mean that single-dimension range partitioning helps with skewed partitioning. I'm not sure there's a better way other than adding other columns to the partition key.
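
To make the last point concrete, here is a minimal sketch of why hashing on tenant_id alone skews partition sizes when one tenant dominates, and how adding a second column to the key spreads the rows out. Everything in it is an assumption for illustration: the `user_id` column, the row counts, the partition count, and the MD5-based `bucket` helper are hypothetical and are not Druid's actual hash function or partitioning code.

```python
import hashlib
from collections import Counter


def bucket(key_parts, num_partitions):
    """Map a list of key values to a partition index via a stable hash.

    Illustrative only: this is NOT the hash function Druid uses.
    """
    joined = "\x00".join(str(p) for p in key_parts).encode("utf-8")
    digest = hashlib.md5(joined).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_sizes(rows, key_columns, num_partitions):
    """Count how many rows land in each partition for the given key columns."""
    counts = Counter()
    for row in rows:
        key = [row[c] for c in key_columns]
        counts[bucket(key, num_partitions)] += 1
    return [counts.get(i, 0) for i in range(num_partitions)]


# Hypothetical data set: one big tenant and ten small tenants.
rows = [{"tenant_id": "big", "user_id": i} for i in range(9000)]
for t in range(10):
    rows += [{"tenant_id": f"small-{t}", "user_id": i} for i in range(100)]

# Key = tenant_id only: all 9000 rows of the big tenant share one partition.
by_tenant = partition_sizes(rows, ["tenant_id"], 4)

# Key = (tenant_id, user_id): the big tenant's rows are spread across partitions.
by_tenant_and_user = partition_sizes(rows, ["tenant_id", "user_id"], 4)

print("tenant_id only:      ", by_tenant)
print("tenant_id + user_id: ", by_tenant_and_user)
```

The trade-off is that once the key includes a second column, a tenant's rows no longer live in a single partition, so a query that filters on tenant_id alone may have to touch more partitions.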
