To support range partitioning for native parallel batch indexing, I'm considering moving DataSketches from extensions to core (see https://github.com/apache/incubator-druid/issues/8769 for details). Having DataSketches in core would also let us replace usages of HyperLogLogCollector with the better HLL implementation available in DataSketches. One drawback is that moving DataSketches to core may block the work to upgrade DataSketches to the latest version: https://github.com/apache/incubator-druid/pull/8647.
Any other thoughts on the pros/cons?

Thanks,
Chi