gianm opened a new issue, #14520: URL: https://github.com/apache/druid/issues/14520
After upgrading to datasketches 4.0.0 (#14334) we are seeing an NPE during ingestion. Stack trace below.

It looks like `union.update(sketch)` in DataSketches 3.x accepts null `sketch`, but `union.union(sketch)` in DataSketches 4.x doesn't. Here's the code from DataSketches 3.3.0: https://github.com/apache/datasketches-java/blob/3.3.0/src/main/java/org/apache/datasketches/quantiles/DoublesUnionImpl.java#L116-L119

I think we can fix this by skipping the call to `union.union(sketch)` if `sketch` is null. We also need to:

- audit other calls to `Union#union` for possible null inputs and fix those too.
- audit other sketches for similar situations (where nulls used to be accepted and are no longer) and fix those up if there are any.

```
2023-07-03T13:59:20,448 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[<task>]
java.lang.NullPointerException: null
	at java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
	at org.apache.datasketches.quantiles.DoublesUnionImpl.union(DoublesUnionImpl.java:124) ~[datasketches-java-4.0.0.jar:?]
	at org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchAggregatorFactory$2.fold(DoublesSketchAggregatorFactory.java:266) ~[?:?]
	at org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchAggregatorFactory$2.reset(DoublesSketchAggregatorFactory.java:259) ~[?:?]
	at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.resetCombinedMetrics(RowCombiningTimeAndDimsIterator.java:249) ~[druid-processing.jar]
	at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.combineToCurrentTimeAndDims(RowCombiningTimeAndDimsIterator.java:229) ~[druid-processing.jar]
	at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.moveToNext(RowCombiningTimeAndDimsIterator.java:191) ~[druid-processing.jar]
	at org.apache.druid.segment.IndexMergerV9.mergeIndexesAndWriteColumns(IndexMergerV9.java:605) ~[druid-processing.jar]
	at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:233) ~[druid-processing.jar]
	at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing.jar]
	at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing.jar]
	at org.apache.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:914) ~[druid-processing.jar]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.mergeSegmentsInSamePartition(PartialSegmentMergeTask.java:352) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.mergeAndPushSegments(PartialSegmentMergeTask.java:260) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:191) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:46) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service.jar]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service.jar]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
```
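For illustration, the proposed fix (skip the call to `union.union(sketch)` when `sketch` is null) might look roughly like the sketch below. This is not the actual Druid or DataSketches code: `StrictUnion` is a hypothetical stand-in that only mimics the 4.x behavior seen in the stack trace (an `Objects.requireNonNull` on the input), `double[]` stands in for a sketch, and `unionIfNonNull` is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class NullSafeUnionSketch {

    /**
     * Hypothetical stand-in for a 4.x-style union. Like the frame at
     * DoublesUnionImpl.union in the stack trace, it rejects null input.
     * A double[] plays the role of a sketch here.
     */
    static final class StrictUnion {
        private final List<double[]> merged = new ArrayList<>();

        void union(double[] sketch) {
            Objects.requireNonNull(sketch); // throws NullPointerException on null
            merged.add(sketch);
        }

        int mergedCount() {
            return merged.size();
        }
    }

    /**
     * Assumed helper: restores the 3.x-style "null is a no-op" behavior at
     * the call site. Returns true if the sketch was merged, false if skipped.
     */
    static boolean unionIfNonNull(StrictUnion union, double[] sketch) {
        if (sketch == null) {
            return false; // nothing to merge, so skip the call entirely
        }
        union.union(sketch);
        return true;
    }

    public static void main(String[] args) {
        StrictUnion union = new StrictUnion();
        unionIfNonNull(union, new double[] {1.0, 2.0});
        unionIfNonNull(union, null); // skipped instead of throwing
        System.out.println(union.mergedCount()); // prints 1
    }
}
```

Guarding at the call site keeps the change local to the aggregator code. The same pattern would need to be applied wherever the audit finds a `Union#union` call that can receive null.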
