jihoonson commented on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL: https://github.com/apache/incubator-druid/issues/6989#issuecomment-460870802

@glasser thank you for finding this! I agree that the behavior of indexParallelTask is supposed to be the same as (or at least similar to) that of indexTask or hadoopIndexTask, so I think this is a bug. indexParallelTask is expected to overwrite existing segments unless `appendToExisting` is explicitly set to true.

I think it's still possible to avoid another scan even if `intervals` are not given. That is, we can find intervals and generate segments at the same time. The algorithm would be:

1. Find the bucketed interval for an input row. This can be done by `interval = granularitySpec.getSegmentGranularity().bucket(inputRow.getTimestamp());`
2. Check that the task has a valid lock for that interval. If it doesn't have a lock yet, it should request one. If it fails to get a lock or the lock has already been revoked, the task fails.
3. Create a segmentId with the version of the lock.

So, this would be mostly about allocating segmentIds and getting task locks. I think it would be better to modify `ParallelIndexSupervisorTask.allocateNewSegment()` rather than `SegmentAllocateAction`, because `SegmentAllocateAction` is designed for appending and is already complex enough.

In summary, we may want to change [this block](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSubTask.java#L240-L262) to call `taskClient.allocateSegment()` if `explicitIntervals` is false. Also, [ParallelIndexSupervisorTask.allocateNewSegment()](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java#L359-L391) needs to be modified to implement the above algorithm. What do you think?
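
To make the three steps of the proposed algorithm concrete, here is a rough, self-contained sketch. It is not Druid code: `OverwriteAllocatorSketch`, `Bucket`, `Lock`, and `LockService` are hypothetical stand-ins, the granularity is hard-coded to one day (UTC) in place of `granularitySpec.getSegmentGranularity()`, and the segment ID string format is only illustrative. The real change would live in `ParallelIndexSupervisorTask.allocateNewSegment()` and use the actual task lock machinery.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these types exist in Druid.
final class OverwriteAllocatorSketch {
    /** Half-open [start, end) time bucket, standing in for a segment interval. */
    static final class Bucket {
        final Instant start;
        final Instant end;
        Bucket(Instant start, Instant end) { this.start = start; this.end = end; }
    }

    /** Stand-in for a task lock: a version plus a flag set when the lock is revoked. */
    static final class Lock {
        final String version;
        volatile boolean revoked;
        Lock(String version) { this.version = version; }
    }

    /** Stand-in for the lock request; returns empty if the lock is not granted. */
    interface LockService {
        Optional<Lock> tryLock(Bucket bucket);
    }

    private final LockService lockService;
    private final Map<Instant, Lock> heldLocks = new HashMap<>();          // keyed by bucket start
    private final Map<Instant, Integer> partitionCounts = new HashMap<>(); // keyed by bucket start

    OverwriteAllocatorSketch(LockService lockService) { this.lockService = lockService; }

    /** Step 1: bucket a row timestamp (assumes DAY segment granularity). */
    static Bucket bucketFor(Instant timestamp) {
        Instant start = timestamp.truncatedTo(ChronoUnit.DAYS);
        return new Bucket(start, start.plus(1, ChronoUnit.DAYS));
    }

    /** Steps 2 and 3: ensure a valid lock for the bucket, then build an ID from its version. */
    String allocate(String dataSource, Instant rowTimestamp) {
        Bucket bucket = bucketFor(rowTimestamp);

        // Step 2: reuse a held lock, or request one; fail the task if it is denied.
        Lock lock = heldLocks.get(bucket.start);
        if (lock == null) {
            lock = lockService.tryLock(bucket).orElseThrow(
                () -> new IllegalStateException("Could not lock " + bucket.start + "/" + bucket.end));
            heldLocks.put(bucket.start, lock);
        }
        if (lock.revoked) {
            throw new IllegalStateException("Lock revoked for " + bucket.start + "/" + bucket.end);
        }

        // Step 3: the segment version comes from the lock; partitions count up per bucket.
        int partition = partitionCounts.merge(bucket.start, 1, Integer::sum) - 1;
        return String.join("_", dataSource, bucket.start.toString(), bucket.end.toString(),
                           lock.version, String.valueOf(partition));
    }

    public static void main(String[] args) {
        OverwriteAllocatorSketch allocator =
            new OverwriteAllocatorSketch(b -> Optional.of(new Lock("v1")));
        // Two rows in the same day share one lock and get consecutive partitions.
        System.out.println(allocator.allocate("wikipedia", Instant.parse("2019-02-06T03:15:00Z")));
        System.out.println(allocator.allocate("wikipedia", Instant.parse("2019-02-06T20:00:00Z")));
    }
}
```

The point of the sketch is that intervals are discovered while rows are being read, so no separate pass over the input is needed to determine them; a lock is requested only the first time a row falls into a new bucket.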
