jihoonson commented on issue #6989: Behavior of index_parallel with 
appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL: 
https://github.com/apache/incubator-druid/issues/6989#issuecomment-460870802
 
 
   @glasser thank you for finding this! I agree that the behavior of 
indexParallelTask should be the same as (or at least similar to) that of 
indexTask or hadoopIndexTask, so I think this is a bug. indexParallelTask is 
expected to overwrite existing segments unless `appendToExisting` is explicitly 
set to true. 
   
   I think it's still possible to avoid another scan even if `intervals` are 
not given. That is, we can find intervals and generate segments at the same 
time. The algorithm would be:
   
   1. Find the bucketed interval for an input row. This can be done with 
`interval = 
granularitySpec.getSegmentGranularity().bucket(inputRow.getTimestamp());`
   2. Check that the task holds a valid lock for that interval. If it doesn't 
have a lock yet, it should request one. If it fails to get the lock, or the 
lock has already been revoked, the task fails.
   3. Create a segmentId with the version of the lock.
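
   The three steps above could be sketched roughly as follows. This is a 
minimal, self-contained illustration and not Druid code: 
`SegmentAllocationSketch`, `Lock`, `SegmentId`, `bucketStart` and the 
`lockRequester` callback are all hypothetical stand-ins, DAY granularity in UTC 
is assumed in place of `granularitySpec.getSegmentGranularity()`, and the 
per-interval partition counter is an assumption that the steps above do not 
spell out.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these types are real Druid classes.
class SegmentAllocationSketch {
    // Stand-in for a task lock: a version string plus a revoked flag.
    static final class Lock {
        final String version;
        final boolean revoked;

        Lock(String version, boolean revoked) {
            this.version = version;
            this.revoked = revoked;
        }
    }

    // Stand-in for a segment identifier covering [start, end).
    static final class SegmentId {
        final Instant start;
        final Instant end;
        final String version;
        final int partitionNum;

        SegmentId(Instant start, Instant end, String version, int partitionNum) {
            this.start = start;
            this.end = end;
            this.version = version;
            this.partitionNum = partitionNum;
        }
    }

    // Assumed callback that requests a lock for the bucket starting at the
    // given instant; it returns null when the lock cannot be acquired.
    private final Function<Instant, Lock> lockRequester;
    private final Map<Instant, Lock> locks = new HashMap<>();
    private final Map<Instant, Integer> nextPartition = new HashMap<>();

    SegmentAllocationSketch(Function<Instant, Lock> lockRequester) {
        this.lockRequester = lockRequester;
    }

    // Step 1: bucket a row timestamp (DAY granularity, UTC, assumed here).
    static Instant bucketStart(Instant rowTimestamp) {
        return rowTimestamp.truncatedTo(ChronoUnit.DAYS);
    }

    SegmentId allocate(Instant rowTimestamp) {
        Instant start = bucketStart(rowTimestamp);

        // Step 2: reuse the lock for this bucket, or request one if missing.
        Lock lock = locks.get(start);
        if (lock == null) {
            lock = lockRequester.apply(start);
            if (lock == null) {
                throw new IllegalStateException("Failed to lock bucket starting at " + start);
            }
            locks.put(start, lock);
        }
        if (lock.revoked) {
            throw new IllegalStateException("Lock revoked for bucket starting at " + start);
        }

        // Step 3: build a segment id from the lock version. The counter that
        // hands out partition numbers per bucket is an assumption.
        int partitionNum = nextPartition.merge(start, 1, Integer::sum) - 1;
        return new SegmentId(start, start.plus(1, ChronoUnit.DAYS), lock.version, partitionNum);
    }
}
```

   In this sketch a task would call `allocate(inputRow timestamp)` for each row 
that needs a new segment, so intervals are discovered and locked while 
segments are being generated, without a separate scan of the input.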
   
   So, this would be mostly about allocating segmentIds and getting task locks. 
I think it would be better to modify 
`ParallelIndexSupervisorTask.allocateNewSegment()` rather than modifying 
`SegmentAllocateAction` because `SegmentAllocateAction` is designed for 
appending and already complex enough. 
   
   In summary, we may want to change [this 
block](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSubTask.java#L240-L262)
 to call `taskClient.allocateSegment()` when `explicitIntervals` is false. Also, 
[ParallelIndexSupervisorTask.allocateNewSegment()](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java#L359-L391)
 needs to be modified to implement the above algorithm.
   
   What do you think?

