[GitHub] jihoonson commented on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising

GitBox Tue, 05 Feb 2019 17:32:50 -0800

jihoonson commented on issue #6989: Behavior of index_parallel with
appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL:
https://github.com/apache/incubator-druid/issues/6989#issuecomment-460870802

@glasser thank you for finding this! I agree with you that the behavior of
indexParallelTask is supposed to be the same as (or at least similar to) that
of indexTask or hadoopIndexTask, so I think this is a bug. indexParallelTask is
expected to overwrite existing segments unless `appendToExisting` is explicitly
set to true.

I think it's still possible to avoid another scan even if `intervals` are
not given. That is, we can find intervals and generate segments at the same
time. The algorithm would be:

1. Find the bucketed interval for an input row. This can be done with
`interval = granularitySpec.getSegmentGranularity().bucket(inputRow.getTimestamp());`
2. Check whether the task has a valid lock for that interval. If it doesn't
have a lock yet, it should request one. If it fails to get a lock or the lock
has already been revoked, the task fails.
3. Create a segmentId with the version of the lock.
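
To make the three steps concrete, here is a minimal, self-contained sketch. It is not Druid code: `SegmentAllocationSketch`, `Lock`, `LockService`, and the segmentId string format are all hypothetical stand-ins, and the bucketing assumes DAY segment granularity in UTC via `java.time` rather than going through `granularitySpec`.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: none of these types are Druid classes.
class SegmentAllocationSketch {

    /** Hypothetical task lock: a version string plus a flag set when the lock is revoked. */
    public static final class Lock {
        public final String version;
        public volatile boolean revoked = false;

        public Lock(String version) {
            this.version = version;
        }
    }

    /** Hypothetical lock service; an empty result means the lock could not be acquired. */
    public interface LockService {
        Optional<Lock> tryLock(Instant start, Instant end);
    }

    private final LockService lockService;
    // Locks already held by this task, keyed by the start of the bucketed interval.
    private final Map<Instant, Lock> locksByBucketStart = new HashMap<>();
    // Number of partitions allocated so far in each bucket.
    private final Map<Instant, Integer> allocatedByBucketStart = new HashMap<>();

    public SegmentAllocationSketch(LockService lockService) {
        this.lockService = lockService;
    }

    /** Step 1: start of the UTC day containing the timestamp (assumes DAY granularity). */
    public static Instant bucketStart(Instant timestamp) {
        return timestamp.truncatedTo(ChronoUnit.DAYS);
    }

    /** Runs steps 1-3 for one input row and returns an illustrative segmentId string. */
    public String allocate(String dataSource, Instant rowTimestamp) {
        Instant start = bucketStart(rowTimestamp);
        Instant end = start.plus(Duration.ofDays(1));

        // Step 2: reuse a held lock, or request one; fail if it cannot be acquired.
        Lock lock = locksByBucketStart.get(start);
        if (lock == null) {
            lock = lockService.tryLock(start, end).orElseThrow(
                () -> new IllegalStateException("Failed to get a lock for " + start + "/" + end));
            locksByBucketStart.put(start, lock);
        }
        if (lock.revoked) {
            throw new IllegalStateException("Lock revoked for " + start + "/" + end);
        }

        // Step 3: build the id from the lock version and the next partition number.
        int partition = allocatedByBucketStart.merge(start, 1, Integer::sum) - 1;
        return dataSource + "_" + start + "_" + end + "_" + lock.version + "_" + partition;
    }

    public static void main(String[] args) {
        SegmentAllocationSketch allocator =
            new SegmentAllocationSketch((start, end) -> Optional.of(new Lock("v1")));
        System.out.println(allocator.allocate("wiki", Instant.parse("2019-02-05T17:32:50Z")));
    }
}
```

Because the lock is looked up per row, intervals are discovered while segments are being generated, so no separate scan of the input is needed to determine them.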

So, this would be mostly about allocating segmentIds and getting task locks.
I think it would be better to modify
`ParallelIndexSupervisorTask.allocateNewSegment()` rather than
`SegmentAllocateAction`, because `SegmentAllocateAction` is designed for
appending and is already complex enough.

In summary, we may want to change [this
block](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSubTask.java#L240-L262)
to call `taskClient.allocateSegment()` when `explicitIntervals` is false. Also,
[ParallelIndexSupervisorTask.allocateNewSegment()](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java#L359-L391)
needs to be modified to implement the above algorithm.

What do you think?


