[GitHub] glasser edited a comment on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion

GitBox Sun, 10 Feb 2019 15:37:35 -0800

glasser edited a comment on issue #7048: Make IngestSegmentFirehoseFactory 
splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462191642
 
 
   Note: there's a comment in the PR about how `Granularity.getIterable` can 
return intervals which extend beyond its input interval.  Here are the uses of 
that function in non-test code in Druid:
   
   - druid-indexing-hadoop, GranularityPathSpec: trims the output of that 
iterable
   - druid-processing, IntervalChunkingQueryRunner: does some processing which 
I believe is equivalent to trimming the output of that iterable
   - druid-processing, GroupByUniversalTimestamp: probably should be using 
granularity.bucketStart instead
   - druid-processing, QueryableIndexStorageAdapter.CursorSequenceBuilder: 
appears to trim the output of that iterable (inside the `apply` method)
   - druid-processing, IncrementalIndexStorageAdapter: most uses of the 
interval (in initializing timeStart and cursorIterable) trim it to the 
actualInterval, but the `time` field doesn't. Is that a bug?
   - druid-server, UniformGranularitySpec: does not trim the intervals.  
Because batch ingestion uses this field to throw away data not in the given 
explicit intervals, this means that the interval used to throw away data may be 
less precisely (larger) than the specified one, which might be surprising! On 
the other hand, these intervals are what are used (via bucketInterval) to 
allocate segments, so they probably ought to be segmentGranularity-aligned (in 
the common case where you are using an ArbitraryGranularitySpec nested inside a 
UniformGranularitySpec instead of just a standalone ArbitraryGranularitySpec, 
which is what this getIterable() call is about).
   
   This suggests that it would be helpful to have a 
`Granularity.getTrimmedIterable()` method which would allow most of the current 
callers of `Granularity.getIterable()` to be simplified. There are two calls in 
my list that wouldn't be able to use getTrimmedIterable, and I don't know Druid 
well enough to say whether that is due to bugs or is intentional.  If we do 
conclude that these are both bugs, I don't know the Druid compatibility rules 
well enough to know if it would be OK to just change the semantics of 
`Granularity.getIterable()` rather than add a second variant.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] glasser edited a comment on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion

Reply via email to