glasser edited a comment on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462191642 Note: there's a comment in the PR about how `Granularity.getIterable` can return intervals which extend beyond its input interval. Here are the uses of that function in non-test code in Druid: - druid-indexing-hadoop, GranularityPathSpec: trims the output of that iterable - druid-processing, IntervalChunkingQueryRunner: does some processing which I believe is equivalent to trimming the output of that iterable - druid-processing, GroupByUniversalTimestamp: probably should be using granularity.bucketStart instead - druid-processing, QueryableIndexStorageAdapter.CursorSequenceBuilder: appears to trim the output of that iterable (inside the `apply` method) - druid-processing, IncrementalIndexStorageAdapter: most uses of the interval (in initializing timeStart and cursorIterable) trim it to the actualInterval, but the `time` field doesn't. Is that a bug? - druid-server, UniformGranularitySpec: does not trim the intervals. Because batch ingestion uses this field to throw away data not in the given explicit intervals, this means that the interval used to throw away data may be less precisely (larger) than the specified one, which might be surprising! On the other hand, these intervals are what are used (via bucketInterval) to allocate segments, so they probably ought to be segmentGranularity-aligned (in the common case where you are using an ArbitraryGranularitySpec nested inside a UniformGranularitySpec instead of just a standalone ArbitraryGranularitySpec, which is what this getIterable() call is about). This suggests that it would be helpful to have a `Granularity.getTrimmedIterable()` method which would allow most of the current callers of `Granularity.getIterable()` to be simplified. There are two calls in my list that wouldn't be able to use getTrimmedIterable, and I don't know Druid well enough to say whether that is due to bugs or is intentional. If we do conclude that these are both bugs, I don't know the Druid compatibility rules well enough to know if it would be OK to just change the semantics of `Granularity.getIterable()` rather than add a second variant.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
