kfaraz commented on code in PR #16171: URL: https://github.com/apache/druid/pull/16171#discussion_r1543990185
########## indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java: ########## @@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String, DataSegment>> getTimelineForSegm return new ArrayList<>(timeline.values()); } + + /** + * Should ideally be called after creating the reader with {@link #reader(InputRowSchema, InputFormat, File)} + * call to not be a costly operation. + * + * @return count of segments handled by this input source + */ + public int segmentsCount() Review Comment: If the segment count has not been computed yet, this method should just return null instead of having a fallback. Ideally, the fallback should never be invoked (we can verify this in a test cluster). But if we keep a fallback, it would be difficult to know what happened inside this method. ########## indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java: ########## @@ -155,6 +157,8 @@ public class DruidInputSource extends AbstractInputSource implements SplittableI @Nullable private final TaskToolbox toolbox; + @Nullable + private Integer segmentsCount; Review Comment: ```suggestion private Integer numSegmentsInTimeline; ``` ########## indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java: ########## @@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String, DataSegment>> getTimelineForSegm return new ArrayList<>(timeline.values()); } + + /** + * Should ideally be called after creating the reader with {@link #reader(InputRowSchema, InputFormat, File)} + * call to not be a costly operation. + * + * @return count of segments handled by this input source + */ + public int segmentsCount() Review Comment: ```suggestion public Integer getNumberOfSegmentsRead() ``` ########## indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java: ########## @@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String, DataSegment>> getTimelineForSegm return new ArrayList<>(timeline.values()); } + + /** + * Should ideally be called after creating the reader with {@link #reader(InputRowSchema, InputFormat, File)} + * call to not be a costly operation. + * + * @return count of segments handled by this input source + */ + public int segmentsCount() + { + if (segmentIds != null) { + return segmentIds.size(); + } + + if (segmentsCount == null) { + createTimeline(); + } + + return segmentsCount; Review Comment: I don't think we need to use the `segmentIds` in any case, we can just stick to using the count from the timeline. ```suggestion return segmentCount; ``` ########## indexing-service/src/test/java/org/apache/druid/indexing/common/task/IngestionTestBase.java: ########## @@ -114,6 +119,7 @@ public abstract class IngestionTestBase extends InitializedNullHandlingTest private SegmentsMetadataManager segmentsMetadataManager; private TaskLockbox lockbox; private File baseDir; + protected File reportsFile; Review Comment: Ah, never mind, just saw that it is being used in `AbstractParallelTaskTestAsWell`. However, could you make this field private and expose a `protected getReportsFile()` method? We don't want any test to reassign this variable (unless the test needs to write out multiple different reports, but that doesn't seem to be the case right now). ########## indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java: ########## @@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String, DataSegment>> getTimelineForSegm return new ArrayList<>(timeline.values()); } + + /** + * Should ideally be called after creating the reader with {@link #reader(InputRowSchema, InputFormat, File)} + * call to not be a costly operation. + * + * @return count of segments handled by this input source Review Comment: ```suggestion * @return Number of segments read by this input source. This value is null until the method {@link #fixedFormatReader} has been invoked on this input source. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org