kfaraz commented on code in PR #16171:
URL: https://github.com/apache/druid/pull/16171#discussion_r1543990185
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java:
##########
@@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String,
DataSegment>> getTimelineForSegm
return new ArrayList<>(timeline.values());
}
+
+ /**
+ * Should ideally be called after creating the reader with {@link
#reader(InputRowSchema, InputFormat, File)}
+ * call to not be a costly operation.
+ *
+ * @return count of segments handled by this input source
+ */
+ public int segmentsCount()
Review Comment:
If the segment count has not been computed yet, this method should just
return null instead of having a fallback.
Ideally, the fallback should never be invoked (we can verify this in a test
cluster).
But if we keep a fallback, it would be difficult to know what happened
inside this method.
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java:
##########
@@ -155,6 +157,8 @@ public class DruidInputSource extends AbstractInputSource
implements SplittableI
@Nullable
private final TaskToolbox toolbox;
+ @Nullable
+ private Integer segmentsCount;
Review Comment:
```suggestion
private Integer numSegmentsInTimeline;
```
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java:
##########
@@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String,
DataSegment>> getTimelineForSegm
return new ArrayList<>(timeline.values());
}
+
+ /**
+ * Should ideally be called after creating the reader with {@link
#reader(InputRowSchema, InputFormat, File)}
+ * call to not be a costly operation.
+ *
+ * @return count of segments handled by this input source
+ */
+ public int segmentsCount()
Review Comment:
```suggestion
public Integer getNumberOfSegmentsRead()
```
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java:
##########
@@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String,
DataSegment>> getTimelineForSegm
return new ArrayList<>(timeline.values());
}
+
+ /**
+ * Should ideally be called after creating the reader with {@link
#reader(InputRowSchema, InputFormat, File)}
+ * call to not be a costly operation.
+ *
+ * @return count of segments handled by this input source
+ */
+ public int segmentsCount()
+ {
+ if (segmentIds != null) {
+ return segmentIds.size();
+ }
+
+ if (segmentsCount == null) {
+ createTimeline();
+ }
+
+ return segmentsCount;
Review Comment:
I don't think we need to use the `segmentIds` in any case, we can just stick
to using the count from the timeline.
```suggestion
return segmentCount;
```
##########
indexing-service/src/test/java/org/apache/druid/indexing/common/task/IngestionTestBase.java:
##########
@@ -114,6 +119,7 @@ public abstract class IngestionTestBase extends
InitializedNullHandlingTest
private SegmentsMetadataManager segmentsMetadataManager;
private TaskLockbox lockbox;
private File baseDir;
+ protected File reportsFile;
Review Comment:
Ah, never mind, just saw that it is being used in
`AbstractParallelTaskTestAsWell`.
However, could you make this field private and expose a `protected
getReportsFile()` method? We don't want any test to reassign this variable
(unless the test needs to write out multiple different reports, but that
doesn't seem to be the case right now).
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidInputSource.java:
##########
@@ -620,4 +634,23 @@ public static List<TimelineObjectHolder<String,
DataSegment>> getTimelineForSegm
return new ArrayList<>(timeline.values());
}
+
+ /**
+ * Should ideally be called after creating the reader with {@link
#reader(InputRowSchema, InputFormat, File)}
+ * call to not be a costly operation.
+ *
+ * @return count of segments handled by this input source
Review Comment:
```suggestion
* @return Number of segments read by this input source. This value is
null until the method {@link #fixedFormatReader} has been invoked on this input
source.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]