rdblue commented on code in PR #5268:
URL: https://github.com/apache/iceberg/pull/5268#discussion_r922363887
##########
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##########
@@ -116,9 +133,20 @@ public CloseableIterable<FileScanTask> planFiles() {
ExpressionUtil.toSanitizedString(filter()));
Listeners.notifyAll(new ScanEvent(table().name(), snapshot.snapshotId(),
filter(), schema()));
-
- return doPlanFiles();
-
+ scanReporter().startScan();
+ Timer.Sample sample =
scanReporter().scanMetrics().totalScanDuration().start();
Review Comment:
Tying the lifecycle of scan metrics to the scan reporter means that all of
the scans created through this chain will have the same metrics collector,
which is a correctness problem.
In the Java library, the `TableScan` interface uses a refinement pattern.
Each `TableScan` is independent and you get a new table scan that has
additional options when you call the refinement methods, like `select` or
`filter`. For example,
```java
TableScan fullTableScan = table.newScan();
TableScan yesterday = fullTableScan.filter(Expressions.equal("date_col",
yesterday()));
TableScan loginsYesterday = yesterday.filter(Expressions.equal("event_type",
"login"));
CloseableIterable<FileScanTask> yesterdayTasks = yesterday.planFiles();
CloseableIterable<FileScanTask> loginTasks = loginsYesterday.planFiles();
```
Two scans are planned, but the metrics for both would use the same
`ScanMetrics`.
Above, I suggested separating `ScanMetrics` from the reporter, which would
solve this problem. Not all scans are planned, so you wouldn't want to eagerly
create metrics, but the ones that are used should have a new one:
```java
protected ScanMetrics metrics() {
if (scanMetrics == null) {
this.scanMetrics = scanReporter.newScanMetrics();
}
return scanMetrics;
}
public CloseableIterable<FileScanTask> planFiles(...) {
ScanMetrics metrics = metrics();
...
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]