github-actions[bot] commented on code in PR #63648:
URL: https://github.com/apache/doris/pull/63648#discussion_r3322577500
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -213,6 +213,7 @@ public IcebergScanNode(PlanNodeId id, TupleDescriptor desc,
boolean needCheckCol
@Override
protected void doInitialize() throws UserException {
+ long startTime = System.currentTimeMillis();
Review Comment:
This timer starts too late to include the cold Iceberg table load. Both
`IcebergApiSource` and `IcebergHMSSource` call
`IcebergUtils.getIcebergTable(...)` in their constructors, which are invoked
from this scan node constructor before `doInitialize()` reaches this
`startTime`. On a cold metadata cache,
`IcebergExternalMetaCache.getIcebergTable()`/`loadTableCacheValue()` can be the
expensive table-metadata access, but `External Table Get Table Meta Time` will
only include the later cached `source.getIcebergTable()` and property setup.
Please move the timing to cover source construction/table loading, or defer the
`getIcebergTable()` call into this profiled section.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -914,21 +934,30 @@ private List<Split> doGetSplits(int numBackends) throws
UserException {
TableScan scan = createTableScan();
try (CloseableIterable<FileScanTask> fileScanTasks =
planFileScanTask(scan)) {
- if (tableLevelPushDownCount) {
- int needSplitCnt = countFromSnapshot <
COUNT_WITH_PARALLEL_SPLITS
- ? 1 :
sessionVariable.getParallelExecInstanceNum(scanContext.getClusterName()) *
numBackends;
- for (FileScanTask next : fileScanTasks) {
Review Comment:
This only times the normal Iceberg split path through `doGetSplits()`.
`getSplits()` sends `isSystemTable` queries to `doGetSystemTableSplits()`, and
that method still calls `scan.planFiles()` directly without
`planFileScanTask()` or any `addExternalTableGetFileScanTasksTime(...)`
accounting. Iceberg metadata/system-table queries can therefore spend time
planning file scan tasks while the new `External Table Get File Scan Tasks
Time` remains `N/A`/zero. Please add equivalent timing around the
`scan.planFiles()` iteration in the system-table path as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]