github-actions[bot] commented on code in PR #63648:
URL: https://github.com/apache/doris/pull/63648#discussion_r3322577500


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -213,6 +213,7 @@ public IcebergScanNode(PlanNodeId id, TupleDescriptor desc, 
boolean needCheckCol
 
     @Override
     protected void doInitialize() throws UserException {
+        long startTime = System.currentTimeMillis();

Review Comment:
   This timer starts too late to include the cold Iceberg table load. Both 
`IcebergApiSource` and `IcebergHMSSource` call 
`IcebergUtils.getIcebergTable(...)` in their constructors, which are invoked 
from this scan node constructor before `doInitialize()` reaches this 
`startTime`. On a cold metadata cache, 
`IcebergExternalMetaCache.getIcebergTable()`/`loadTableCacheValue()` can be the 
expensive table-metadata access, but `External Table Get Table Meta Time` will 
only include the later cached `source.getIcebergTable()` and property setup. 
Please move the timing to cover source construction/table loading, or defer the 
`getIcebergTable()` call into this profiled section.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -914,21 +934,30 @@ private List<Split> doGetSplits(int numBackends) throws 
UserException {
         TableScan scan = createTableScan();
 
         try (CloseableIterable<FileScanTask> fileScanTasks = 
planFileScanTask(scan)) {
-            if (tableLevelPushDownCount) {
-                int needSplitCnt = countFromSnapshot < 
COUNT_WITH_PARALLEL_SPLITS
-                        ? 1 : 
sessionVariable.getParallelExecInstanceNum(scanContext.getClusterName()) * 
numBackends;
-                for (FileScanTask next : fileScanTasks) {

Review Comment:
   This only times the normal Iceberg split path through `doGetSplits()`. 
`getSplits()` sends `isSystemTable` queries to `doGetSystemTableSplits()`, and 
that method still calls `scan.planFiles()` directly without 
`planFileScanTask()` or any `addExternalTableGetFileScanTasksTime(...)` 
accounting. Iceberg metadata/system-table queries can therefore spend time 
planning file scan tasks while the new `External Table Get File Scan Tasks 
Time` remains `N/A`/zero. Please add equivalent timing around the 
`scan.planFiles()` iteration in the system-table path as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to