[
https://issues.apache.org/jira/browse/IMPALA-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang reassigned IMPALA-13122:
---------------------------------------
Assignee: Quanlong Huang (was: Noémi Pap-Takács)
> Show file stats in table loading logs
> -------------------------------------
>
> Key: IMPALA-13122
> URL: https://issues.apache.org/jira/browse/IMPALA-13122
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Labels: ramp-up
>
> Here is an example of the table loading logs for a table:
> {noformat}
> I0603 08:46:05.555567 24417 HdfsTable.java:1255] Loading metadata for table
> definition and all partition(s) of tpcds.store_sales (needed by coordinator)
> I0603 08:46:05.642702 24417 HdfsTable.java:1896] Loaded 23 columns from HMS.
> Actual columns: 23
> I0603 08:46:05.767457 24417 HdfsTable.java:3114] Load Valid Write Id List
> Done. Time taken: 26.699us
> I0603 08:46:05.767549 24417 HdfsTable.java:1297] Fetching partition metadata
> from the Metastore: tpcds.store_sales
> I0603 08:46:05.806337 24417 MetaStoreUtil.java:190] Fetching 1824 partitions
> for: tpcds.store_sales using partition batch size: 1000
> I0603 08:46:07.336064 24417 MetaStoreUtil.java:208] Fetched 1000/1824
> partitions for table tpcds.store_sales
> I0603 08:46:07.915474 24417 MetaStoreUtil.java:208] Fetched 1824/1824
> partitions for table tpcds.store_sales
> I0603 08:46:07.915519 24417 HdfsTable.java:1304] Fetched partition metadata
> from the Metastore: tpcds.store_sales
> I0603 08:46:08.840034 24417 ParallelFileMetadataLoader.java:224] Loading file
> and block metadata for 1824 paths for table tpcds.store_sales using a thread
> pool of size 5
> I0603 08:46:09.383904 24417 HdfsTable.java:836] Loaded file and block
> metadata for tpcds.store_sales partitions: ss_sold_date_sk=2450816,
> ss_sold_date_sk=2450817, ss_sold_date_sk=2450818, and 1821 others. Time
> taken: 569.107ms
> I0603 08:46:09.420702 24417 Table.java:1117] last refreshed event id for
> table: tpcds.store_sales set to: -1
> I0603 08:46:09.420794 24417 TableLoader.java:177] Loaded metadata for:
> tpcds.store_sales (4026ms)
> {noformat}
> From the logs, we know the table has 23 columns and 1824 partitions. The time
> spent loading the table schema and the file metadata is also shown.
> However, the logs don't reveal whether the partitions suffer from a small-files
> issue. The underlying storage could also be slow (e.g. S3), which results in a
> long file metadata loading time.
> It'd be helpful to add these to the logs:
> * number of files loaded
> * min/avg/max of file sizes
> * total file size
> * number of blocks (HDFS only)
> * number of hosts, disks (HDFS/Ozone only)
> * stats of accessTime and lastModifiedTime
> These can be aggregated in FileMetadataLoader#loadInternal() and logged in
> ParallelFileMetadataLoader#load() or
> HdfsTable#loadFileMetadataForPartitions().
> [https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L177]
> [https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L172]
> [https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L836]
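As a rough illustration of the aggregation proposed above, here is a minimal sketch of a per-loader stats accumulator. The class name `FileStatsSketch` and all of its methods are hypothetical and are not part of Impala. The sketch assumes each loader can supply a file's size and modification time (for example from Hadoop's `FileStatus#getLen()` and `FileStatus#getModificationTime()`), and it leaves out block, host, disk and accessTime stats.

```java
import java.util.Locale;
import java.util.LongSummaryStatistics;

/** Hypothetical helper (not part of Impala): accumulates file stats during a load. */
final class FileStatsSketch {
  // LongSummaryStatistics tracks count, sum, min and max in a single pass.
  private final LongSummaryStatistics sizes = new LongSummaryStatistics();
  private final LongSummaryStatistics modTimes = new LongSummaryStatistics();

  /** Records one file. Assumption: values come from the listing the loader already does. */
  void add(long sizeBytes, long modificationTimeMs) {
    sizes.accept(sizeBytes);
    modTimes.accept(modificationTimeMs);
  }

  /** Merges stats gathered by another accumulator, e.g. one per partition path. */
  void merge(FileStatsSketch other) {
    sizes.combine(other.sizes);
    modTimes.combine(other.modTimes);
  }

  long numFiles() { return sizes.getCount(); }

  long totalBytes() { return sizes.getSum(); }

  /** Builds a one-line summary that could be appended to a table loading log message. */
  String summary() {
    if (sizes.getCount() == 0) return "files=0";
    return String.format(Locale.ROOT,
        "files=%d, total=%dB, min=%dB, avg=%.1fB, max=%dB, lastModifiedMs=[%d, %d]",
        sizes.getCount(), sizes.getSum(), sizes.getMin(), sizes.getAverage(),
        sizes.getMax(), modTimes.getMin(), modTimes.getMax());
  }
}
```

In this sketch, each per-path loader would own one accumulator and call `add()` for every file it lists; the parallel loader would then `merge()` them after the thread pool finishes and log `summary()` once per table. Keeping the merge step separate means the accumulators need no synchronization as long as each one is only touched by a single thread.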