[ https://issues.apache.org/jira/browse/IMPALA-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067184#comment-18067184 ]

Michael Smith commented on IMPALA-13122:
----------------------------------------

This test still fails on Ozone:
{code:java}
custom_cluster/test_file_metadata_stats.py:139: in test_file_metadata_stats_host_disk_pairs
    self.assert_catalogd_log_contains("INFO", host_disk_regex, expected_count=-1,
        cluster_properties = <tests.common.environ.ImpalaTestClusterProperties object at 0x7fb71df9ded0>
        host_disk_regex = 'Host:Disk pairs: \\d+'
        hosts_regex = 'Hosts: \\d+'
        self       = <tests.custom_cluster.test_file_metadata_stats.TestFileMetadataStats object at 0x7fb67476ac10>
        tbl_name   = 'functional.alltypessmall'
common/impala_test_suite.py:1724: in assert_catalogd_log_contains
    return self.assert_log_contains(
        daemon     = 'catalogd'
        dry_run    = False
        expected_count = -1
        level      = 'INFO'
        line_regex = 'Host:Disk pairs: \\d+'
        node_index = 0
        self       = <tests.custom_cluster.test_file_metadata_stats.TestFileMetadataStats object at 0x7fb67476ac10>
        timeout_s  = 15
common/impala_test_suite.py:1802: in assert_log_contains
    assert found > 0, "Expected at least one line in file %s matching regex '%s'"\
E   AssertionError: Expected at least one line in file /data0/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/logs/custom_cluster_tests/TestFileMetadataStats/catalogd.impala-ec2-redhat86-m6i-4xlarge-ondemand-0e18.vpc.cloudera.com.jenkins.log.INFO.20260319-113736.3203898 matching regex 'Host:Disk pairs: \d+', but found none.
        daemon     = 'catalogd'
        dry_run    = False
        expected_count = -1
        found      = 0
        last_re_result = None
        level      = 'INFO'
        line       = 'I20260319 11:37:49.762998 3204546 catalog-server.cc:797] A catalog update with 6 entries is assembled. Catalog version: 2120 Last sent catalog version: 2119\n'
        line_regex = 'Host:Disk pairs: \\d+'
        log_file   = <_io.BufferedReader name='/data0/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/logs/custom_cluster_tests...tats/catalogd.impala-ec2-redhat86-m6i-4xlarge-ondemand-0e18.vpc.cloudera.com.jenkins.log.INFO.20260319-113736.3203898'>
        log_file_path = '/data0/jenkins/workspace/impala-asf-master-core-ozone/repos/Impala/logs/custom_cluster_tests/TestFileMetadataStats/catalogd.impala-ec2-redhat86-m6i-4xlarge-ondemand-0e18.vpc.cloudera.com.jenkins.log.INFO.20260319-113736.3203898'
        pattern    = re.compile('Host:Disk pairs: \\d+')
        re_result  = None
        self       = <tests.custom_cluster.test_file_metadata_stats.TestFileMetadataStats object at 0x7fb67476ac10>
        start_time = 1773945469.7280872
        timeout_s  = 15 {code}
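For context on what the assertion demands: with expected_count=-1, assert_log_contains only requires at least one matching line in the catalogd INFO log, and the locals show found = 0, so no line matching 'Host:Disk pairs: \d+' was found. The check can be approximated by the Java sketch below. LogRegexCheck is a hypothetical name; it assumes the regex may match anywhere within a line and that any other expected_count means an exact count, and it leaves out the timeout_s retry window (15 s in this run) that the real helper has.

```java
import java.util.regex.Pattern;

// Hypothetical Java approximation of the check behind assert_log_contains():
// count the log lines that contain a regex match, then compare with the
// expected count (-1 meaning "at least one"). Polling/timeout is omitted.
final class LogRegexCheck {
  private LogRegexCheck() {}

  /** Returns how many lines of {@code log} contain a match for {@code regex}. */
  static int countMatches(String log, String regex) {
    Pattern pattern = Pattern.compile(regex);
    int found = 0;
    // "\\R" splits on any line terminator (\n, \r\n, ...).
    for (String line : log.split("\\R")) {
      // find() accepts a match anywhere in the line, not only a full-line match.
      if (pattern.matcher(line).find()) {
        found++;
      }
    }
    return found;
  }

  /** Throws AssertionError unless the log satisfies expectedCount (-1 = at least one). */
  static void assertLogContains(String log, String regex, int expectedCount) {
    int found = countMatches(log, regex);
    if (expectedCount == -1) {
      if (found == 0) {
        throw new AssertionError("Expected at least one line matching regex '" + regex
            + "', but found none.");
      }
    } else if (found != expectedCount) {
      throw new AssertionError("Expected " + expectedCount + " line(s) matching regex '"
          + regex + "', but found " + found + ".");
    }
  }
}
```

For example, LogRegexCheck.assertLogContains(log, "Host:Disk pairs: \\d+", -1) throws whenever no line carries the Host:Disk pairs field, which is the situation in the failing Ozone run.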

> Show file stats in table loading logs
> -------------------------------------
>
>                 Key: IMPALA-13122
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13122
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Arnab Karmakar
>            Priority: Major
>              Labels: ramp-up
>             Fix For: Impala 5.0.0
>
>
> Here is an example for table loading logs on a table:
> {noformat}
> I0603 08:46:05.555567 24417 HdfsTable.java:1255] Loading metadata for table definition and all partition(s) of tpcds.store_sales (needed by coordinator)
> I0603 08:46:05.642702 24417 HdfsTable.java:1896] Loaded 23 columns from HMS. Actual columns: 23
> I0603 08:46:05.767457 24417 HdfsTable.java:3114] Load Valid Write Id List Done. Time taken: 26.699us
> I0603 08:46:05.767549 24417 HdfsTable.java:1297] Fetching partition metadata from the Metastore: tpcds.store_sales
> I0603 08:46:05.806337 24417 MetaStoreUtil.java:190] Fetching 1824 partitions for: tpcds.store_sales using partition batch size: 1000
> I0603 08:46:07.336064 24417 MetaStoreUtil.java:208] Fetched 1000/1824 partitions for table tpcds.store_sales
> I0603 08:46:07.915474 24417 MetaStoreUtil.java:208] Fetched 1824/1824 partitions for table tpcds.store_sales
> I0603 08:46:07.915519 24417 HdfsTable.java:1304] Fetched partition metadata from the Metastore: tpcds.store_sales
> I0603 08:46:08.840034 24417 ParallelFileMetadataLoader.java:224] Loading file and block metadata for 1824 paths for table tpcds.store_sales using a thread pool of size 5
> I0603 08:46:09.383904 24417 HdfsTable.java:836] Loaded file and block metadata for tpcds.store_sales partitions: ss_sold_date_sk=2450816, ss_sold_date_sk=2450817, ss_sold_date_sk=2450818, and 1821 others. Time taken: 569.107ms
> I0603 08:46:09.420702 24417 Table.java:1117] last refreshed event id for table: tpcds.store_sales set to: -1
> I0603 08:46:09.420794 24417 TableLoader.java:177] Loaded metadata for: tpcds.store_sales (4026ms){noformat}
> From the logs, we know the table has 23 columns and 1824 partitions. The time
> spent loading the table schema and file metadata is also shown.
> However, it's unknown whether the partitions have a small-files issue. The
> underlying storage could also be slow (e.g. S3), which makes loading file
> metadata take a long time.
> It'd be helpful to add these in the logs:
>  * number of files loaded
>  * min/avg/max of file sizes
>  * total file size
>  * number of files
>  * number of blocks (HDFS only)
>  * number of hosts, disks (HDFS/Ozone only)
>  * Stats of accessTime and lastModifiedTime
> These can be aggregated in FileMetadataLoader#loadInternal() and logged in 
> ParallelFileMetadataLoader#load() or 
> HdfsTable#loadFileMetadataForPartitions().
> [https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L177]
> [https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L172]
> [https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L836]
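The aggregation proposed in the description could look roughly like the self-contained Java sketch below. FileInfo and FileStatsAggregator are hypothetical names, not classes in Impala or Hadoop: FileInfo stands in for the per-file listing data (size, access/modification time, block locations) that FileMetadataLoader#loadInternal() would see, and the log wording is invented, chosen only so that it lines up with the 'Hosts: \d+' and 'Host:Disk pairs: \d+' regexes the failing test searches for. The merge() method assumes each thread of the loader's pool (size 5 in the example log) keeps its own aggregator.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-file input (NOT an Impala or Hadoop class): it stands in for
// the listing data (size, times, block locations) a file metadata loader sees.
final class FileInfo {
  final long length;            // file size in bytes
  final long accessTime;        // epoch millis
  final long modificationTime;  // epoch millis
  final int numBlocks;          // HDFS only; 0 where blocks are not reported
  final Set<String> hosts;      // hosts holding a replica of any block
  final Set<String> hostDisks;  // "host:diskId" strings; empty if disk ids unknown

  FileInfo(long length, long accessTime, long modificationTime, int numBlocks,
      Set<String> hosts, Set<String> hostDisks) {
    this.length = length;
    this.accessTime = accessTime;
    this.modificationTime = modificationTime;
    this.numBlocks = numBlocks;
    this.hosts = hosts;
    this.hostDisks = hostDisks;
  }
}

// Hypothetical accumulator for the stats listed in IMPALA-13122.
final class FileStatsAggregator {
  private long numFiles = 0;
  private long totalSize = 0;
  private long numBlocks = 0;
  // Extremes start at sentinels so the first file always replaces them.
  private long minSize = Long.MAX_VALUE, maxSize = Long.MIN_VALUE;
  private long minAccess = Long.MAX_VALUE, maxAccess = Long.MIN_VALUE;
  private long minModified = Long.MAX_VALUE, maxModified = Long.MIN_VALUE;
  private final Set<String> hosts = new HashSet<>();
  private final Set<String> hostDisks = new HashSet<>();

  /** Folds one file into the running totals. */
  void add(FileInfo f) {
    numFiles++;
    totalSize += f.length;
    numBlocks += f.numBlocks;
    minSize = Math.min(minSize, f.length);
    maxSize = Math.max(maxSize, f.length);
    minAccess = Math.min(minAccess, f.accessTime);
    maxAccess = Math.max(maxAccess, f.accessTime);
    minModified = Math.min(minModified, f.modificationTime);
    maxModified = Math.max(maxModified, f.modificationTime);
    hosts.addAll(f.hosts);
    hostDisks.addAll(f.hostDisks);
  }

  /** Combines a partial result, e.g. one aggregator per loader thread. */
  void merge(FileStatsAggregator o) {
    numFiles += o.numFiles;
    totalSize += o.totalSize;
    numBlocks += o.numBlocks;
    minSize = Math.min(minSize, o.minSize);
    maxSize = Math.max(maxSize, o.maxSize);
    minAccess = Math.min(minAccess, o.minAccess);
    maxAccess = Math.max(maxAccess, o.maxAccess);
    minModified = Math.min(minModified, o.minModified);
    maxModified = Math.max(maxModified, o.maxModified);
    hosts.addAll(o.hosts);
    hostDisks.addAll(o.hostDisks);
  }

  long numFiles() { return numFiles; }
  long totalSize() { return totalSize; }
  long numBlocks() { return numBlocks; }
  long minSize() { return numFiles == 0 ? 0 : minSize; }
  long maxSize() { return numFiles == 0 ? 0 : maxSize; }
  long avgSize() { return numFiles == 0 ? 0 : totalSize / numFiles; }  // integer division
  int numHosts() { return hosts.size(); }
  int numHostDiskPairs() { return hostDisks.size(); }

  /** One summary line; the wording is illustrative, not Impala's real message. */
  String toLogString() {
    if (numFiles == 0) return "Files: 0";
    return String.format("Files: %d, Total size: %d B, Min/Avg/Max file size: %d/%d/%d B, "
        + "Blocks: %d, Hosts: %d, Host:Disk pairs: %d, accessTime: [%d, %d], "
        + "lastModifiedTime: [%d, %d]",
        numFiles, totalSize, minSize, avgSize(), maxSize, numBlocks, hosts.size(),
        hostDisks.size(), minAccess, maxAccess, minModified, maxModified);
  }
}
```

Per-thread aggregators merged by the caller avoid locking on shared sets; whether that fits how ParallelFileMetadataLoader is actually structured is an assumption. In this sketch the Host:Disk pairs field is printed whenever at least one file was loaded, even when its value is 0, so a log grep for that field does not depend on disk ids being available.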



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
