[ 
https://issues.apache.org/jira/browse/IMPALA-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947328#comment-17947328
 ] 

Quanlong Huang commented on IMPALA-13996:
-----------------------------------------

logs/file-list-begin-1.log and logs/file-list-end-1.log show that 
tpch_parquet.lineitem has only two data files in erasure coding builds:
{noformat}
drwxr-xr-x   - jenkins supergroup          0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet
-rw-r--r--   1 jenkins supergroup  108505625 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000000_2136191419_data.0.parq
-rw-r--r--   1 jenkins supergroup   94429994 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000001_1400772053_data.0.parq
drwxr-xr-x   - jenkins supergroup          0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/_impala_insert_staging{noformat}
They are generated by an INSERT query:
{code:sql}
INSERT OVERWRITE TABLE tpch_parquet.lineitem SELECT * FROM tpch.lineitem{code}
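The file count in the listing above can be double-checked with a few lines of Python. This is only a sketch: count_data_files is a hypothetical helper (not part of the Impala test framework), and it assumes each entry sits on one line in the usual 8-column "hdfs dfs -ls" layout. The sample entries are the ones quoted above.

```python
# Hypothetical helper, not part of Impala's test framework.
def count_data_files(listing):
    """Count regular files in `hdfs dfs -ls` style output.

    Assumes every entry is on a single line with 8 whitespace-separated
    fields: mode, replication, owner, group, size, date, time, path.
    """
    count = 0
    for line in listing.splitlines():
        fields = line.split()
        # Regular files have a mode starting with '-'; directories start with 'd'.
        if len(fields) == 8 and fields[0].startswith("-"):
            count += 1
    return count


# Entries copied from the file listing quoted in this comment.
LISTING = """\
drwxr-xr-x   - jenkins supergroup          0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet
-rw-r--r--   1 jenkins supergroup  108505625 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000000_2136191419_data.0.parq
-rw-r--r--   1 jenkins supergroup   94429994 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/964c79869e367026-91e7763600000001_1400772053_data.0.parq
drwxr-xr-x   - jenkins supergroup          0 2025-04-23 21:18 /test-warehouse/tpch.lineitem_parquet/_impala_insert_staging
"""

num_files = count_data_files(LISTING)
print("data files:", num_files)
```

The two directory entries (the table directory and _impala_insert_staging) are skipped because their mode starts with 'd'.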
I extracted the query profile as 964c79869e367026_91e7763600000000_profile.txt. 
The query ran on only two impalads, so it generated only two data files:
{noformat}
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2{noformat}
It seems the block size is larger in erasure coding builds, so the input data 
file of tpch.lineitem is split into only two blocks (splits), and therefore 
only two impalads are used:
{noformat}
    Fragment F00:
      Instance 964c79869e367026:91e7763600000000 (host=impala-ec2-centos79-m6i-4xlarge-xldisk-1f06.vpc.cloudera.com:27000):
        Last report received time: 2025-04-23 21:18:29.636
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/384.00 MB
...
      Instance 964c79869e367026:91e7763600000001 (host=impala-ec2-centos79-m6i-4xlarge-xldisk-1f06.vpc.cloudera.com:27001):
        Last report received time: 2025-04-23 21:18:27.781
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): -1:1/334.94 MB{noformat}
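
To illustrate the arithmetic behind the two splits, here is a rough sketch in Python. It is not how Impala's scheduler actually computes scan ranges; it just cuts a file at fixed block boundaries. The 384 MB block size and the 718.94 MB total are read off the split stats above. The 128 MB figure is the stock HDFS default for dfs.blocksize, and using it for the non-erasure-coding builds is my assumption, not something taken from the logs. split_lengths_mb is a hypothetical helper.

```python
# Hypothetical helper: a simplified model, not Impala's scan range logic.
def split_lengths_mb(file_size_mb, block_size_mb):
    """Return the split lengths (in MB) when a file is cut at block boundaries."""
    lengths = []
    remaining = file_size_mb
    while remaining > 0:
        length = min(block_size_mb, remaining)
        lengths.append(round(length, 2))
        # Round to 2 decimals to avoid floating-point residue keeping the loop alive.
        remaining = round(remaining - length, 2)
    return lengths


# Total input size: the sum of the two split lengths shown in the profile.
file_size_mb = 384.00 + 334.94

# Block size observed in the erasure coding build (from the profile).
ec_splits = split_lengths_mb(file_size_mb, 384.0)

# ASSUMPTION: 128 MB is the stock HDFS default, used here only for comparison.
default_splits = split_lengths_mb(file_size_mb, 128.0)

print("384 MB blocks:", len(ec_splits), "splits", ec_splits)
print("128 MB blocks:", len(default_splits), "splits", default_splits)
```

With 384 MB blocks the file yields exactly the two splits seen in the profile, so at most two impalads get scan work and the INSERT writes at most two files. With a smaller block size there would be more splits to spread across impalads.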

> TestAllowIncompleteData.test_too_many_files fails erasure coding builds
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-13996
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13996
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Surya Hebbar
>            Assignee: Quanlong Huang
>            Priority: Major
>
> TestAllowIncompleteData.test_too_many_files fails erasure coding builds -
> Error -
> {code}
> assert "Too many files to collect in table tpch_parquet.lineitem: 3. Current 
> limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'. 
> Consider compacting files of the table." in "Query 
> f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not 
> load partitions for table tpch_parq...t limit is 1 configured by startup flag 
> 'catalog_partial_fetch_max_files'. Consider compacting files of the 
> table.\n\n" + where "Query f74919e60b835567:da9967a400000000 
> failed:\nLocalCatalogException: Could not load partitions for table 
> tpch_parq...t limit is 1 configured by startup flag 
> 'catalog_partial_fetch_max_files'. Consider compacting files of the 
> table.\n\n" = str(ImpalaBeeswaxException()){code}
>  
> Stacktrace -
> {code}
> custom_cluster/test_local_catalog.py:721: in test_too_many_files
>     assert err in str(exception)
> E   assert "Too many files to collect in table tpch_parquet.lineitem: 3. 
> Current limit is 1 configured by startup flag 
> 'catalog_partial_fetch_max_files'. Consider compacting files of the table." 
> in "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: 
> Could not load partitions for table tpch_parq...t limit is 1 configured by 
> startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of 
> the table.\n\n"
> E    +  where "Query f74919e60b835567:da9967a400000000 
> failed:\nLocalCatalogException: Could not load partitions for table 
> tpch_parq...t limit is 1 configured by startup flag 
> 'catalog_partial_fetch_max_files'. Consider compacting files of the 
> table.\n\n" = str(ImpalaBeeswaxException())
> {code}


