[
https://issues.apache.org/jira/browse/IMPALA-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947606#comment-17947606
]
Quanlong Huang commented on IMPALA-13996:
-----------------------------------------
Double-checked that, when built with TARGET_FILESYSTEM=hdfs and ERASURE_CODING=true,
the input data file /test-warehouse/tpch.lineitem/lineitem.tbl has only two block
groups:
{noformat}
$ hadoop fsck /test-warehouse/tpch.lineitem/lineitem.tbl
Erasure Coded Block Groups:
Total size: 753862072 B
Total files: 1
Total block groups (validated): 2 (avg. block group size 376931036 B)
Minimally erasure-coded block groups: 2 (100.0 %)
Over-erasure-coded block groups: 0 (0.0 %)
Under-erasure-coded block groups: 0 (0.0 %)
Unsatisfactory placement block groups: 0 (0.0 %)
Average block group size: 5.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0 (0.0 %)
Blocks queued for replication: 0
FSCK ended at Sun Apr 27 02:01:07 PDT 2025 in 1 milliseconds{noformat}
So only two fragment instances read tpch.lineitem, and thus tpch_parquet.lineitem
has only two files in erasure coding builds, not the three that the test expects.
The test shouldn't rely on the number of files in tpch_parquet.lineitem.
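One way to stop depending on the exact file count is to match the error message with a pattern that accepts any count and only checks that the count exceeds the configured limit. This is a minimal sketch, assuming the message text seen in the failure below stays stable; too_many_files_pattern and assert_too_many_files are hypothetical helpers, not part of Impala's test suite or the actual fix.

```python
import re


def too_many_files_pattern(table, limit):
    """Hypothetical helper: regex for the 'too many files' error with any file count."""
    prefix = "Too many files to collect in table {0}: ".format(table)
    suffix = (". Current limit is {0} configured by startup flag "
              "'catalog_partial_fetch_max_files'. "
              "Consider compacting files of the table.").format(limit)
    # Escape the literal parts so only the file count acts as a wildcard.
    return re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))


def assert_too_many_files(err_text, table, limit):
    """Hypothetical helper: assert the error is present and the count exceeds the limit.

    Returns the reported file count so callers can log it if needed.
    """
    match = too_many_files_pattern(table, limit).search(err_text)
    assert match is not None, err_text
    num_files = int(match.group(1))
    assert num_files > limit, num_files
    return num_files
```

With this, the same assertion passes whether the table has three files (regular HDFS builds) or two (erasure coding builds), as long as the count is above the limit of 1.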
> TestAllowIncompleteData.test_too_many_files fails erasure coding builds
> -----------------------------------------------------------------------
>
> Key: IMPALA-13996
> URL: https://issues.apache.org/jira/browse/IMPALA-13996
> Project: IMPALA
> Issue Type: Bug
> Reporter: Surya Hebbar
> Assignee: Quanlong Huang
> Priority: Major
>
> TestAllowIncompleteData.test_too_many_files fails in erasure coding builds.
> Error -
> {code}
> assert "Too many files to collect in table tpch_parquet.lineitem: 3. Current
> limit is 1 configured by startup flag 'catalog_partial_fetch_max_files'.
> Consider compacting files of the table." in "Query
> f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException: Could not
> load partitions for table tpch_parq...t limit is 1 configured by startup flag
> 'catalog_partial_fetch_max_files'. Consider compacting files of the
> table.\n\n" + where "Query f74919e60b835567:da9967a400000000
> failed:\nLocalCatalogException: Could not load partitions for table
> tpch_parq...t limit is 1 configured by startup flag
> 'catalog_partial_fetch_max_files'. Consider compacting files of the
> table.\n\n" = str(ImpalaBeeswaxException()){code}
>
> Stacktrace -
> {code}
> custom_cluster/test_local_catalog.py:721: in test_too_many_files
> assert err in str(exception)
> E assert "Too many files to collect in table tpch_parquet.lineitem: 3.
> Current limit is 1 configured by startup flag
> 'catalog_partial_fetch_max_files'. Consider compacting files of the table."
> in "Query f74919e60b835567:da9967a400000000 failed:\nLocalCatalogException:
> Could not load partitions for table tpch_parq...t limit is 1 configured by
> startup flag 'catalog_partial_fetch_max_files'. Consider compacting files of
> the table.\n\n"
> E + where "Query f74919e60b835567:da9967a400000000
> failed:\nLocalCatalogException: Could not load partitions for table
> tpch_parq...t limit is 1 configured by startup flag
> 'catalog_partial_fetch_max_files'. Consider compacting files of the
> table.\n\n" = str(ImpalaBeeswaxException())
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)