[ 
https://issues.apache.org/jira/browse/IMPALA-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020328#comment-18020328
 ] 

ASF subversion and git services commented on IMPALA-13267:
----------------------------------------------------------

Commit 821c7347d1811a8cdeb78db10a9ee9730d98abc8 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=821c7347d ]

IMPALA-13267: Display number of partitions for Iceberg tables

Before this change, query plans and profiles reported only a single
partition even for partitioned Iceberg tables, which was misleading
for users.
Now we can display the number of scanned partitions correctly for
both partitioned and unpartitioned Iceberg tables. This is achieved by
extracting the partition values from the file descriptors and storing
them in the IcebergContentFileStore. Instead of storing this information
redundantly in all file descriptors, we store them in one place and
reference the partition metadata in the FDs with an id.
This also gives the opportunity to optimize memory consumption in the
Catalog and Coordinator as well as reduce network traffic between them
in the future.
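
A minimal sketch of the "store once, reference by id" idea described above.
This is not Impala's actual IcebergContentFileStore; the class and method
names (PartitionInterningStore, addFile, countScannedPartitions) are
hypothetical, and a file path stands in for a full file descriptor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names do not come from the Impala code base.
final class PartitionInterningStore {
  // Partition values (e.g. ["2024", "01"]) mapped to a small integer id.
  private final Map<List<String>, Integer> idsByPartition = new HashMap<>();
  // id -> partition values; each distinct partition is stored exactly once.
  private final List<List<String>> partitionsById = new ArrayList<>();
  // File path -> partition id; stands in for a file descriptor holding only an id.
  private final Map<String, Integer> partitionIdByFile = new HashMap<>();

  // Registers a file and interns its partition values.
  void addFile(String path, List<String> partitionValues) {
    List<String> key = List.copyOf(partitionValues);
    Integer id = idsByPartition.get(key);
    if (id == null) {
      id = partitionsById.size();
      idsByPartition.put(key, id);
      partitionsById.add(key);
    }
    partitionIdByFile.put(path, id);
  }

  // Number of distinct partitions known to the store.
  int totalPartitions() {
    return partitionsById.size();
  }

  // Number of distinct partitions touched by the given files.
  // Files unknown to the store are ignored.
  int countScannedPartitions(Iterable<String> scannedFiles) {
    Set<Integer> seen = new HashSet<>();
    for (String path : scannedFiles) {
      Integer id = partitionIdByFile.get(path);
      if (id != null) {
        seen.add(id);
      }
    }
    return seen.size();
  }
}

final class PartitionInterningDemo {
  public static void main(String[] args) {
    PartitionInterningStore store = new PartitionInterningStore();
    store.addFile("a.parquet", List.of("2024", "01"));
    store.addFile("b.parquet", List.of("2024", "01"));
    store.addFile("c.parquet", List.of("2024", "02"));
    store.addFile("d.parquet", List.of("2025", "01"));
    int scanned = store.countScannedPartitions(
        List.of("a.parquet", "b.parquet", "c.parquet"));
    // Prints "2/3": three files scanned, but only two distinct partitions.
    System.out.println(scanned + "/" + store.totalPartitions());
  }
}
```

In this sketch an unpartitioned table would register every file with an
empty value list, so it is counted as a single partition.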

Time travel is handled similarly to oldFileDescMap. In that case
we don't know the total number of partitions in the old snapshot,
so the output is [Num scanned partitions]/unknown.
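
As a rough illustration of the time-travel case, the label could be built
from a scanned count and an optional total. The helper below is a
hypothetical sketch (PartitionCountLabel and format are assumed names, not
Impala APIs); only the "[Num scanned partitions]/unknown" shape comes from
the description above.

```java
import java.util.OptionalInt;

// Hypothetical helper; not part of the Impala code base.
final class PartitionCountLabel {
  // Returns "scanned/total", or "scanned/unknown" when the total is absent,
  // as in the time-travel case where the old snapshot's total is not known.
  static String format(int scanned, OptionalInt total) {
    String totalText =
        total.isPresent() ? Integer.toString(total.getAsInt()) : "unknown";
    return scanned + "/" + totalText;
  }
}
```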

Testing:
 - Planner tests
 - E2E tests
   - partition transforms
   - partition evolution
   - DROP PARTITION
   - time travel

Change-Id: Ifb2f654bc6c9bdf9cfafc27b38b5ca2f7b6b4872
Reviewed-on: http://gerrit.cloudera.org:8080/23113
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Number of partition is always 1 for Iceberg tables
> --------------------------------------------------
>
>                 Key: IMPALA-13267
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13267
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>              Labels: impala-iceberg
>
> Impala in general sees Iceberg tables as unpartitioned and lets the 
> partitioning happen within Iceberg. As a result, the query profiles for the 
> SCANs can be misleading because the number of partitions always shows 1/1.
> We should either fix this or write 'N/A' or such instead of 1/1.
> Fixing might not be that straightforward because when planning the query 
> Iceberg gives us a list of files to read but we don't know how they are 
> aligned in terms of partitions. So we might have to do the files vs 
> partitions matching ourselves.
> I wonder if we can enhance Iceberg ScanMetrics so that it holds not just the 
> number of files but also the number of partitions; then we could simply use 
> this metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
