[ 
https://issues.apache.org/jira/browse/IMPALA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noémi Pap-Takács updated IMPALA-14564:
--------------------------------------
    Description: 
Each Iceberg file descriptor stores its partition information (spec id and 
partition keys). Depending on the partitioning, the partition keys can consist 
of many string fields, one per partition value. Storing these keys redundantly 
in every file descriptor object adds a large overhead both to catalogd's memory 
and to the serialized data (TIcebergTable.TIcebergContentFileStore) that the 
Catalog sends to the Coordinator.

Removing the partition info from file descriptors could significantly reduce 
their size.

The partition keys could instead be stored once in a map (id -> partition info) 
that is sent along with the file descriptors. Each file descriptor would then 
carry only a partition id, which is used to look up the partition values.
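
A minimal sketch of the proposed map, to make the idea concrete. It is not 
Impala code: PartitionDedupSketch, PartitionInfo, FileDescriptor and FileStore 
are hypothetical names, and a real change would also have to cover the Thrift 
structures. Identical (spec id, partition keys) combinations are stored once and 
each file descriptor keeps only an integer id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Sketch only: every name in this file is hypothetical, not an Impala class. */
final class PartitionDedupSketch {

  /** Partition info shared by many files: spec id plus partition key values. */
  static final class PartitionInfo {
    final int specId;
    final List<String> keys;

    PartitionInfo(int specId, List<String> keys) {
      this.specId = specId;
      this.keys = new ArrayList<>(keys);  // defensive copy
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof PartitionInfo)) return false;
      PartitionInfo other = (PartitionInfo) o;
      return specId == other.specId && keys.equals(other.keys);
    }

    @Override public int hashCode() { return Objects.hash(specId, keys); }
  }

  /** File descriptor that carries a small partition id instead of the keys. */
  static final class FileDescriptor {
    final String path;
    final int partitionId;

    FileDescriptor(String path, int partitionId) {
      this.path = path;
      this.partitionId = partitionId;
    }
  }

  /** Holds the descriptors together with the id -> partition info map. */
  static final class FileStore {
    private final Map<PartitionInfo, Integer> idsByPartition = new HashMap<>();
    private final Map<Integer, PartitionInfo> partitionsById = new HashMap<>();
    private final List<FileDescriptor> files = new ArrayList<>();

    /** Registers a file, reusing the id of an identical partition if present. */
    FileDescriptor addFile(String path, int specId, List<String> keys) {
      PartitionInfo info = new PartitionInfo(specId, keys);
      Integer id = idsByPartition.get(info);
      if (id == null) {
        id = idsByPartition.size();  // next unused id
        idsByPartition.put(info, id);
        partitionsById.put(id, info);
      }
      FileDescriptor fd = new FileDescriptor(path, id);
      files.add(fd);
      return fd;
    }

    /** Looks up the shared partition info for a descriptor. */
    PartitionInfo partitionOf(FileDescriptor fd) {
      return partitionsById.get(fd.partitionId);
    }

    int distinctPartitions() { return partitionsById.size(); }

    int fileCount() { return files.size(); }
  }

  public static void main(String[] args) {
    FileStore store = new FileStore();
    store.addFile("a.parquet", 0, Arrays.asList("2024", "EU"));
    store.addFile("b.parquet", 0, Arrays.asList("2024", "EU"));
    store.addFile("c.parquet", 0, Arrays.asList("2024", "US"));
    System.out.println(store.fileCount() + " files, "
        + store.distinctPartitions() + " partition entries");
  }
}
```

With this layout the partition key strings are held and serialized once per 
distinct partition rather than once per file, so the saving grows with the 
number of files per partition.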

> Remove redundant partition information from Iceberg file descriptors
> --------------------------------------------------------------------
>
>                 Key: IMPALA-14564
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14564
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>            Reporter: Noémi Pap-Takács
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>              Labels: impala-iceberg



