[
https://issues.apache.org/jira/browse/IMPALA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-8630.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.3.0
> Consistent remote placement should include partition information when
> calculating placement
> -------------------------------------------------------------------------------------------
>
> Key: IMPALA-8630
> URL: https://issues.apache.org/jira/browse/IMPALA-8630
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
> Fix For: Impala 3.3.0
>
>
> For partitioned tables, the filenames within partitions may have very low
> entropy. Files written by Impala include information in their names that
> differs across partitions, but low-entropy filenames are common for tables
> written by the current CDH version of Hive. For example, in our minicluster,
> the TPC-DS store_sales table has many partitions, but the filenames within
> partitions are very simple:
> {noformat}
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 379535 2019-06-05 15:16 /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642/000000_0
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 412959 2019-06-05 15:16 /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640/000000_0
> {noformat}
> Right now, consistent remote placement uses the filename+offset without the
> partition id.
> {code:java}
> uint32_t hash = HashUtil::Hash(hdfs_file_split->relative_path.data(),
> hdfs_file_split->relative_path.length(), 0);
> {code}
> This produces a poor balance of files across nodes when filenames have low
> entropy, because files with the same name and offset in different partitions
> hash to the same value. The hash should be amended to include the partition
> id, which is already accessible on the THdfsFileSplit.
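The effect described above can be illustrated with a small, self-contained sketch. It is not the actual Impala change: `SeededHash` is a hypothetical FNV-1a stand-in for `HashUtil::Hash`, `FileSplit` is a simplified stand-in for `THdfsFileSplit`, and the partition ids used in `CheckExample` are made up. It assumes `relative_path` is relative to the partition directory, as the listing above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a seeded hash (32-bit FNV-1a with the seed
// folded into the offset basis). This is NOT Impala's HashUtil::Hash.
uint32_t SeededHash(const void* data, size_t len, uint32_t seed) {
  const unsigned char* bytes = static_cast<const unsigned char*>(data);
  uint32_t h = 2166136261u ^ seed;
  for (size_t i = 0; i < len; ++i) {
    h ^= bytes[i];
    h *= 16777619u;
  }
  return h;
}

// Simplified, illustrative model of a file split (assumed field names).
struct FileSplit {
  std::string relative_path;  // e.g. "000000_0", relative to the partition
  int64_t partition_id;
  int64_t offset;
};

// Hashes the path and offset only, mirroring the behaviour described in
// the issue: the partition id does not influence the result.
uint32_t HashWithoutPartition(const FileSplit& s) {
  uint32_t h = SeededHash(s.relative_path.data(), s.relative_path.length(), 0);
  return SeededHash(&s.offset, sizeof(s.offset), h);
}

// One possible amendment: chain the partition id into the hash as well,
// using the previous hash value as the seed for the next step.
uint32_t HashWithPartition(const FileSplit& s) {
  uint32_t h = SeededHash(s.relative_path.data(), s.relative_path.length(), 0);
  h = SeededHash(&s.partition_id, sizeof(s.partition_id), h);
  return SeededHash(&s.offset, sizeof(s.offset), h);
}

// Two files with the same name and offset in different partitions.
void CheckExample() {
  FileSplit a{"000000_0", /*partition_id=*/1, /*offset=*/0};
  FileSplit b{"000000_0", /*partition_id=*/2, /*offset=*/0};
  // Without the partition id, both splits map to the same hash value.
  assert(HashWithoutPartition(a) == HashWithoutPartition(b));
  // With the partition id, the two splits are distinguished.
  assert(HashWithPartition(a) != HashWithPartition(b));
}
```

Chaining the hashes through the seed is only one way to mix in the partition id; the fix that was actually committed may combine the fields differently.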
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)