[ https://issues.apache.org/jira/browse/IMPALA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792731#comment-16792731 ]
Gabor Kaszab commented on IMPALA-7047:
--------------------------------------
Hey [~tlipcon],
Is there anything left to do on this task? Can it be resolved and the fix
version set to 3.2?
c59f761 IMPALA-7047. Refreshing partitions should not make an RPC per file
> REFRESH on unpartitioned tables calls getBlockLocations on every file
> ---------------------------------------------------------------------
>
> Key: IMPALA-7047
> URL: https://issues.apache.org/jira/browse/IMPALA-7047
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.13.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Labels: metadata
>
> In HdfsTable.updateUnpartitionedTableFileMd(), the existing default Partition
> object is reset and a new, empty one is created. The method then calls
> refreshPartitionFileMetadata with this new partition, whose list of file
> descriptors is empty. That call lists the directory and, because no file is
> found in the empty descriptor list, makes a separate RPC to HDFS for each
> file to get its locations.
> This is quite wasteful compared with just using the API that returns the
> located statuses for the directory.
> Alternatively, the new Partition object should probably keep the old file
> descriptor list so that the incremental refresh path can work.
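
The cost difference described above can be illustrated with a toy model. This is only a sketch, not Impala or HDFS code: RefreshRpcSketch, FakeNameNode, FileInfo and the three refresh methods are hypothetical names, and the fake NameNode counts a bulk located listing as a single round trip, which a real listing of a large directory may not be.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the refresh paths. All names are hypothetical. */
final class RefreshRpcSketch {
  /** Stand-in for a file status: name, length, modification time. */
  static final class FileInfo {
    final String name;
    final long length;
    final long mtime;
    FileInfo(String name, long length, long mtime) {
      this.name = name;
      this.length = length;
      this.mtime = mtime;
    }
    boolean sameAs(FileInfo o) {
      return name.equals(o.name) && length == o.length && mtime == o.mtime;
    }
  }

  /** Fake NameNode that counts round trips instead of doing real I/O. */
  static final class FakeNameNode {
    final List<FileInfo> files = new ArrayList<>();
    int rpcCount = 0;

    /** Plain listing: one round trip, no block locations. */
    List<FileInfo> listStatus() {
      rpcCount++;
      return new ArrayList<>(files);
    }
    /** Location lookup for a single file: one round trip per call. */
    String getBlockLocations(FileInfo f) {
      rpcCount++;
      return "blocks:" + f.name;
    }
    /** Bulk listing with locations, modeled here as one round trip. */
    Map<String, String> listLocatedStatus() {
      rpcCount++;
      Map<String, String> out = new HashMap<>();
      for (FileInfo f : files) out.put(f.name, "blocks:" + f.name);
      return out;
    }
  }

  /** Incremental path: look up locations only for new or changed files. */
  static int refreshIncremental(FakeNameNode nn, Map<String, FileInfo> known) {
    int before = nn.rpcCount;
    for (FileInfo f : nn.listStatus()) {
      FileInfo prev = known.get(f.name);
      if (prev == null || !prev.sameAs(f)) nn.getBlockLocations(f);
    }
    return nn.rpcCount - before;
  }

  /** Reported behavior: the descriptor list is empty, so every file misses. */
  static int refreshFromEmpty(FakeNameNode nn) {
    return refreshIncremental(nn, new HashMap<>());
  }

  /** First suggested fix: fetch statuses and locations together. */
  static int refreshBulk(FakeNameNode nn) {
    int before = nn.rpcCount;
    nn.listLocatedStatus();
    return nn.rpcCount - before;
  }

  public static void main(String[] args) {
    FakeNameNode nn = new FakeNameNode();
    Map<String, FileInfo> known = new HashMap<>();
    for (int i = 0; i < 100; i++) {
      FileInfo f = new FileInfo("f" + i, 10, 1);
      nn.files.add(f);
      known.put(f.name, f);
    }
    System.out.println(refreshFromEmpty(nn));          // prints 101
    System.out.println(refreshBulk(nn));               // prints 1
    nn.files.set(0, new FileInfo("f0", 20, 2));        // one file changes
    System.out.println(refreshIncremental(nn, known)); // prints 2
  }
}
```

In this model, a refresh that starts from an empty descriptor list costs one listing plus one lookup per file, while either alternative from the description (a bulk located listing, or reusing the old descriptors) keeps the count close to constant when few files have changed.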