Todd Lipcon commented on IMPALA-7047:

On a relatively small table with 374 files, REFRESH spends about a second in 
this code path: each RPC takes 2-3 ms of round-trip time, and 374 of them add 
up to roughly 0.75-1.1 s.

> REFRESH on unpartitioned tables calls getBlockLocations on every file
> ---------------------------------------------------------------------
>                 Key: IMPALA-7047
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7047
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.13.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: metadata
> HdfsTable.updateUnpartitionedTableFileMd() resets the existing default 
> Partition object and creates a new, empty one. It then calls 
> refreshPartitionFileMetadata() with this new partition, which has an empty 
> list of file descriptors. That call lists the directory and, because no file 
> is found in the empty descriptor list, makes a separate RPC to HDFS for each 
> file to get its block locations.
> This is quite wasteful compared with using the API that returns the located 
> statuses for the whole directory.
> Alternatively, the new Partition object could probably keep the old file 
> descriptor list so that the incremental refresh path can work.
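
The difference between the approaches in the description can be sketched with
a small simulation that only counts round trips. This is not Impala or Hadoop
code: FakeNameNode, FileDesc, refreshPerFile and refreshBatched are
hypothetical names made up for illustration, and the sketch assumes a located
listing costs a single RPC, whereas a real listing of a large directory may
be paginated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RefreshRpcSketch {
    /** Hypothetical file descriptor: name, modification time, block hosts. */
    static final class FileDesc {
        final String name;
        final long mtime;
        final List<String> hosts;
        FileDesc(String name, long mtime, List<String> hosts) {
            this.name = name; this.mtime = mtime; this.hosts = hosts;
        }
    }

    /** Hypothetical in-memory stand-in for HDFS that counts simulated RPCs. */
    static final class FakeNameNode {
        private final Map<String, FileDesc> files = new HashMap<>();
        int rpcCount = 0;

        void addFile(String name, long mtime) {
            files.put(name, new FileDesc(name, mtime, Arrays.asList("host-a", "host-b")));
        }

        /** One RPC: names and mtimes only, without block locations. */
        List<FileDesc> listStatus() {
            rpcCount++;
            List<FileDesc> out = new ArrayList<>();
            for (FileDesc f : files.values()) {
                out.add(new FileDesc(f.name, f.mtime, Collections.<String>emptyList()));
            }
            return out;
        }

        /** One RPC per call: block locations of a single file. */
        List<String> getBlockLocations(String name) {
            rpcCount++;
            return files.get(name).hosts;
        }

        /** One RPC (assumed): every file together with its block locations. */
        List<FileDesc> listLocatedStatus() {
            rpcCount++;
            return new ArrayList<>(files.values());
        }
    }

    /** List the directory, then fetch locations for each file missing from `known` or changed. */
    static Map<String, FileDesc> refreshPerFile(FakeNameNode nn, Map<String, FileDesc> known) {
        Map<String, FileDesc> result = new HashMap<>();
        for (FileDesc st : nn.listStatus()) {
            FileDesc old = known.get(st.name);
            if (old != null && old.mtime == st.mtime) {
                result.put(st.name, old); // unchanged file: reuse the descriptor, no RPC
            } else {
                result.put(st.name, new FileDesc(st.name, st.mtime, nn.getBlockLocations(st.name)));
            }
        }
        return result;
    }

    /** Fetch statuses and locations for the whole directory in one call. */
    static Map<String, FileDesc> refreshBatched(FakeNameNode nn) {
        Map<String, FileDesc> result = new HashMap<>();
        for (FileDesc st : nn.listLocatedStatus()) {
            result.put(st.name, st);
        }
        return result;
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        for (int i = 0; i < 374; i++) nn.addFile("file-" + i, 1L);

        // Reported behavior: the descriptor list starts out empty.
        Map<String, FileDesc> descs = refreshPerFile(nn, new HashMap<String, FileDesc>());
        System.out.println("empty descriptor list: " + nn.rpcCount + " RPCs");

        nn.rpcCount = 0;
        refreshBatched(nn);
        System.out.println("located listing:       " + nn.rpcCount + " RPCs");

        nn.rpcCount = 0;
        nn.addFile("file-0", 2L); // one file changes between refreshes
        refreshPerFile(nn, descs);
        System.out.println("old descriptors kept:  " + nn.rpcCount + " RPCs");
    }
}
```

With 374 files the simulation counts 375 RPCs when the descriptor list starts
empty, 1 for the located listing, and 2 when the old descriptors are kept and
a single file has changed.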
