Todd Lipcon created IMPALA-7047:

             Summary: REFRESH on unpartitioned tables calls getBlockLocations on every file
                 Key: IMPALA-7047
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.13.0
            Reporter: Todd Lipcon

HdfsTable.updateUnpartitionedTableFileMd() resets the existing default 
Partition object and creates a new, empty one. It then calls 
refreshPartitionFileMetadata with this new partition, whose list of file 
descriptors is empty. The refresh lists the directory and, because no file can 
be found in the empty descriptor list, makes a separate RPC to HDFS for each 
file to get its block locations.

This is quite wasteful compared with using the API that returns the located 
statuses for the directory.
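
To illustrate the difference in RPC volume, here is a minimal, self-contained 
sketch. It does not use Impala or Hadoop code: FakeFs, listNames, 
getBlockLocations and listLocated are hypothetical stand-ins that only count 
round trips. The report does not name the API it has in mind; Hadoop's 
FileSystem.listLocatedStatus is one candidate, but that is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListingRpcSketch {
    /** Hypothetical stand-in for an HDFS client that counts round trips. */
    static class FakeFs {
        private final Map<String, String> locationsByFile = new LinkedHashMap<>();
        int rpcCount = 0;

        void addFile(String name, String locations) {
            locationsByFile.put(name, locations);
        }

        /** One RPC: file names only, without block locations. */
        List<String> listNames() {
            rpcCount++;
            return new ArrayList<>(locationsByFile.keySet());
        }

        /** One RPC per call: block locations of a single file. */
        String getBlockLocations(String name) {
            rpcCount++;
            return locationsByFile.get(name);
        }

        /** One RPC: file names together with their block locations. */
        Map<String, String> listLocated() {
            rpcCount++;
            return new LinkedHashMap<>(locationsByFile);
        }
    }

    /** Pattern described in the report: list, then look up every file separately. */
    static Map<String, String> loadPerFile(FakeFs fs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : fs.listNames()) {
            result.put(name, fs.getBlockLocations(name));
        }
        return result;
    }

    /** Suggested alternative: a single located listing. */
    static Map<String, String> loadLocated(FakeFs fs) {
        return fs.listLocated();
    }

    /** Builds a fake directory with the given number of files. */
    static FakeFs newFs(int numFiles) {
        FakeFs fs = new FakeFs();
        for (int i = 0; i < numFiles; i++) {
            fs.addFile("part-" + i, "dn" + (i % 3));
        }
        return fs;
    }

    public static void main(String[] args) {
        FakeFs perFile = newFs(1000);
        loadPerFile(perFile);
        FakeFs located = newFs(1000);
        loadLocated(located);
        System.out.println("per-file RPCs: " + perFile.rpcCount);
        System.out.println("located-listing RPCs: " + located.rpcCount);
    }
}
```

In this model a directory of N files costs N + 1 round trips on the per-file 
path and one on the located-listing path. A real located listing may need more 
than one round trip for a large directory, so the numbers are illustrative only.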

Alternatively, the new Partition object should probably keep the old file 
descriptor list so that the incremental refresh path can work.
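
A rough sketch of why carrying the old descriptors over matters, again using 
only hypothetical names (FileDesc, FakeFs, refresh) rather than Impala's actual 
classes. It assumes the incremental path reuses a descriptor when the file's 
modification time is unchanged; the real check may compare more attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IncrementalRefreshSketch {
    /** Hypothetical file descriptor: modification time plus cached locations. */
    static class FileDesc {
        final long mtime;
        final String locations;

        FileDesc(long mtime, String locations) {
            this.mtime = mtime;
            this.locations = locations;
        }
    }

    /** Hypothetical HDFS stand-in that counts block-location lookups. */
    static class FakeFs {
        final Map<String, Long> mtimeByFile = new LinkedHashMap<>();
        int locationRpcs = 0;

        String getBlockLocations(String name) {
            locationRpcs++;
            return "locations-of-" + name + "@" + mtimeByFile.get(name);
        }
    }

    /** Reuses an old descriptor when the file is unchanged, otherwise asks the fake HDFS. */
    static Map<String, FileDesc> refresh(FakeFs fs, Map<String, FileDesc> oldDescs) {
        Map<String, FileDesc> fresh = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : fs.mtimeByFile.entrySet()) {
            FileDesc old = oldDescs.get(e.getKey());
            if (old != null && old.mtime == e.getValue()) {
                fresh.put(e.getKey(), old);  // unchanged file: no RPC needed
            } else {
                String locations = fs.getBlockLocations(e.getKey());
                fresh.put(e.getKey(), new FileDesc(e.getValue(), locations));
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        fs.mtimeByFile.put("a", 1L);
        fs.mtimeByFile.put("b", 1L);
        fs.mtimeByFile.put("c", 1L);
        Map<String, FileDesc> loaded = refresh(fs, new LinkedHashMap<>());

        fs.mtimeByFile.put("b", 2L);  // one file rewritten
        fs.mtimeByFile.put("d", 1L);  // one file added

        fs.locationRpcs = 0;
        refresh(fs, loaded);  // old descriptors carried over
        System.out.println("with old descriptors: " + fs.locationRpcs);

        fs.locationRpcs = 0;
        refresh(fs, new LinkedHashMap<>());  // empty list, as in the report
        System.out.println("with empty descriptors: " + fs.locationRpcs);
    }
}
```

With the old descriptors available, only the changed and the new file trigger a 
lookup. With an empty list, every file does, which is the behavior the report 
describes.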
