[ https://issues.apache.org/jira/browse/IMPALA-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976100#comment-16976100 ]

Antoni Ivanov commented on IMPALA-7533:
---------------------------------------

Hi, 

To clarify my question, say we have the following cases:
 * case 1: 
_alter table add partition(part_key=new_value)_

 * case 2: 
the partition exists and we add new files to it 
 ** case 2.1 - files added through Spark or Hive, followed by _refresh_
 ** case 2.2 - files added with an _insert_ statement in Impala

 * case 3: 
_alter table partition set location '/user/hive/.../newlocation'_ 

 * case 4:

In which of these cases will the file and partition metadata be fully 
invalidated, forcing a full reload from the catalog?
I suspect cases 1 and 3; is that right?
 

> Optimize fetch-from-catalog by caching partitions across table versions
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-7533
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7533
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: catalog-v2
>
> Currently, the cached partition-level information in CatalogdMetaProvider is 
> tied to a particular version number of its containing table. This means that 
> if the table is modified in any way (e.g. even if only a comment changes), all 
> of the partitions are effectively invalidated and need to be reloaded from 
> catalogd. We could avoid this invalidation and refetch in a couple of ways:
> 1) make partitions immutable given an ID. Instead of modifying partitions in 
> place, we could drop the partition and add a new one with a new ID. This is 
> already done in several code paths, but not all. If we did this, then we'd 
> just need to invalidate the partition _list_ for a table, and when we fetched 
> the new list, we'd see which partitions changed and need to be reloaded.
> 2) add a partition-level version/sequence number which is modified whenever 
> the partition is mutated in place. If we fetched that as part of the 
> partition list, and used it as part of the cache key, we could avoid 
> invalidating partitions when nothing changed. This would cost 4 or 8 bytes 
> per partition (perhaps manageable, considering the hundreds of bytes saved by 
> recent patches).
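A minimal sketch of approach 2, for illustration only: it assumes a hypothetical per-partition version number that is returned along with the partition list, and the names `PartitionCacheSketch`, `PartitionRef`, `CacheKey` and `loadPartitions` are made up, not taken from CatalogdMetaProvider. The point is the shape of the cache key: it omits the table version, so a table-level change leaves cached partitions valid and only partitions with a new (ID, version) pair are fetched. Under approach 1, a modified partition would come back with a new ID, which misses the cache in the same way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PartitionCacheSketch {
  // Hypothetical partition-list entry: partition ID plus a per-partition
  // version that is bumped on every in-place mutation (approach 2).
  record PartitionRef(long id, long version) {}

  // The key deliberately excludes the table version, so a table-level change
  // (e.g. a comment edit) does not invalidate cached partitions.
  record CacheKey(String table, long partId, long partVersion) {}

  private final Map<CacheKey, String> cache = new HashMap<>();
  int fetches = 0; // number of partitions actually loaded from "catalogd"

  /** Returns metadata for each listed partition, fetching only uncached (id, version) pairs. */
  List<String> loadPartitions(String table, List<PartitionRef> partList,
                              Function<PartitionRef, String> fetchFromCatalogd) {
    List<String> result = new ArrayList<>();
    for (PartitionRef ref : partList) {
      CacheKey key = new CacheKey(table, ref.id(), ref.version());
      String meta = cache.get(key);
      if (meta == null) {
        // Cache miss: new partition ID or bumped version. This sketch never
        // evicts superseded versions; a real cache would need eviction.
        meta = fetchFromCatalogd.apply(ref);
        fetches++;
        cache.put(key, meta);
      }
      result.add(meta);
    }
    return result;
  }

  public static void main(String[] args) {
    PartitionCacheSketch cache = new PartitionCacheSketch();
    Function<PartitionRef, String> fetch = r -> "part-" + r.id() + "@v" + r.version();
    // Initial load: both partitions are fetched.
    cache.loadPartitions("db.t", List.of(new PartitionRef(1, 0), new PartitionRef(2, 0)), fetch);
    // Table-level change only: partition versions unchanged, so nothing is refetched.
    cache.loadPartitions("db.t", List.of(new PartitionRef(1, 0), new PartitionRef(2, 0)), fetch);
    // Partition 2 mutated in place: only its new (id, version) pair is fetched.
    cache.loadPartitions("db.t", List.of(new PartitionRef(1, 0), new PartitionRef(2, 1)), fetch);
    System.out.println(cache.fetches); // prints 3
  }
}
```

Running `main` prints 3: two fetches for the initial load, none after the table-level change, and one after partition 2's version is bumped.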



--
This message was sent by Atlassian Jira
(v8.3.4#803005)