[jira] [Comment Edited] (HIVE-11500) implement file footer / splits cache in HBase metastore

Sergey Shelukhin (JIRA) Fri, 14 Aug 2015 13:10:18 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697664#comment-14697664
 ]


Sergey Shelukhin edited comment on HIVE-11500 at 8/14/15 8:09 PM:
------------------------------------------------------------------

Actually, the main reason all these calls exist for partitions is that they 
take arguments instead of using a request-response pattern, which makes it 
impossible to change the signature in a backward-compatible manner. I will 
happily refactor the newly added calls to be generic (req/resp should allow 
for that), or deprecate them in favor of generic calls and remove them later, 
if the need arises. 
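
To illustrate why req/resp makes later changes backward-compatible, here is a minimal sketch in Python. It is not the metastore API: FooterLookupRequest, FooterLookupResponse, lookup_footers and the include_split_info field are hypothetical names made up for this example. The point is only that a new optional field with a default can be added to the request without breaking callers written against the older version, which a fixed argument list does not allow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FooterLookupRequest:
    """Hypothetical request object; names are illustrative only."""
    file_ids: List[int]
    # Imagine this field was added in a later version. Because it has a
    # default, callers written against the older version keep working.
    include_split_info: bool = False


@dataclass
class FooterLookupResponse:
    """Hypothetical response object; new fields can be added here too."""
    footers: Dict[int, bytes] = field(default_factory=dict)
    split_info: Optional[Dict[int, str]] = None


def lookup_footers(req: FooterLookupRequest,
                   store: Dict[int, bytes]) -> FooterLookupResponse:
    """One generic call whose signature never changes: only the request
    and response objects grow over time."""
    found = {fid: store[fid] for fid in req.file_ids if fid in store}
    resp = FooterLookupResponse(footers=found)
    if req.include_split_info:
        # Placeholder value; real split information is out of scope here.
        resp.split_info = {fid: "unknown" for fid in found}
    return resp
```

An old-style caller builds FooterLookupRequest(file_ids=[...]) and never sees the new field; a newer caller opts in by setting include_split_info=True.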


was (Author: sershe):
Actually, the main reason all these calls exist for partitions is that they 
take arguments instead of using a request-response pattern, which makes it 
impossible to change the signature in a backward-compatible manner. I will 
happily refactor these calls to be generic, or deprecate them in favor of 
generic calls and remove them later, if the need arises. 

> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
>                 Key: HIVE-11500
>                 URL: https://issues.apache.org/jira/browse/HIVE-11500
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Metastore
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation 
> (which, on FSes that support fileId, will be valid permanently and only needs 
> to be removed lazily when the ORC file is erased or compacted), and 
> potentially even some information about splits (e.g. grouping based on 
> location that would be good for some short time), in the HBase metastore.
> -It should be queryable by table. Partition predicate pushdown should be 
> supported. If bucket pruning is added, that too.- Given that we cannot cache 
> file lists (we have to check the FS for new/changed files anyway), and the 
> difficulty of passing data about partitions etc. to split generation 
> compared to paths, we will probably just filter by paths and fileIds. It 
> might be different for splits.
> In later phases, it would be nice to also save, as part of the first 
> category above, the results of expensive work done by jobs, e.g. data size 
> after decompression/decoding per column, etc., to avoid surprises when ORC 
> encoding is very good or very bad. Perhaps it can even be lazily generated. 
> Here's a pony: 🐴
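
The lookup-by-fileId and lazy-removal idea in the description can be sketched roughly as follows. This is an in-memory illustration in Python, not the HBase-backed design from the attached document: FooterCache and its methods are hypothetical names, and evict_missing assumes the file listing passed in covers every file the cache tracks (for example, a cache scoped to one table).

```python
from typing import Dict, Iterable, List, Tuple


class FooterCache:
    """Hypothetical in-memory stand-in for a footer cache keyed by fileId."""

    def __init__(self) -> None:
        self._by_file_id: Dict[int, bytes] = {}

    def put(self, file_id: int, footer: bytes) -> None:
        self._by_file_id[file_id] = footer

    def lookup(self, file_ids: Iterable[int]) -> Tuple[Dict[int, bytes], List[int]]:
        """Return cached footers plus the fileIds that still need reading.

        The caller is assumed to have listed the FS first, since file
        lists themselves are not cached.
        """
        hits: Dict[int, bytes] = {}
        misses: List[int] = []
        for fid in file_ids:
            if fid in self._by_file_id:
                hits[fid] = self._by_file_id[fid]
            else:
                misses.append(fid)
        return hits, misses

    def evict_missing(self, live_file_ids: Iterable[int]) -> int:
        """Lazily drop entries whose files no longer exist.

        Assumes live_file_ids lists every file this cache tracks.
        Returns the number of entries removed.
        """
        live = set(live_file_ids)
        stale = [fid for fid in self._by_file_id if fid not in live]
        for fid in stale:
            del self._by_file_id[fid]
        return len(stale)
```

Split generation would list the files, call lookup with their fileIds, read footers only for the misses, and occasionally call evict_missing to clean up entries for erased or compacted files.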



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
