[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-11500:
------------------------------------
Attachment: (was: HBase metastore split cache.pdf)
> implement file footer / splits cache in HBase metastore
> -------------------------------------------------------
>
> Key: HIVE-11500
> URL: https://issues.apache.org/jira/browse/HIVE-11500
> Project: Hive
> Issue Type: Sub-task
> Components: Metastore
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HBase metastore split cache.pdf
>
>
> We need to cache file metadata (e.g. ORC file footers) for split generation
> in the HBase metastore. On FSes that support fileId, this metadata stays valid
> permanently and only needs to be removed lazily when the ORC file is deleted
> or compacted. We could potentially also cache some information about splits
> (e.g. grouping based on location), which would be valid for a short time.
> -It should be queryable by table. Partition predicate pushdown should be
> supported. If bucket pruning is added, that too.- Given that we cannot cache
> file lists (we have to check the FS for new/changed files anyway), and that
> passing partition data to split generation is harder than passing paths, we
> will probably just filter by paths and fileIds. This might be different for
> splits.
> In later phases, it would be nice to also save the results of expensive work
> done by jobs as file metadata (the first category above), e.g. per-column
> data size after decompression/decoding, to avoid surprises when ORC encoding
> is very good or very bad. Perhaps this can even be generated lazily. Here's a
> pony: 🐴
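A minimal sketch of the lookup described above follows. It assumes footers are keyed by fileId and looked up with a fresh (path, fileId) listing taken from the FS. It also assumes stale entries are evicted lazily, at the moment a listed path turns out to hold a different fileId (e.g. after compaction). `FooterCache`, `put` and `lookup` are hypothetical names, not Hive's metastore API, and a Python dict stands in for the HBase table.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class FooterCache:
    """Hypothetical footer cache keyed by fileId (a dict stands in for HBase)."""

    # fileId -> serialized footer bytes.
    footers: Dict[int, bytes] = field(default_factory=dict)
    # path -> fileId last cached for that path; used only for lazy eviction.
    by_path: Dict[str, int] = field(default_factory=dict)

    def put(self, path: str, file_id: int, footer: bytes) -> None:
        old_id = self.by_path.get(path)
        if old_id is not None and old_id != file_id:
            # The path now holds a different file, so drop the old footer.
            self.footers.pop(old_id, None)
        self.footers[file_id] = footer
        self.by_path[path] = file_id

    def lookup(
        self, listing: Iterable[Tuple[str, int]]
    ) -> Tuple[Dict[int, bytes], List[Tuple[str, int]]]:
        """Split a fresh FS listing into cached footers and misses to read."""
        hits: Dict[int, bytes] = {}
        misses: List[Tuple[str, int]] = []
        for path, file_id in listing:
            old_id = self.by_path.get(path)
            if old_id is not None and old_id != file_id:
                # File at this path was replaced (e.g. compacted): evict lazily.
                # Evicting too eagerly is safe; it only causes a later miss.
                self.footers.pop(old_id, None)
                del self.by_path[path]
            footer = self.footers.get(file_id)
            if footer is not None:
                hits[file_id] = footer
            else:
                misses.append((path, file_id))
        return hits, misses
```

Because a fileId identifies one immutable file, a hit never needs revalidation. Entries for files that were deleted outright linger until some separate sweep removes them, which matches the "removed lazily" wording above.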
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)