[
https://issues.apache.org/jira/browse/DRILL-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha updated DRILL-6852:
------------------------------
Labels: ready-to-commit (was: )
> Adapt current Parquet Metadata cache implementation to use Drill Metastore API
> ------------------------------------------------------------------------------
>
> Key: DRILL-6852
> URL: https://issues.apache.org/jira/browse/DRILL-6852
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Volodymyr Vysotskyi
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> According to the design document for DRILL-6552, the existing metadata cache API
> should be adapted to use the generalized Metastore API, and the Parquet metadata
> cache will become an implementation of that API.
> The aim of this Jira is to refactor the Parquet Metadata cache implementation and
> adapt it to use the Drill Metastore API.
> Execution plan:
> - Refactor AbstractParquetGroupScan and its implementations to use metastore
> metadata classes. Store Drill data types in metadata files for Parquet tables.
> - Store the least restrictive type instead of the column data type taken from
> the first file, as is done currently.
> - Rework the logic in AbstractParquetGroupScan to allow filtering at different
> metadata layers: partition, file, row group, etc. Do the same for limit
> pushdown.
> - Implement logic to convert existing Parquet metadata to metastore metadata
> to preserve backward compatibility.
> - Implement fetching metadata only when it is needed (for filtering, limit,
> count(*), etc.).
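
To illustrate the "filtering at different metadata layers" item in the plan
above, here is a minimal Java sketch. It is not Drill code: the class
LayeredPruningSketch, the Node type, the prune method and the sample min/max
values are all hypothetical, and it assumes each layer (table, partition,
file, row group) carries min/max statistics for a single numeric column. The
idea shown is only that a layer whose range cannot contain the filter value
lets the whole subtree beneath it be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of layered metadata pruning; names are not from Drill. */
public class LayeredPruningSketch {

  /** One metadata layer entry: min/max of a single column plus child entries. */
  static final class Node {
    final String name;
    final long min;
    final long max;
    final List<Node> children;

    Node(String name, long min, long max, Node... children) {
      this.name = name;
      this.min = min;
      this.max = max;
      this.children = Arrays.asList(children);
    }
  }

  /** Returns names of leaf entries (row groups) whose range may contain value. */
  static List<String> prune(Node root, long value) {
    List<String> survivors = new ArrayList<>();
    collect(root, value, survivors);
    return survivors;
  }

  private static void collect(Node node, long value, List<String> out) {
    if (value < node.min || value > node.max) {
      return; // range excludes the value: skip this entry and everything below it
    }
    if (node.children.isEmpty()) {
      out.add(node.name); // leaf (row group) that still has to be read
      return;
    }
    for (Node child : node.children) {
      collect(child, value, out);
    }
  }

  /** Made-up table: 2 partitions, 3 files, 5 row groups. */
  static Node sampleTable() {
    Node f1 = new Node("p1/f1", 0, 20,
        new Node("p1/f1/rg1", 0, 10), new Node("p1/f1/rg2", 11, 20));
    Node f2 = new Node("p1/f2", 21, 50,
        new Node("p1/f2/rg3", 21, 35), new Node("p1/f2/rg4", 36, 50));
    Node f3 = new Node("p2/f3", 51, 100, new Node("p2/f3/rg5", 51, 100));
    Node p1 = new Node("p1", 0, 50, f1, f2);
    Node p2 = new Node("p2", 51, 100, f3);
    return new Node("table", 0, 100, p1, p2);
  }

  public static void main(String[] args) {
    // Filter "col = 42": partition p2 and file p1/f1 are skipped without
    // looking at their row groups.
    System.out.println(prune(sampleTable(), 42)); // prints [p1/f2/rg4]
  }
}
```

In this sketch a limit could be handled in a similar way, by stopping the
traversal once enough surviving row groups have been collected. How the
actual implementation handles limit pushdown is not described in this issue.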