[jira] [Updated] (DRILL-6852) Adapt current Parquet Metadata cache implementation to use Drill Metastore API

Volodymyr Vysotskyi (JIRA) Mon, 28 Jan 2019 10:34:26 -0800


     [ https://issues.apache.org/jira/browse/DRILL-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Volodymyr Vysotskyi updated DRILL-6852:
---------------------------------------
    Description: 
According to the design document for DRILL-6552, the existing metadata cache 
API should be adapted to use the generalized metastore API, and the Parquet 
metadata cache will be presented as an implementation of the metastore API.

The aim of this Jira is to refactor the Parquet Metadata cache implementation 
and adapt it to use the Drill Metastore API.

Execution plan:
 - Refactor AbstractParquetGroupScan and its implementations to use metastore 
metadata classes. Store Drill data types in metadata files for Parquet tables.
 - Store the least restrictive type for each column instead of the column 
data type taken from the first file, as is done currently.
 - Rework logic in AbstractParquetGroupScan to allow filtering at different 
metadata layers: partition, file, row group, etc. The same applies to limit 
pushdown.
 - Implement logic to convert existing parquet metadata to metastore metadata 
to preserve backward compatibility.
 - Implement fetching metadata only when it is needed (for filtering, limit, 
count(*) etc.)
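
The filtering at different metadata layers described in the plan above can be 
sketched roughly as follows. This is only an illustration under stated 
assumptions, not the planned implementation: the classes Range, RowGroup, 
FileMeta and Partition and the method prune are hypothetical names invented 
for this sketch and are not part of Drill or of the Metastore API. It assumes 
each layer keeps min/max statistics for a single numeric column and that the 
filter is a simple equality on that column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Drill classes.
final class MetadataPruningSketch {

  /** Min/max statistics of one column at some metadata layer. */
  static final class Range {
    final long min;
    final long max;
    Range(long min, long max) { this.min = min; this.max = max; }

    /** True if a row with column == value may exist within this range. */
    boolean mayContain(long value) { return min <= value && value <= max; }
  }

  static final class RowGroup {
    final String name;
    final Range stats;
    RowGroup(String name, Range stats) { this.name = name; this.stats = stats; }
  }

  static final class FileMeta {
    final Range stats;
    final List<RowGroup> rowGroups;
    FileMeta(Range stats, List<RowGroup> rowGroups) {
      this.stats = stats;
      this.rowGroups = rowGroups;
    }
  }

  static final class Partition {
    final Range stats;
    final List<FileMeta> files;
    Partition(Range stats, List<FileMeta> files) {
      this.stats = stats;
      this.files = files;
    }
  }

  /**
   * Returns the names of row groups that may contain rows where the column
   * equals the given value. Coarser layers are checked first, so a partition
   * or file that cannot match is skipped without visiting its children.
   */
  static List<String> prune(List<Partition> partitions, long value) {
    List<String> survivors = new ArrayList<>();
    for (Partition partition : partitions) {
      if (!partition.stats.mayContain(value)) {
        continue; // whole partition pruned
      }
      for (FileMeta file : partition.files) {
        if (!file.stats.mayContain(value)) {
          continue; // whole file pruned
        }
        for (RowGroup rowGroup : file.rowGroups) {
          if (rowGroup.stats.mayContain(value)) {
            survivors.add(rowGroup.name);
          }
        }
      }
    }
    return survivors;
  }
}
```

In this sketch the finer-grained metadata of a pruned partition or file is 
never inspected, which is also the point at which fetching metadata only when 
it is needed could plausibly be applied.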

  was:
According to the design document for DRILL-6552, existing metadata cache API 
should be adapted to use generalized API for metastore and parquet metadata 
cache will be presented as the implementation of metastore API.

The aim of this Jira is to refactor Parquet Metadata cache implementation and 
adapt it to use Drill Metastore API.

Execution plan:
 - Refactor AbstractParquetGroupScan and its implementations to use metastore 
metadata classes. Store Drill data types in metadata files for Parquet tables.
 - Storing the least restrictive type instead of current first file’s column 
data type.
 - Rework logic in AbstractParquetGroupScan to allow filtering at different 
metadata layers: partition, file, row group, etc. The same for pushing the 
limit.
 - Implement logic to convert existing parquet metadata to metastore metadata 
to preserve backward compatibility.


> Adapt current Parquet Metadata cache implementation to use Drill Metastore API
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-6852
>                 URL: https://issues.apache.org/jira/browse/DRILL-6852
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.16.0
>
>
> According to the design document for DRILL-6552, the existing metadata cache 
> API should be adapted to use the generalized metastore API, and the Parquet 
> metadata cache will be presented as an implementation of the metastore API.
> The aim of this Jira is to refactor the Parquet Metadata cache implementation 
> and adapt it to use the Drill Metastore API.
> Execution plan:
>  - Refactor AbstractParquetGroupScan and its implementations to use metastore 
> metadata classes. Store Drill data types in metadata files for Parquet tables.
>  - Store the least restrictive type for each column instead of the column 
> data type taken from the first file, as is done currently.
>  - Rework logic in AbstractParquetGroupScan to allow filtering at different 
> metadata layers: partition, file, row group, etc. The same applies to limit 
> pushdown.
>  - Implement logic to convert existing parquet metadata to metastore metadata 
> to preserve backward compatibility.
>  - Implement fetching metadata only when it is needed (for filtering, limit, 
> count(*) etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
