[
https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005022#comment-15005022
]
ASF GitHub Bot commented on DRILL-4053:
---------------------------------------
Github user StevenMPhillips commented on the pull request:
https://github.com/apache/drill/pull/254#issuecomment-156603625
Yes, that is true. It is able to deserialize based on version, similar to
what we do when deserializing physical plans or storage plugin configurations.
See StoragePluginConfig.java for an example; it uses the @JsonTypeInfo
annotation.
I wasn't sure whether some other part of the code needs to know the version
before deserializing.
This was designed to support new versions.
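The pattern described above — embedding a type tag in the serialized form so the reader can pick the right deserializer without knowing the version up front — can be sketched as follows. This is an illustrative analogue of Jackson's @JsonTypeInfo mechanism, not Drill's actual code; the class names and tag values here are hypothetical.

```python
import json

# Hypothetical registry mapping a "type" tag in the JSON to a class,
# analogous to how a @JsonTypeInfo annotation lets Jackson choose the
# concrete subclass (e.g. of StoragePluginConfig) at read time.
DESERIALIZERS = {}

def register(type_tag):
    def wrap(cls):
        DESERIALIZERS[type_tag] = cls
        return cls
    return wrap

@register("v1")
class MetadataV1:
    def __init__(self, data):
        self.files = data["files"]

@register("v2")
class MetadataV2:
    def __init__(self, data):
        self.files = data["files"]
        self.merged_schema = data.get("schema")

def deserialize(text):
    data = json.loads(text)
    # The tag inside the serialized form selects the class; the caller
    # does not need to know the version before parsing.
    cls = DESERIALIZERS[data["type"]]
    return cls(data)

cache = deserialize('{"type": "v2", "files": ["a.parquet"], "schema": {"x": "INT"}}')
```

Adding a new cache-file version then only requires registering a new tagged class; older readers and writers are untouched.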
> Reduce metadata cache file size
> -------------------------------
>
> Key: DRILL-4053
> URL: https://issues.apache.org/jira/browse/DRILL-4053
> Project: Apache Drill
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: 1.3.0
> Reporter: Parth Chandra
> Assignee: Parth Chandra
> Fix For: 1.4.0
>
>
> The parquet metadata cache file has a fair amount of redundant metadata
> that bloats the size of the cache file. Two things we can reduce are:
> 1) The schema is repeated for every row group. We can keep a single merged
> schema (similar to what was discussed for the insert-into functionality).
> 2) The max and min values in the stats are used for partition pruning only
> when the values are the same. We can keep the maxValue alone, and only
> when it equals the minValue.
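The second reduction in the description above can be sketched as follows. This is not Drill's actual cache format; the field names and structure here are hypothetical, and the sketch only illustrates dropping min/max stats that cannot be used for pruning.

```python
# Illustrative sketch: keep only maxValue, and only when it equals
# minValue, since partition pruning uses the stats just in that case.
def compact_stats(row_groups):
    compact = []
    for rg in row_groups:
        cols = {}
        for name, stats in rg["columns"].items():
            if stats["min"] == stats["max"]:
                # Single distinct value: usable for pruning, store once.
                cols[name] = {"maxValue": stats["max"]}
            else:
                # Differing min/max: drop both, saving cache-file space.
                cols[name] = {}
        compact.append({"columns": cols})
    return compact

full = [{"columns": {"dir0": {"min": "2015", "max": "2015"},
                     "amount": {"min": 1, "max": 99}}}]
compacted = compact_stats(full)
```

For a partition column like `dir0`, every row group carries one repeated value, so the compacted form keeps a single entry; for a data column with a real value range, both bounds are dropped entirely.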
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)