[ https://issues.apache.org/jira/browse/DRILL-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942855#comment-14942855 ]

Aman Sinha commented on DRILL-3892:
-----------------------------------

I took a look at this. The metadata file does get used, and usedMetadataFile is 
initially set to true. Subsequently, after partition pruning, we call 
ParquetGroupScan.clone() to modify the file selection. This calls init() again, 
and this time, if there is a *single* file in the file selection, we set 
usedMetadataFile = false, overwriting the previous setting. I think once the 
flag has been set to true, it should not be changed. [~sphillips], does that 
sound OK? I can submit a patch for this.
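A minimal sketch of the proposed fix, assuming a heavily simplified stand-in for ParquetGroupScan. The class name GroupScanSketch, its fields, and the metadataFileFound parameter (which stands in for whatever cache-file lookup init() performs) are hypothetical, not Drill's actual code. The idea is that clone() carries the flag over and init() may only ever set it, so re-running init() on a pruned selection cannot reset it to false.

```java
import java.util.List;

// Hypothetical, simplified stand-in for ParquetGroupScan; not Drill's real class.
public class GroupScanSketch {
    private final List<String> files;
    private boolean usedMetadataFile;

    public GroupScanSketch(List<String> files, boolean metadataFileFound) {
        this.files = files;
        init(metadataFileFound);
    }

    // Copy constructor used by clone(): carries the flag over from the original scan.
    private GroupScanSketch(GroupScanSketch that, List<String> newFiles, boolean metadataFileFound) {
        this.files = newFiles;
        this.usedMetadataFile = that.usedMetadataFile;
        init(metadataFileFound);
    }

    // metadataFileFound is a hypothetical stand-in for the cache-file lookup.
    // The flag is only ever set, never cleared, so a later init() call on a
    // pruned selection cannot overwrite an earlier true.
    private void init(boolean metadataFileFound) {
        if (metadataFileFound) {
            usedMetadataFile = true;
        }
    }

    // Simulates clone() after partition pruning with a narrower file selection.
    public GroupScanSketch clone(List<String> prunedFiles, boolean metadataFileFound) {
        return new GroupScanSketch(this, prunedFiles, metadataFileFound);
    }

    public boolean usedMetadataFile() { return usedMetadataFile; }

    public int numFiles() { return files.size(); }

    public static void main(String[] args) {
        GroupScanSketch scan = new GroupScanSketch(
            List.of("2006/1/a.parquet", "2006/2/b.parquet", "2007/1/c.parquet"), true);
        // After pruning to a single file, the lookup no longer reports the cache file...
        GroupScanSketch pruned = scan.clone(List.of("2006/1/a.parquet"), false);
        // ...but the flag set by the first init() survives.
        System.out.println("numFiles=" + pruned.numFiles()
            + ", usedMetadataFile=" + pruned.usedMetadataFile());
    }
}
```

Running main prints numFiles=1, usedMetadataFile=true, which is what the plan in the issue would be expected to show once the flag is no longer overwritten.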

I don't think this is a critical issue, for two reasons:
 - The metadata file does get used with partition pruning; the bug is only in 
   how the flag is updated.
 - It only occurs when the file selection contains exactly one file. [~rkins], 
   you can confirm this by adding a few more files to the partition folders. 
   If there are two or more files in the selection after partition pruning, we 
   go through a different code path and the flag is set correctly.
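To illustrate why only a single-file selection is affected, here is a hypothetical model of an init() that branches on the selection size, loosely following the description above: with one entry it looks for the cache file only when that entry is a directory, while with several entries it looks under the selection root. The branch structure, the toy Set-based "file system", the extra.parquet file, and the METADATA_FILENAME value are all assumptions for illustration; Drill's real logic may differ.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the two init() code paths; not Drill's real code.
public class InitPathsSketch {
    // Assumed name of the cache file written by REFRESH TABLE METADATA.
    static final String METADATA_FILENAME = ".drill.parquet_metadata";

    // Toy "file system": paths that exist, and which of them are directories.
    private final Set<String> existing;
    private final Set<String> directories;

    InitPathsSketch(Set<String> existing, Set<String> directories) {
        this.existing = existing;
        this.directories = directories;
    }

    // Returns the value the unfixed init() would assign to usedMetadataFile.
    boolean buggyInit(List<String> entries, String selectionRoot) {
        if (entries.size() == 1) {
            String p = entries.get(0);
            // A single plain file is read directly, so the cache is never
            // consulted and the flag ends up false.
            return directories.contains(p) && existing.contains(p + "/" + METADATA_FILENAME);
        }
        // Two or more entries: the cache is looked up under the selection root.
        return directories.contains(selectionRoot)
            && existing.contains(selectionRoot + "/" + METADATA_FILENAME);
    }

    public static void main(String[] args) {
        String root = "/drill/testdata/metadata_caching/lineitem_deletecache";
        InitPathsSketch fs = new InitPathsSketch(
            Set.of(root + "/" + METADATA_FILENAME,
                   root + "/2006/1/lineitem_999.parquet",
                   root + "/2006/1/extra.parquet"),
            Set.of(root));
        // One file left after pruning: the flag comes out false.
        System.out.println(fs.buggyInit(List.of(root + "/2006/1/lineitem_999.parquet"), root));
        // Two files left after pruning: the flag comes out true.
        System.out.println(fs.buggyInit(
            List.of(root + "/2006/1/lineitem_999.parquet", root + "/2006/1/extra.parquet"), root));
    }
}
```

This prints false and then true, mirroring the suggestion to add more files to the partition folders to confirm the behavior.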



> Metadata cache not being leveraged when partition pruning is taking place
> -------------------------------------------------------------------------
>
>                 Key: DRILL-3892
>                 URL: https://issues.apache.org/jira/browse/DRILL-3892
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: lineitem_deletecache.tgz
>
>
> git.commit.id.abbrev=92638dc
> As we can see from the plan below, the metadata cache is not being leveraged even 
> when the cache file is present:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem_deletecache`;
> +-------+-------------------------------------------------------------------------------------------------+
> |  ok   |                                             summary                                             |
> +-------+-------------------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem_deletecache.  |
> +-------+-------------------------------------------------------------------------------------------------+
> 1 row selected (0.402 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select count(*) from dfs.`/drill/testdata/metadata_caching/lineitem_deletecache` where dir0=2006 group by l_linestatus;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$1])
> 00-02        HashAgg(group=[{0}], EXPR$0=[COUNT()])
> 00-03          Project(l_linestatus=[$0])
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache/2006/1/lineitem_999.parquet]], selectionRoot=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache, numFiles=1, usedMetadataFile=false, columns=[`l_linestatus`, `dir0`]]])
> {code}
> I attached the data set used. Let me know if you need anything else.
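When turning this report into a regression test, it can help to read numFiles and usedMetadataFile straight out of the EXPLAIN PLAN text. The helper below is a hypothetical sketch, not an existing Drill test utility; it assumes the ParquetGroupScan line keeps the key=value format shown in the plan above and uses only java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper: extracts scan attributes from EXPLAIN PLAN text.
public class PlanFlagCheck {
    private static final Pattern NUM_FILES = Pattern.compile("numFiles=(\\d+)");
    private static final Pattern USED_META = Pattern.compile("usedMetadataFile=(true|false)");

    static int numFiles(String plan) {
        Matcher m = NUM_FILES.matcher(plan);
        if (!m.find()) throw new IllegalArgumentException("numFiles not found in plan");
        return Integer.parseInt(m.group(1));
    }

    static boolean usedMetadataFile(String plan) {
        Matcher m = USED_META.matcher(plan);
        if (!m.find()) throw new IllegalArgumentException("usedMetadataFile not found in plan");
        return Boolean.parseBoolean(m.group(1));
    }

    public static void main(String[] args) {
        // Scan line copied from the plan in this issue.
        String plan = "Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
            + "[path=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache/2006/1/lineitem_999.parquet]], "
            + "selectionRoot=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache, "
            + "numFiles=1, usedMetadataFile=false, columns=[`l_linestatus`, `dir0`]]])";
        System.out.println("numFiles=" + numFiles(plan)
            + ", usedMetadataFile=" + usedMetadataFile(plan));
    }
}
```

For the plan in this issue it prints numFiles=1, usedMetadataFile=false; with the flag fix in place, a test could instead expect true for the same query.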



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
