[ 
https://issues.apache.org/jira/browse/DRILL-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742549#comment-14742549
 ] 

ASF GitHub Bot commented on DRILL-3735:
---------------------------------------

GitHub user amansinha100 opened a pull request:

    https://github.com/apache/drill/pull/156

    DRILL-3735: For partition pruning divide up the partition lists into
    sublists of 64K each and iterate over each sublist.
    
    Add abstract base class for various partition descriptors.  Add logging
    messages in PruneScanRule for better debuggability.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill partition9

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #156
    
----
commit dc079ad2cfa2813817564cc8bdd66356d0c6e59c
Author: Aman Sinha <[email protected]>
Date:   2015-09-12T19:57:12Z

    DRILL-3735: For partition pruning divide up the partition lists into
    sublists of 64K each and iterate over each sublist.
    
    Add abstract base class for various partition descriptors.  Add logging
    messages in PruneScanRule for better debuggability.

----
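
For readers skimming the commit message, the "sublists of 64K each" idea
can be illustrated roughly as follows. This is only a sketch, not the code
from the pull request: the class BatchedPruning, its methods, and the
predicate-based filter are hypothetical names invented for illustration,
and the batch size is simply the 64K figure quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; none of these names come from Drill itself.
final class BatchedPruning {
    // Assumed batch size, taken from the "64K" figure in the commit message.
    static final int BATCH_SIZE = 64 * 1024;

    // Splits the input into consecutive sublists of at most batchSize entries.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end)); // a view, not a copy
        }
        return batches;
    }

    // Applies the filter one sublist at a time and collects the survivors,
    // so no single step has to handle more than batchSize entries.
    static <T> List<T> prune(List<T> items, int batchSize, Predicate<T> keep) {
        List<T> survivors = new ArrayList<>();
        for (List<T> batch : partition(items, batchSize)) {
            for (T item : batch) {
                if (keep.test(item)) {
                    survivors.add(item);
                }
            }
        }
        return survivors;
    }
}
```

The point of the sketch is that the filter never sees more than one
sublist at a time, so a fixed per-batch capacity no longer caps the total
number of partitions that can be pruned.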


> Directory pruning is not happening when number of files is larger than 64k
> --------------------------------------------------------------------------
>
>                 Key: DRILL-3735
>                 URL: https://issues.apache.org/jira/browse/DRILL-3735
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Hao Zhu
>            Assignee: Aman Sinha
>             Fix For: 1.2.0
>
>
> When the number of files exceeds the 64k limit, directory pruning does not 
> happen. 
> We need to raise this limit to handle most use cases.
> My proposal is to separate the code for directory pruning from the code for 
> partition pruning. 
> Say a parent directory contains 100 directories and 1 million files.
> If a query only touches files in one directory, we should first read the 100 
> directories and narrow them down to that directory, and then read the file 
> paths in that directory into memory and do the remaining work.
> The current behavior is that Drill first reads the file paths of all 1 million 
> files into memory, and only then does directory pruning or partition pruning. 
> This is neither performance-efficient nor memory-efficient, and it cannot 
> scale.
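
The two-stage approach proposed in the issue (prune directories first,
then list files only in the surviving directories) might look roughly
like the sketch below. It is not Drill code: TwoStagePruning, selectFiles,
and the listFiles callback are hypothetical stand-ins, with the callback
playing the role of whatever file-system listing call would actually be
used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration of directory-first pruning; not Drill's API.
final class TwoStagePruning {
    // Stage 1: filter the (small) list of directories.
    // Stage 2: list files only for the directories that survive stage 1.
    static List<String> selectFiles(List<String> directories,
                                    Predicate<String> directoryFilter,
                                    Function<String, List<String>> listFiles) {
        List<String> selected = new ArrayList<>();
        for (String directory : directories) {
            if (directoryFilter.test(directory)) {
                // The listing callback is invoked lazily, so file paths in
                // pruned directories are never read into memory.
                selected.addAll(listFiles.apply(directory));
            }
        }
        return selected;
    }
}
```

With 100 directories and a filter that matches one of them, the listing
callback in this sketch runs once instead of 100 times, which is the
saving the issue description is asking for.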



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
