[ 
https://issues.apache.org/jira/browse/DRILL-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216134#comment-17216134
 ] 

ASF GitHub Bot commented on DRILL-7763:
---------------------------------------

vvysotskyi commented on a change in pull request #2092:
URL: https://github.com/apache/drill/pull/2092#discussion_r507046883



##########
File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/metastore/TestMetastoreWithEasyFormatPlugin.java
##########
@@ -1079,15 +1079,17 @@ public void testFilesPruningWithLimit() throws 
Exception {
       queryBuilder()
           .sql("select * from dfs.tmp.`%s` limit 1", tableName)
           .planMatcher()
-          .include("Limit", "numFiles=1,")
+          .include("Limit", "numFiles=1", "maxRecords=1")
           .match();
 
       // each file has 10 records, so 3 files should be picked
       queryBuilder()
           .sql("select * from dfs.tmp.`%s` limit 21", tableName)
           .planMatcher()
-          .include("Limit", "numFiles=3")
+          .include("Limit", "numFiles=3", "maxRecords=21")
           .match();
+      String plan = queryBuilder().sql("select * from dfs.tmp.`%s` limit 21", 
tableName).explainText();
+      System.out.println(plan);

Review comment:
       Please avoid printing to stdout.

##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
##########
@@ -451,19 +456,20 @@ public boolean supportsLimitPushdown() {
 
   @Override
   public GroupScan applyLimit(int maxRecords) {
-    maxRecords = Math.max(maxRecords, 1); // Make sure it request at least 1 
row -> 1 file.
+    this.maxRecords = maxRecords;

Review comment:
       This method shouldn't change the state of this group scan. It should 
return a new one required updates.

##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
##########
@@ -17,31 +17,30 @@
  */
 package org.apache.drill.exec.store.dfs.easy;
 
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.List;
-
+import com.fasterxml.jackson.annotation.JacksonInject;

Review comment:
       IDE can be configured to avoid changing the import order. Until we don't 
have a defined check style for it, it is better to avoid changing it.

##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
##########
@@ -387,7 +406,7 @@ public AnalyzeInfoProvider getAnalyzeInfoProvider() {
       newScan.tableMetadata = tableMetadata;
       // updates common row count and nulls counts for every column
       if (newScan.getTableMetadata() != null && files != null && 
newScan.getFilesMetadata().size() != files.size()) {
-        newScan.tableMetadata = 
TableMetadataUtils.updateRowCount(newScan.getTableMetadata(), files.values());
+        newScan.tableMetadata = 
TableMetadataUtils.updateRowCount(newScan.getTableMetadata(), files.values()); 
// TODO If limit is pushed down, adjust here...

Review comment:
       Could you please address this TODO?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Limit Pushdown to File Based Storage Plugins
> ------------------------------------------------
>
>                 Key: DRILL-7763
>                 URL: https://issues.apache.org/jira/browse/DRILL-7763
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.19.0
>
>
> As currently implemented, when querying a file, Drill will read the entire 
> file even if a limit is specified in the query.  This PR does a few things:
>  # Refactors the EasyGroupScan, EasySubScan, and EasyFormatConfig to allow 
> the option of pushing down limits.
>  # Applies this to all the EVF based format plugins which are: LogRegex, 
> PCAP, SPSS, Esri, Excel and Text (CSV). 
> Due to JSON's fluid schema, it would be unwise to adopt the limit pushdown as 
> it could result in very inconsistent schemata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to