[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452708#comment-16452708 ]
ASF GitHub Bot commented on DRILL-6331:
---------------------------------------
Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/1214#discussion_r183919198
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
@@ -158,25 +159,26 @@ public void close() {
     } catch (RuntimeException e) {
       ex = ex == null ? e : ex;
     }
-    try {
-      if (fs != null) {
+
+    for (DrillFileSystem fs : fileSystems) {
+      try {
         fs.close();
-        fs = null;
-      }
-    } catch (IOException e) {
+      } catch (IOException e) {
         throw UserException.resourceError(e)
-          .addContext("Failed to close the Drill file system for " + getName())
-          .build(logger);
+          .addContext("Failed to close the Drill file system for " + getName())
+          .build(logger);
+      }
     }
+
     if (ex != null) {
       throw ex;
     }
   }

   @Override
   public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
-    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
-    fs = new DrillFileSystem(conf, getStats());
+    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
--- End diff ---
I don't get why you need multiple DrillFileSystems per operator context.
The reason for the DrillFileSystem abstraction (and the reason for tying it to
the operator context) is to track the time a (scan) operator spends waiting for
a file system call to return. This is reported as the operator's wait time in
the query profile. For scans this is a critical number, since the time spent
waiting on disk reads determines whether the query is disk bound.
Associating multiple file system objects with a single operator context
will throw that accounting off. I think.
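To make the accounting point concrete, here is a minimal, self-contained
sketch of the idea, assuming a wrapper that times each blocking call. This is
not Drill's actual DrillFileSystem/OperatorStats code; every name in it is
hypothetical.

// Illustrative sketch only: how a file-system wrapper can attribute I/O wait
// time to a single operator's stats. Not Drill's real code; all names are
// hypothetical.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

class OperatorWaitStats {
  private final AtomicLong waitNanos = new AtomicLong();

  // Accumulate the wall-clock duration of one blocking file system call.
  void addWait(long nanos) {
    waitNanos.addAndGet(nanos);
  }

  long totalWaitNanos() {
    return waitNanos.get();
  }
}

class TimedFileSystem {
  private final OperatorWaitStats stats;

  TimedFileSystem(OperatorWaitStats stats) {
    this.stats = stats;
  }

  // Each blocking call is timed, so the operator's wait time in the query
  // profile reflects how long the (scan) operator sat in file system I/O.
  byte[] read(String path) throws IOException {
    long start = System.nanoTime();
    try {
      return doRead(path); // the actual blocking I/O
    } finally {
      stats.addWait(System.nanoTime() - start);
    }
  }

  private byte[] doRead(String path) throws IOException {
    return new byte[0]; // stand-in for a real read
  }
}

If several such file systems feed one stats object, each still adds its own
wait time, so the total only lines up with wall-clock wait if the operator
issues at most one blocking call at a time; overlapping calls from multiple
file systems would inflate the number, which is one way the math could get
thrown off.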
> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
> Key: DRILL-6331
> URL: https://issues.apache.org/jira/browse/DRILL-6331
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive
> Affects Versions: 1.13.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed Hive Parquet files to be read with Drill's native Parquet
> reader, but it did not expose Hive data to Drill optimizations such as filter
> push down, limit push down, and the count-to-direct-scan optimization (see
> the sketch after this description).
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to these optimizations.
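For illustration of what filter push down buys here: a Parquet-aware group
scan can prune row groups using their min/max column statistics before any
data is read. The sketch below is hypothetical; it is not Drill's
ParquetGroupScan API, and all names in it are made up for illustration.

// Hypothetical sketch of filter push down by row group pruning. Not Drill's
// actual ParquetGroupScan code; names and types are illustrative only.
import java.util.ArrayList;
import java.util.List;

class RowGroup {
  final String file;
  final long minValue; // min/max statistics for the filtered column
  final long maxValue;

  RowGroup(String file, long minValue, long maxValue) {
    this.file = file;
    this.minValue = minValue;
    this.maxValue = maxValue;
  }
}

class PruningScan {
  // Keep only row groups whose [min, max] range can contain the constant of
  // an equality filter (e.g. WHERE col = value); the rest are never read.
  static List<RowGroup> applyEqualityFilter(List<RowGroup> groups, long value) {
    List<RowGroup> kept = new ArrayList<>();
    for (RowGroup g : groups) {
      if (g.minValue <= value && value <= g.maxValue) {
        kept.add(g);
      }
    }
    return kept;
  }
}

Per the description above, the Hive-backed native Parquet scan could not reach
this kind of pruning until it was refactored to use the same interfaces as
ParquetGroupScan, which is what the planner's push down rules operate on.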
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)