[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452986#comment-16452986 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r184188427
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
    @@ -158,25 +159,26 @@ public void close() {
         } catch (RuntimeException e) {
           ex = ex == null ? e : ex;
         }
    -    try {
    -      if (fs != null) {
    +
    +    for (DrillFileSystem fs : fileSystems) {
    +      try {
             fs.close();
    -        fs = null;
    -      }
    -    } catch (IOException e) {
    +      } catch (IOException e) {
           throw UserException.resourceError(e)
    -        .addContext("Failed to close the Drill file system for " + getName())
    -        .build(logger);
    +          .addContext("Failed to close the Drill file system for " + getName())
    +          .build(logger);
    +      }
         }
    +
         if (ex != null) {
           throw ex;
         }
       }
     
       @Override
       public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
    -    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
    -    fs = new DrillFileSystem(conf, getStats());
    +    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
    --- End diff ---
    
    @parthchandra this is definitely a good question. I did it this way because
    in the previous code a new fs was created for each Hive table split [1].
    The projection pusher is used to define the fs for each split; it resolves
    the paths for table partitions. Frankly speaking, it worked fine for me
    without it (all tests passed), but the Hive code uses the same approach,
    and apparently it was used in Drill for the same reasons, so to be safe I
    have done the same. If you think we can use the same fs for each row group
    in Hive, then I can adjust the changes.

    [1] https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java#L112
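
A minimal sketch of the calling pattern discussed above, assuming the reworked
OperatorContext API from this diff (newFileSystem(Configuration) can now be
called once per split, and BaseOperatorContext.close() closes every created
file system). The stand-in interfaces, the helper fileSystemPerSplit, and the
config key used below are illustrative, not the actual Drill/Hive code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class PerSplitFileSystemSketch {

      // Stand-in for the relevant part of org.apache.drill.exec.ops.OperatorContext.
      interface OperatorContextLike {
        DrillFileSystemLike newFileSystem(Configuration conf) throws IOException;
      }

      // Stand-in for org.apache.drill.exec.store.dfs.DrillFileSystem.
      interface DrillFileSystemLike { }

      // For every split path, build a per-split Configuration (in the real code the
      // projection pusher resolves the table-partition path here, see [1]) and ask
      // the operator context for a dedicated file system.
      static List<DrillFileSystemLike> fileSystemPerSplit(OperatorContextLike context,
          Configuration baseConf, List<Path> splitPaths) throws IOException {
        List<DrillFileSystemLike> result = new ArrayList<>();
        for (Path splitPath : splitPaths) {
          Configuration splitConf = new Configuration(baseConf);
          // Hypothetical stand-in for the projection-pusher step: remember which
          // partition path this split belongs to.
          splitConf.set("mapreduce.input.fileinputformat.inputdir", splitPath.toString());
          result.add(context.newFileSystem(splitConf));
        }
        return result;
      }
    }

All of the created file systems are then released together by the loop over
fileSystems in close() shown in the diff above.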


> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the
> core difference between them was that HiveDrillNativeParquetScanBatchCreator
> created a ParquetRecordReader instead of a HiveReader.
> This allowed Hive parquet files to be read with the Drill native parquet
> reader, but it did not expose Hive data to Drill optimizations such as filter
> push down, limit push down, and count-to-direct-scan optimizations.
> The Hive code had to be refactored to use the same interfaces as
> ParquetGroupScan in order to be exposed to such optimizations.
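
As a rough illustration of the refactoring described above (not the actual
Drill class hierarchy), the idea is to pull the optimization hooks into a
common base so that both the regular parquet group scan and the Hive native
parquet group scan are visible to the same planner rules. All names below are
made up for the example:

    // Shared base: filter push down and row-count metadata live here, so a
    // single planner rule can target both subclasses.
    abstract class CommonParquetGroupScanSketch {
      protected final long rowCount;

      CommonParquetGroupScanSketch(long rowCount) {
        this.rowCount = rowCount;
      }

      // Filter push down entry point a planner rule would call.
      abstract CommonParquetGroupScanSketch applyFilter(String filterExpression);

      // Row count used for the count-to-direct-scan optimization.
      long getRowCount() {
        return rowCount;
      }
    }

    // Stand-in for the regular ParquetGroupScan, already exposed to the rules.
    class ParquetGroupScanSketch extends CommonParquetGroupScanSketch {
      ParquetGroupScanSketch(long rowCount) {
        super(rowCount);
      }

      @Override
      CommonParquetGroupScanSketch applyFilter(String filterExpression) {
        // Prune row groups whose statistics cannot match the filter (omitted).
        return this;
      }
    }

    // Stand-in for HiveDrillNativeParquetGroupScan after the refactoring: by
    // extending the same base it is now reached by the same pushdown rules.
    class HiveNativeParquetGroupScanSketch extends CommonParquetGroupScanSketch {
      HiveNativeParquetGroupScanSketch(long rowCount) {
        super(rowCount);
      }

      @Override
      CommonParquetGroupScanSketch applyFilter(String filterExpression) {
        // Same pruning applied to the Hive-provided splits (omitted).
        return this;
      }
    }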


