[ https://issues.apache.org/jira/browse/DRILL-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451899#comment-16451899 ]

ASF GitHub Bot commented on DRILL-6331:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1214#discussion_r183982099
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/RowGroupInfo.java ---
    @@ -0,0 +1,95 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to you under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +* http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +package org.apache.drill.exec.store.parquet;
    +
    +import com.fasterxml.jackson.annotation.JsonCreator;
    +import com.fasterxml.jackson.annotation.JsonProperty;
    +import org.apache.drill.exec.store.dfs.ReadEntryFromHDFS;
    +import org.apache.drill.exec.store.dfs.easy.FileWork;
    +import org.apache.drill.exec.store.schedule.CompleteWork;
    +import org.apache.drill.exec.store.schedule.EndpointByteMap;
    +
    +import java.util.List;
    +
    +import static org.apache.drill.exec.store.parquet.metadata.MetadataBase.ColumnMetadata;
    +
    +public class RowGroupInfo extends ReadEntryFromHDFS implements CompleteWork, FileWork {
    +
    +    private EndpointByteMap byteMap;
    +    private int rowGroupIndex;
    +    private List<? extends ColumnMetadata> columns;
    +    private long rowCount;  // rowCount = -1 indicates that all rows should be included.
    +    private long numRecordsToRead;
    +
    +    @JsonCreator
    +    public RowGroupInfo(@JsonProperty("path") String path, @JsonProperty("start") long start,
    +                        @JsonProperty("length") long length, @JsonProperty("rowGroupIndex") int rowGroupIndex, long rowCount) {
    +      super(path, start, length);
    +      this.rowGroupIndex = rowGroupIndex;
    +      this.rowCount = rowCount;
    +      this.numRecordsToRead = rowCount;
    +    }
    +
    +    public RowGroupReadEntry getRowGroupReadEntry() {
    +      return new RowGroupReadEntry(this.getPath(), this.getStart(), this.getLength(),
    +                                   this.rowGroupIndex, this.getNumRecordsToRead());
    +    }
    +
    +    public int getRowGroupIndex() {
    +      return this.rowGroupIndex;
    +    }
    +
    +    @Override
    +    public int compareTo(CompleteWork o) {
    +      return Long.compare(getTotalBytes(), o.getTotalBytes());
    +    }
    +
    +    @Override
    +    public long getTotalBytes() {
    +      return this.getLength();
    +    }
    +
    +    @Override
    +    public EndpointByteMap getByteMap() {
    +      return byteMap;
    +    }
    +
    +    public long getNumRecordsToRead() {
    +      return numRecordsToRead;
    +    }
    +
    +    public void setNumRecordsToRead(long numRecords) {
    +      numRecordsToRead = numRecords;
    +    }
    +
    +    public void setEndpointByteMap(EndpointByteMap byteMap) {
    +      this.byteMap = byteMap;
    +    }
    +
    +    public long getRowCount() {
    +      return rowCount;
    +    }
    +
    +    public List<? extends ColumnMetadata> getColumns() {
    +      return columns;
    +    }
    +
    +    public void setColumns(List<? extends ColumnMetadata> columns) {
    +      this.columns = columns;
    +    }
    +
    +  }
    --- End diff --
    
    Done.
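
    For context, a minimal usage sketch of the class added in this diff. The path and size values are made up for illustration, and the snippet assumes the classes from this PR (in org.apache.drill.exec.store.parquet) are on the classpath:

        // Hypothetical values, illustration only.
        RowGroupInfo rgInfo = new RowGroupInfo("/tmp/example.parquet", 0L, 4096L, 0, 1000L);
        rgInfo.setNumRecordsToRead(100L);                        // cap the records read from this row group
        long totalBytes = rgInfo.getTotalBytes();                // delegates to getLength(), i.e. 4096
        RowGroupReadEntry entry = rgInfo.getRowGroupReadEntry(); // carries path/start/length/index/records-to-read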


> Parquet filter pushdown does not support the native hive reader
> ---------------------------------------------------------------
>
>                 Key: DRILL-6331
>                 URL: https://issues.apache.org/jira/browse/DRILL-6331
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Initially HiveDrillNativeParquetGroupScan was based mainly on HiveScan; the core difference between them was that HiveDrillNativeParquetScanBatchCreator created a ParquetRecordReader instead of a HiveReader.
> This allowed Hive parquet files to be read with Drill's native parquet reader, but did not expose Hive data to Drill optimizations such as filter push down, limit push down, and count-to-direct-scan.
> The Hive code had to be refactored to use the same interfaces as ParquetGroupScan in order to benefit from such optimizations.
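
As a loose illustration of the refactoring idea described above (class and method names below are hypothetical, not Drill's actual API): optimizer rules such as filter push down are written once against a shared base type, so any group scan that extends it picks up the optimization.

    // Hypothetical sketch only -- not the actual Drill classes.
    abstract class AbstractParquetScan {
      // Pushdown logic is written once against the base type,
      // so optimizer rules can target it generically.
      abstract AbstractParquetScan applyFilter(String filterExpr);
    }

    class DfsParquetScan extends AbstractParquetScan {
      @Override
      AbstractParquetScan applyFilter(String filterExpr) {
        // prune row groups using parquet metadata ...
        return this;
      }
    }

    class HiveNativeParquetScan extends AbstractParquetScan {
      @Override
      AbstractParquetScan applyFilter(String filterExpr) {
        // same pruning path, now reachable by the same optimizer rule
        return this;
      }
    }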



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
