[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

parthchandra Fri, 22 Dec 2017 13:57:46 -0800

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r158549859
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VLAbstractEntryReader.java ---
    @@ -0,0 +1,215 @@
    
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.store.parquet.columnreaders;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.ColumnPrecisionInfo;
    +import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.PageDataInfo;
    +import org.apache.drill.exec.util.MemoryUtils;
    +
    +/** Abstract class for sub-classes implementing several algorithms for loading a Bulk Entry */
    +abstract class VLAbstractEntryReader {
    +
    +  /** byte buffer used for buffering page data */
    +  protected final ByteBuffer buffer;
    +  /** Page Data Information */
    +  protected final PageDataInfo pageInfo;
    +  /** expected precision type: fixed or variable length */
    +  protected final ColumnPrecisionInfo columnPrecInfo;
    +  /** Bulk entry */
    +  protected final VLColumnBulkEntry entry;
    +
    +  /**
    +   * CTOR.
    +   * @param _buffer byte buffer for data buffering (within CPU cache)
    +   * @param _pageInfo page being processed information
    +   * @param _columnPrecInfo column precision information
    +   * @param _entry reusable bulk entry object
    +   */
    +  VLAbstractEntryReader(ByteBuffer _buffer,
    +    PageDataInfo _pageInfo,
    +    ColumnPrecisionInfo _columnPrecInfo,
    +    VLColumnBulkEntry _entry) {
    +
    +    this.buffer         = _buffer;
    +    this.pageInfo       = _pageInfo;
    +    this.columnPrecInfo = _columnPrecInfo;
    +    this.entry          = _entry;
    +  }
    +
    +  /**
    +   * @param valsToReadWithinPage maximum values to read within the current page
    +   * @return a bulk entry object
    +   */
    +  abstract VLColumnBulkEntry getEntry(int valsToReadWithinPage);
    +
    +  /**
    +   * Indicates whether to use bulk processing
    +   */
    +  protected final boolean bulkProcess() {
    +    return columnPrecInfo.bulkProcess;
    +  }
    +
    +  /**
    +   * Loads new data into the buffer if empty or the force flag is set.
    +   *
    +   * @param force flag to force loading new data into the buffer
    +   */
    +  protected final boolean load(boolean force) {
    +
    +    if (!force && buffer.hasRemaining()) {
    +      return true; // NOOP
    +    }
    +
    +    // We're here either because the buffer is empty or we want to force a new load operation.
    +    // In the case of force, there might be unprocessed data (still in the buffer), which is fine
    +    // since the caller updates the page data buffer's offset only for the data it has consumed; this
    +    // means unread data will be loaded again, but this time it will be positioned at the beginning
    +    // of the buffer. This can happen only for the last entry in the buffer, when either its length
    +    // or its value is incomplete.
    +    buffer.clear();
    +
    +    int remaining = remainingPageData();
    +    int toCopy    = remaining > buffer.capacity() ? buffer.capacity() : remaining;
    +
    +    if (toCopy == 0) {
    +      return false;
    +    }
    +
    +    pageInfo.pageData.getBytes(pageInfo.pageDataOff, buffer.array(), buffer.position(), toCopy);
    --- End diff --
    
    So seriously, this is faster? I would have expected the copy from direct
    memory to Java heap memory to be a big issue. There are HDFS APIs that read
    into a ByteBuffer (not a DirectByteBuffer); we could leverage those and reduce
    the copying between direct memory and Java heap memory.
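
For reference, the pattern being questioned can be sketched roughly as below. This is an illustrative stand-in, not the PR's code: the class `StagedLoadSketch`, its fields, and the `consume` helper are hypothetical, a direct `java.nio.ByteBuffer` plays the role of `pageInfo.pageData`, and the limit handling after the copy is an assumption because the diff is cut off at that point. It shows the bulk copy from direct memory into a small heap buffer, and how a forced reload re-reads unconsumed bytes so that they start at the beginning of the buffer.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a staged load from direct memory into a heap buffer. */
public class StagedLoadSketch {
  private final ByteBuffer pageData; // direct memory; stands in for pageInfo.pageData
  private final ByteBuffer staging;  // heap memory; stands in for the reader's buffer
  private int pageDataOff;           // offset of the first unconsumed page byte

  public StagedLoadSketch(ByteBuffer pageData, int stagingCapacity) {
    this.pageData = pageData;
    this.staging = ByteBuffer.allocate(stagingCapacity);
    this.staging.limit(0); // start empty so the first load() copies data
  }

  /** Refills the staging buffer if it is empty or if force is set. */
  public boolean load(boolean force) {
    if (!force && staging.hasRemaining()) {
      return true; // nothing to do, data is still available
    }
    staging.clear();
    int remaining = pageData.limit() - pageDataOff;
    int toCopy = Math.min(remaining, staging.capacity());
    if (toCopy == 0) {
      staging.limit(0);
      return false; // page exhausted
    }
    // One bulk copy from direct memory to the heap array. Bytes that were
    // staged earlier but never consumed are copied again from pageDataOff.
    ByteBuffer src = pageData.duplicate();
    src.position(pageDataOff);
    src.get(staging.array(), 0, toCopy);
    staging.limit(toCopy); // assumption: expose only the bytes just copied
    return true;
  }

  /** Reads n staged bytes and advances the page offset by the same amount. */
  public byte[] consume(int n) {
    byte[] out = new byte[n];
    staging.get(out);
    pageDataOff += n;
    return out;
  }
}
```

With a 10-byte page and a 4-byte staging buffer, consuming 3 bytes and then calling `load(true)` copies the one unread byte again, this time at index 0 of the staging buffer. Whether this extra direct-to-heap copy pays for itself is exactly the open question above; the sketch only illustrates the mechanics and makes no performance claim.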

---
