Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r160514755
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VLAbstractEntryReader.java
 ---
    @@ -0,0 +1,215 @@
    
+/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + 
******************************************************************************/
    +package org.apache.drill.exec.store.parquet.columnreaders;
    +
    +import java.nio.ByteBuffer;
    +
    +import 
org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.ColumnPrecisionInfo;
    +import 
org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.PageDataInfo;
    +import org.apache.drill.exec.util.MemoryUtils;
    +
    +/** Abstract class for sub-classes implementing several algorithms for 
loading a Bulk Entry */
    +abstract class VLAbstractEntryReader {
    +
    +  /** byte buffer used for buffering page data */
    +  protected final ByteBuffer buffer;
    +  /** Page Data Information */
    +  protected final PageDataInfo pageInfo;
    +  /** expected precision type: fixed or variable length */
    +  protected final ColumnPrecisionInfo columnPrecInfo;
    +  /** Bulk entry */
    +  protected final VLColumnBulkEntry entry;
    +
    +  /**
    +   * CTOR.
    +   * @param _buffer byte buffer for data buffering (within CPU cache)
    +   * @param _pageInfo page being processed information
    +   * @param _columnPrecInfo column precision information
    +   * @param _entry reusable bulk entry object
    +   */
    +  VLAbstractEntryReader(ByteBuffer _buffer,
    +    PageDataInfo _pageInfo,
    +    ColumnPrecisionInfo _columnPrecInfo,
    +    VLColumnBulkEntry _entry) {
    +
    +    this.buffer         = _buffer;
    +    this.pageInfo       = _pageInfo;
    +    this.columnPrecInfo = _columnPrecInfo;
    +    this.entry          = _entry;
    +  }
    +
    +  /**
    +   * @param valuesToRead maximum values to read within the current page
    +   * @return a bulk entry object
    +   */
    +  abstract VLColumnBulkEntry getEntry(int valsToReadWithinPage);
    +
    +  /**
    +   * Indicates whether to use bulk processing
    +   */
    +  protected final boolean bulkProcess() {
    +    return columnPrecInfo.bulkProcess;
    +  }
    +
    +  /**
    +   * Loads new data into the buffer if empty or the force flag is set.
    +   *
    +   * @param force flag to force loading new data into the buffer
    +   */
    +  protected final boolean load(boolean force) {
    +
    +    if (!force && buffer.hasRemaining()) {
    +      return true; // NOOP
    +    }
    +
    +    // We're here either because the buffer is empty or we want to force a 
new load operation.
    +    // In the case of force, there might be unprocessed data (still in the 
buffer) which is fine
    +    // since the caller updates the page data buffer's offset only for the 
data it has consumed; this
    +    // means unread data will be loaded again but this time will be 
positioned in the beginning of the
    +    // buffer. This can happen only for the last entry in the buffer when 
either of its length or value
    +    // is incomplete.
    +    buffer.clear();
    +
    +    int remaining = remainingPageData();
    +    int toCopy    = remaining > buffer.capacity() ? buffer.capacity() : 
remaining;
    +
    +    if (toCopy == 0) {
    +      return false;
    +    }
    +
    +    pageInfo.pageData.getBytes(pageInfo.pageDataOff, buffer.array(), 
buffer.position(), toCopy);
    --- End diff --
    
    @parthchandra I know this is counter intuitive but this path makes it 
faster and I had VTune to prove it; let me explain it:
    - The logic is to fetch small chunks of data from main memory to the L1 
cache (using 4k buffers) for processing
    - This logic doesn't introduce any extra memory bandwidth penalty since in 
all cases data has to move from main memory to the CPU and back to main memory 
(if modified)
    - Note that the CPU has optimizations that disable memory flush for private 
mutated data which fully fit in the cache
    - This means the byte array read / write operations is always local 
(doesn't involve the memory bus)
    - Another topic could be any extra CPU instructions we might be using.. 
usually this is not a problem as long as in the end our logic is avoiding CPU 
stalls, better utilizing the CPU pipelining, and using more optimal 
instructions (e.g., 64bit vs 8bit, SIMD, etc)
    
    I would recommend using VTune as it clearly show cases the above assertions.


---

Reply via email to