[GitHub] tajo pull request: TAJO-2073: Upgrade parquet-mr to 1.8.1.

jihoonson Thu, 11 Feb 2016 19:24:12 -0800

Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/958#discussion_r52702202
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/thirdparty/parquet/InternalParquetRecordReader.java ---
    @@ -70,37 +81,50 @@
       private long totalCountLoadedSoFar = 0;
     
       private Path file;
    +  private UnmaterializableRecordCounter unmaterializableRecordCounter;
    +
    +  /**
    +   * @param readSupport Object which helps reads files of the given type, e.g. Thrift, Avro.
    +   * @param filter for filtering individual records
    +   */
    +  public InternalParquetRecordReader(ReadSupport<T> readSupport, Filter filter) {
    +    this.readSupport = readSupport;
    +    this.filter = checkNotNull(filter, "filter");
    +  }
     
       /**
        * @param readSupport Object which helps reads files of the given type, e.g. Thrift, Avro.
        */
       public InternalParquetRecordReader(ReadSupport<T> readSupport) {
    -    this(readSupport, null);
    +    this(readSupport, FilterCompat.NOOP);
       }
     
       /**
        * @param readSupport Object which helps reads files of the given type, e.g. Thrift, Avro.
        * @param filter Optional filter for only returning matching records.
    +   * @deprecated use {@link #InternalParquetRecordReader(ReadSupport, Filter)}
        */
    -  public InternalParquetRecordReader(ReadSupport<T> readSupport, UnboundRecordFilter
    -      filter) {
    -    this.readSupport = readSupport;
    -    this.recordFilter = filter;
    +  @Deprecated
    +  public InternalParquetRecordReader(ReadSupport<T> readSupport, UnboundRecordFilter filter) {
    +    this(readSupport, FilterCompat.get(filter));
       }
     
       private void checkRead() throws IOException {
         if (current == totalCountLoadedSoFar) {
           if (current != 0) {
    -        long timeAssembling = System.currentTimeMillis() - startedAssemblingCurrentBlockAt;
    -        totalTimeSpentProcessingRecords += timeAssembling;
    -        if (DEBUG) LOG.debug("Assembled and processed " + totalCountLoadedSoFar + " records from " + columnCount + " columns in " + totalTimeSpentProcessingRecords + " ms: " + ((float) totalCountLoadedSoFar / totalTimeSpentProcessingRecords) + " rec/ms, " + ((float) totalCountLoadedSoFar * columnCount / totalTimeSpentProcessingRecords) + " cell/ms");
    -        long totalTime = totalTimeSpentProcessingRecords + totalTimeSpentReadingBytes;
    -        long percentReading = 100 * totalTimeSpentReadingBytes / totalTime;
    -        long percentProcessing = 100 * totalTimeSpentProcessingRecords / totalTime;
    -        if (DEBUG) LOG.debug("time spent so far " + percentReading + "% reading ("+totalTimeSpentReadingBytes+" ms) and " + percentProcessing + "% processing ("+totalTimeSpentProcessingRecords+" ms)");
    +        totalTimeSpentProcessingRecords += (System.currentTimeMillis() - startedAssemblingCurrentBlockAt);
    +        if (Log.INFO) {
    --- End diff --
    
    These logs appear to be printed only when a row group has been fully 
read, but I'm still concerned that there will be too many of them.
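
    One possible way to address this, sketched below under assumptions: keep
the timing bookkeeping unconditional, but emit the per-row-group message only
at a debug-like level. The class RowGroupTimingLog and its methods are
hypothetical and are not part of Tajo or parquet-mr. The sketch uses
java.util.logging (Level.FINE) purely to stay self-contained; it is not the
Log class used in the diff.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration; not a class from Tajo or parquet-mr. */
final class RowGroupTimingLog {
  private static final Logger LOG =
      Logger.getLogger(RowGroupTimingLog.class.getName());

  private long totalCountLoadedSoFar = 0;
  private long totalTimeSpentProcessingRecords = 0;
  private int messagesEmitted = 0;

  /** Intended to be called once each time a row group has been fully read. */
  void onRowGroupFinished(long recordsInGroup, long elapsedMs) {
    // The counters are always updated, regardless of the log level.
    totalCountLoadedSoFar += recordsInGroup;
    totalTimeSpentProcessingRecords += elapsedMs;

    // The message is built and emitted only when FINE is enabled, so a
    // default INFO configuration produces no per-row-group output.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Assembled and processed " + totalCountLoadedSoFar
          + " records in " + totalTimeSpentProcessingRecords + " ms");
      messagesEmitted++;
    }
  }

  long totalTimeMs() {
    return totalTimeSpentProcessingRecords;
  }

  int messagesEmitted() {
    return messagesEmitted;
  }
}
```

    With this shape, the number of messages still grows with the number of
row groups when the finer level is enabled, so it reduces noise only in the
default configuration.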



