vdiravka commented on a change in pull request #2143:
URL: https://github.com/apache/drill/pull/2143#discussion_r614836457
##########
File path: pom.xml
##########
@@ -47,9 +47,9 @@
<junit.version>4.12</junit.version>
<slf4j.version>1.7.26</slf4j.version>
<shaded.guava.version>28.2-jre</shaded.guava.version>
- <guava.version>19.0</guava.version>
+ <guava.version>19.0</guava.version> <!--todo: 28.2-jre guava can be used here-->
Review comment:
WIP: [DRILL-7904: Update to 30-jre Guava version](https://issues.apache.org/jira/browse/DRILL-7904)
##########
File path: exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
##########
@@ -260,14 +260,16 @@ public long getMemSize() {
}
/**
- * Writes a number of pages within corresponding column chunk
+ * Writes a number of pages within corresponding column chunk <br>
+ * // TODO: the Bloom Filter can be useful in filtering entire row groups,
+ * see <a href="https://issues.apache.org/jira/browse/DRILL-7895">DRILL-7895</a>
Review comment:
I double-checked Parquet `ColumnChunkPageWriteStore`, and it looks like we still use `ParquetDirectByteBufferAllocator` and allocate `DrillBuf`, because `ParquetProperties` is initialized with the proper allocator (see `ParquetRecordWriter#258`). I also debugged the `TestParquetWriter.testTPCHReadWriteRunRepeated` test case and found that Drill allocates the same amount of heap memory for `byte[]` with `ColumnChunkPageWriteStore` as with the old `ParquetColumnChunkPageWriteStore` (~50% with my default settings). So we can update `ParquetRecordWriter` to use `ColumnChunkPageWriteStore`.
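
To make the allocator wiring concrete, here is a minimal, hedged sketch (not the actual `ParquetRecordWriter` code) of building `ParquetProperties` with an explicit `ByteBufferAllocator`. It uses Parquet's stock `DirectByteBufferAllocator` as a stand-in for Drill's `ParquetDirectByteBufferAllocator`, and assumes the `ParquetProperties.Builder#withAllocator` / `ParquetProperties#getAllocator` API:

```java
// Hedged illustration only -- not the actual ParquetRecordWriter code. It shows the
// mechanism the comment refers to: ParquetProperties is built with an explicit
// ByteBufferAllocator, so ColumnChunkPageWriteStore allocates its page buffers
// through that allocator instead of plain heap byte[]. Drill would pass its own
// ParquetDirectByteBufferAllocator (backed by DrillBuf); Parquet's stock
// DirectByteBufferAllocator stands in for it here.
import java.nio.ByteBuffer;

import org.apache.parquet.bytes.ByteBufferAllocator;
import org.apache.parquet.bytes.DirectByteBufferAllocator;
import org.apache.parquet.column.ParquetProperties;

public class ParquetAllocatorSketch {
  public static void main(String[] args) {
    // Stand-in for Drill's ParquetDirectByteBufferAllocator.
    ByteBufferAllocator allocator = new DirectByteBufferAllocator();

    // The "proper allocator" is wired in while building ParquetProperties,
    // analogous to what ParquetRecordWriter does around line 258.
    ParquetProperties props = ParquetProperties.builder()
        .withPageSize(1024 * 1024)        // arbitrary 1 MB page size for the example
        .withDictionaryEncoding(true)
        .withAllocator(allocator)
        .build();

    // Page buffers requested through these properties come from that allocator,
    // so they are direct (off-heap) rather than heap byte[].
    ByteBuffer page = props.getAllocator().allocate(4096);
    System.out.println("direct buffers: " + page.isDirect());
    props.getAllocator().release(page);
  }
}
```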
##########
File path: exec/jdbc-all/pom.xml
##########
@@ -575,7 +575,7 @@
<build>
<plugins>
- <plugin>
+ <plugin> <!-- TODO: this plugin has common things with default profile. Factor out this common things to avoid duplicate code -->
Review comment:
`maven-enforcer-plugin` is removed from the `mapr` profile because exactly the same plugin is already declared in the default scope.
There is also a very similar `maven-shade-plugin`, but with some differences, so I think it is better to verify that plugin on a `mapr` cluster before merging it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]