[GitHub] [iceberg] shardulm94 commented on a change in pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

GitBox Sat, 27 Feb 2021 16:38:28 -0800


shardulm94 commented on a change in pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281#discussion_r584210600




##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java
##########
@@ -150,24 +169,39 @@ public PartitionReaderFactory createReaderFactory() {
                 .allMatch(fileScanTask -> fileScanTask.file().format().equals(
                     FileFormat.ORC)));
 
+    boolean hasNoRowFilters =
+        tasks().stream()
+            .allMatch(combinedScanTask -> !combinedScanTask.isDataTask() && combinedScanTask.files()
+                .stream()
+                .allMatch(fileScanTask -> OrcRowFilterUtils.rowFilterFromTask(fileScanTask) == null));

Review comment:
       This code was also not touched by 
https://github.com/linkedin/iceberg/pull/48 and does not exist in 
apache/iceberg. Can you remove this?
   
   There are several other changes in the files which are LinkedIn-specific and 
were not touched by https://github.com/linkedin/iceberg/pull/48

##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -136,7 +143,7 @@
     if (io.getValue() instanceof HadoopFileIO) {
       String fsscheme = "no_exist";
       try {
-        Configuration conf = SparkSession.active().sessionState().newHadoopConf();
+        Configuration conf = new Configuration(activeSparkSession().sessionState().newHadoopConf());

Review comment:
       Many changes in this file seem to be copied over from LinkedIn's fork 
and are not relevant to apache/iceberg. Can you remove these?
   
   The PR on linkedin/iceberg does not have these changes either, so I am not 
sure how they were copied over.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
