rdblue commented on a change in pull request #828:
URL: https://github.com/apache/iceberg/pull/828#discussion_r434903523
##########
File path: spark/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -310,26 +379,27 @@ private static void mergeIcebergHadoopConfs(
@Override
public String toString() {
return String.format(
- "IcebergScan(table=%s, type=%s, filters=%s, caseSensitive=%s)",
- table, lazySchema().asStruct(), filterExpressions, caseSensitive);
+ "IcebergScan(table=%s, type=%s, filters=%s, caseSensitive=%s,
batchedReads=%s)",
+ table, lazySchema().asStruct(), filterExpressions, caseSensitive,
enableBatchRead());
}
- private static class ReadTask implements InputPartition<InternalRow>,
Serializable {
- private final CombinedScanTask task;
+ @SuppressWarnings("checkstyle:VisibilityModifier")
Review comment:
I think this is correct. We don't need multiple task instances, especially since this code path will go away in Spark 3.0.
Instead, it's cleaner to pass a reader factory into a single read task and call that factory in `createPartitionReader`:
```java
private interface ReaderFactory<T> {
  InputPartitionReader<T> create(CombinedScanTask task, Schema tableSchema, Schema expectedSchema,
                                 FileIO io, EncryptionManager encryptionManager, boolean caseSensitive);
}

private static class InternalRowReaderFactory implements ReaderFactory<InternalRow> {
  private static final InternalRowReaderFactory INSTANCE = new InternalRowReaderFactory();

  private InternalRowReaderFactory() {
  }

  @Override
  public InputPartitionReader<InternalRow> create(CombinedScanTask task,
                                                  Schema tableSchema, Schema expectedSchema,
                                                  FileIO io, EncryptionManager encryptionManager,
                                                  boolean caseSensitive) {
    return new RowDataReader(task, tableSchema, expectedSchema, io, encryptionManager, caseSensitive);
  }
}

private static class BatchReaderFactory implements ReaderFactory<ColumnarBatch> {
  private final int batchSize;

  BatchReaderFactory(int batchSize) {
    this.batchSize = batchSize;
  }

  @Override
  public InputPartitionReader<ColumnarBatch> create(CombinedScanTask task,
                                                    Schema tableSchema, Schema expectedSchema,
                                                    FileIO io, EncryptionManager encryptionManager,
                                                    boolean caseSensitive) {
    return new BatchDataReader(task, expectedSchema, io, encryptionManager, caseSensitive, batchSize);
  }
}
```
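To make the delegation concrete, here is a minimal, self-contained sketch of the single-`ReadTask` side of this pattern. The Spark and Iceberg types are not available in isolation, so `SimpleTask` and `SimpleReader` are hypothetical stand-ins for `CombinedScanTask` and `InputPartitionReader`; only the shape of the factory call in `createPartitionReader` is the point:

```java
import java.io.Serializable;

public class ReaderFactorySketch {
  // Hypothetical stand-in for CombinedScanTask.
  static class SimpleTask implements Serializable {
    final String file;
    SimpleTask(String file) { this.file = file; }
  }

  // Hypothetical stand-in for InputPartitionReader<T>.
  interface SimpleReader<T> {
    T read();
  }

  // The factory interface the review proposes: one create method that
  // produces the right reader for the requested output type T.
  interface ReaderFactory<T> extends Serializable {
    SimpleReader<T> create(SimpleTask task, boolean caseSensitive);
  }

  // A single ReadTask parameterized by the factory, instead of one task
  // class per output format (rows vs. columnar batches).
  static class ReadTask<T> implements Serializable {
    private final SimpleTask task;
    private final boolean caseSensitive;
    private final ReaderFactory<T> readerFactory;

    ReadTask(SimpleTask task, boolean caseSensitive, ReaderFactory<T> readerFactory) {
      this.task = task;
      this.caseSensitive = caseSensitive;
      this.readerFactory = readerFactory;
    }

    // Analogous to createPartitionReader(): just delegate to the factory.
    SimpleReader<T> createPartitionReader() {
      return readerFactory.create(task, caseSensitive);
    }
  }

  public static void main(String[] args) {
    // A row-oriented factory; a batch factory would implement the same interface.
    ReaderFactory<String> rowFactory = (task, cs) -> () -> "row:" + task.file;
    ReadTask<String> readTask = new ReadTask<>(new SimpleTask("data.parquet"), true, rowFactory);
    System.out.println(readTask.createPartitionReader().read());
  }
}
```

This keeps format selection in one place (the factory chosen at planning time) while the task itself stays format-agnostic.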
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]