rdblue commented on a change in pull request #828:
URL: https://github.com/apache/iceberg/pull/828#discussion_r434903523
##########
File path: spark/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -310,26 +379,27 @@ private static void mergeIcebergHadoopConfs(
@Override
public String toString() {
return String.format(
- "IcebergScan(table=%s, type=%s, filters=%s, caseSensitive=%s)",
- table, lazySchema().asStruct(), filterExpressions, caseSensitive);
+ "IcebergScan(table=%s, type=%s, filters=%s, caseSensitive=%s,
batchedReads=%s)",
+ table, lazySchema().asStruct(), filterExpressions, caseSensitive,
enableBatchRead());
}
- private static class ReadTask implements InputPartition<InternalRow>,
Serializable {
- private final CombinedScanTask task;
+ @SuppressWarnings("checkstyle:VisibilityModifier")
Review comment:
I think this is correct. We don't need multiple task instances, especially since this code path will go away in Spark 3.0.
Instead, it's cleaner to pass a reader factory into a single read task and call that factory in `createPartitionReader`:
```java
private interface ReaderFactory<T> {
  InputPartitionReader<T> create(CombinedScanTask task, Schema tableSchema, Schema expectedSchema,
                                 FileIO io, EncryptionManager encryptionManager, boolean caseSensitive);
}

private static class InternalRowReaderFactory implements ReaderFactory<InternalRow> {
  private static final InternalRowReaderFactory INSTANCE = new InternalRowReaderFactory();

  private InternalRowReaderFactory() {
  }

  @Override
  public InputPartitionReader<InternalRow> create(CombinedScanTask task,
                                                  Schema tableSchema, Schema expectedSchema,
                                                  FileIO io, EncryptionManager encryptionManager,
                                                  boolean caseSensitive) {
    return new RowDataReader(task, tableSchema, expectedSchema, io, encryptionManager, caseSensitive);
  }
}

private static class BatchReaderFactory implements ReaderFactory<ColumnarBatch> {
  private final int batchSize;

  BatchReaderFactory(int batchSize) {
    this.batchSize = batchSize;
  }

  @Override
  public InputPartitionReader<ColumnarBatch> create(CombinedScanTask task,
                                                    Schema tableSchema, Schema expectedSchema,
                                                    FileIO io, EncryptionManager encryptionManager,
                                                    boolean caseSensitive) {
    return new BatchDataReader(task, expectedSchema, io, encryptionManager, caseSensitive, batchSize);
  }
}
```
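To make the delegation concrete, here is a minimal, self-contained sketch of the single-`ReadTask` side of this pattern. The Spark and Iceberg types are not available in isolation, so `SimpleTask` and `SimpleReader` are hypothetical stand-ins for `CombinedScanTask` and `InputPartitionReader`; only the shape of the factory call in `createPartitionReader` is the point:

```java
import java.io.Serializable;

public class ReaderFactorySketch {
  // Hypothetical stand-in for CombinedScanTask.
  static class SimpleTask implements Serializable {
    final String file;
    SimpleTask(String file) { this.file = file; }
  }

  // Hypothetical stand-in for InputPartitionReader<T>.
  interface SimpleReader<T> {
    T read();
  }

  // The factory interface the review proposes: one create method that
  // produces the right reader for the requested output type T.
  interface ReaderFactory<T> extends Serializable {
    SimpleReader<T> create(SimpleTask task, boolean caseSensitive);
  }

  // A single ReadTask parameterized by the factory, instead of one task
  // class per output format (rows vs. columnar batches).
  static class ReadTask<T> implements Serializable {
    private final SimpleTask task;
    private final boolean caseSensitive;
    private final ReaderFactory<T> readerFactory;

    ReadTask(SimpleTask task, boolean caseSensitive, ReaderFactory<T> readerFactory) {
      this.task = task;
      this.caseSensitive = caseSensitive;
      this.readerFactory = readerFactory;
    }

    // Analogous to createPartitionReader(): just delegate to the factory.
    SimpleReader<T> createPartitionReader() {
      return readerFactory.create(task, caseSensitive);
    }
  }

  public static void main(String[] args) {
    // A row-oriented factory; a batch factory would implement the same interface.
    ReaderFactory<String> rowFactory = (task, cs) -> () -> "row:" + task.file;
    ReadTask<String> readTask = new ReadTask<>(new SimpleTask("data.parquet"), true, rowFactory);
    System.out.println(readTask.createPartitionReader().read());
  }
}
```

This keeps format selection in one place (the factory chosen at planning time) while the task itself stays format-agnostic.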
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]