rdblue commented on a change in pull request #828:
URL: https://github.com/apache/iceberg/pull/828#discussion_r434906760
##########
File path: spark/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -145,6 +154,14 @@
this.io = io;
this.encryptionManager = encryptionManager;
this.caseSensitive = caseSensitive;
+
+    boolean enableBatchReadsConfig =
+        options.get("iceberg.read.parquet-vectorization.enabled")
+            .map(Boolean::parseBoolean)
+            .orElse(true);
+    if (!enableBatchReadsConfig) {
+      enableBatchRead = Boolean.FALSE;
+    }
+    Optional<String> numRecordsPerBatchOpt =
+        options.get("iceberg.read.parquet-vectorization.batch-size");
Review comment:
Options passed into the read here should be short options because they
come from the DataFrameReader interface:
```scala
spark.read.format("iceberg").option("snapshot-id", snapId).load("db.table")
```
Long option names like the ones you have here are what we use for table
properties, which are tracked in `TableProperties` and documented. These read
options should default to the table property value, but a DataFrameReader
option should override it.
For short names, how about `vectorization-enabled` and `batch-size`? We
should also add constants in `TableProperties` for the properties you have
here. Let's turn off vectorization by default.
Also, we can drop "iceberg" from the start of the properties.
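To illustrate the resolution order described above, here is a minimal, self-contained sketch: a short DataFrameReader option (`vectorization-enabled`) overrides the table property (`read.parquet-vectorization.enabled`), which is off by default. The method name and the exact property strings are illustrative assumptions, not the final constants that would live in `TableProperties`:

```java
import java.util.Map;
import java.util.Optional;

public class OptionResolution {
  // Hypothetical helper: the short read option wins if present; otherwise
  // fall back to the table property; otherwise default to false (vectorization off).
  static boolean vectorizationEnabled(Map<String, String> readOptions,
                                      Map<String, String> tableProperties) {
    return Optional.ofNullable(readOptions.get("vectorization-enabled"))
        .map(Boolean::parseBoolean)
        .orElseGet(() -> Boolean.parseBoolean(
            tableProperties.getOrDefault("read.parquet-vectorization.enabled", "false")));
  }

  public static void main(String[] args) {
    Map<String, String> props = Map.of("read.parquet-vectorization.enabled", "true");
    // Table property alone enables it.
    System.out.println(vectorizationEnabled(Map.of(), props));
    // Read option overrides the table property.
    System.out.println(vectorizationEnabled(Map.of("vectorization-enabled", "false"), props));
    // Neither set: off by default.
    System.out.println(vectorizationEnabled(Map.of(), Map.of()));
  }
}
```

The same pattern would apply to `batch-size`, with the read option overriding the corresponding table property.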
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]