aokolnychyi commented on a change in pull request #2248:
URL: https://github.com/apache/iceberg/pull/2248#discussion_r578036747
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -474,15 +475,28 @@ public static boolean isLocalityEnabled(FileIO io, String location, CaseInsensit
     return false;
   }
-  public static boolean isVectorizationEnabled(Map<String, String> properties, CaseInsensitiveStringMap readOptions) {
+  public static boolean isVectorizationEnabled(FileFormat fileFormat,
+                                               Map<String, String> properties,
+                                               CaseInsensitiveStringMap readOptions) {
     String batchReadsSessionConf = SparkSession.active().conf()
         .get("spark.sql.iceberg.vectorization.enabled", null);
     if (batchReadsSessionConf != null) {
       return Boolean.valueOf(batchReadsSessionConf);
     }
-    return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED,
-        PropertyUtil.propertyAsBoolean(properties,
-            TableProperties.PARQUET_VECTORIZATION_ENABLED, TableProperties.PARQUET_VECTORIZATION_ENABLED_DEFAULT));
+
+    switch (fileFormat) {
+      case PARQUET:
+        boolean defaultValue = PropertyUtil.propertyAsBoolean(
+            properties,
+            TableProperties.PARQUET_VECTORIZATION_ENABLED,
+            TableProperties.PARQUET_VECTORIZATION_ENABLED_DEFAULT);
+        return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED, defaultValue);
+      case ORC:
+        // TODO: support a table property to enable/disable vectorized reads in ORC
+        return readOptions.getBoolean(SparkReadOptions.VECTORIZATION_ENABLED, true);
Review comment:
I think it would be handy to add more ORC properties at the table level
to control various aspects, the same way we control the row group size in
Parquet, for example. Table properties are quite flexible: we can change a
value in one place and have all jobs pick it up.

I agree about disabling this by default. Let me submit a separate PR for
the table property and then consume it here.
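
To make the proposal concrete, here is a minimal sketch of how the ORC branch might resolve the flag once a table property exists. It is only an illustration, not the change that will land: the property name `read.orc.vectorization.enabled`, its `false` default, and the `VectorizationSketch` class are assumptions, the session conf is passed in as a plain string instead of being read from `SparkSession`, and ordinary maps stand in for `CaseInsensitiveStringMap`, so key lookup is case-sensitive here.

```java
import java.util.Map;

// Sketch only: all names below are assumed for illustration, not taken from the PR.
final class VectorizationSketch {
  // Hypothetical table property and default (disabled unless opted in).
  static final String ORC_VECTORIZATION_ENABLED = "read.orc.vectorization.enabled";
  static final boolean ORC_VECTORIZATION_ENABLED_DEFAULT = false;

  // Assumed key for the per-read option.
  static final String VECTORIZATION_ENABLED_OPTION = "vectorization-enabled";

  private VectorizationSketch() {
  }

  // Precedence: session conf, then read option, then table property, then default.
  static boolean isOrcVectorizationEnabled(String sessionConf,
                                           Map<String, String> tableProperties,
                                           Map<String, String> readOptions) {
    if (sessionConf != null) {
      return Boolean.parseBoolean(sessionConf);
    }

    String readOption = readOptions.get(VECTORIZATION_ENABLED_OPTION);
    if (readOption != null) {
      return Boolean.parseBoolean(readOption);
    }

    String tableValue = tableProperties.get(ORC_VECTORIZATION_ENABLED);
    if (tableValue != null) {
      return Boolean.parseBoolean(tableValue);
    }

    return ORC_VECTORIZATION_ENABLED_DEFAULT;
  }
}
```

With this ordering, a table owner can enable vectorized ORC reads once in the table metadata, while an individual job can still override that through a read option or the session conf.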