Copilot commented on code in PR #1781:
URL: https://github.com/apache/auron/pull/1781#discussion_r2643189002
##########
spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java:
##########
@@ -237,6 +238,11 @@ public class SparkAuronConfiguration extends AuronConfiguration {
.description("use microsecond precision when reading ORC timestamp columns. ")
.booleanType()
.defaultValue(false);
+ public static final ConfigOption<Boolean> ORC_SCHEMA_ISCASE_SENSITIVE = ConfigOptions.key(
+ "auron.orc.schema.iscasesensitive")
+ .description("does matching ORC file schema distinguish between uppercase and lowercase ")
+ .booleanType()
+ .defaultValue(false);
Review Comment:
The default value for ORC_SCHEMA_ISCASE_SENSITIVE is inconsistent between
configuration files. In AuronConf.java line 142, it's set to true, but here
it's set to false. This inconsistency could lead to unexpected behavior
depending on which configuration source is used. The default values should
match across both configuration classes.
```suggestion
.defaultValue(true);
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -170,6 +171,7 @@ impl ExecutionPlan for OrcExec {
metrics: self.metrics.clone(),
force_positional_evolution,
use_microsecond_precision,
+ is_case_sensitive
Review Comment:
Missing comma after the is_case_sensitive field in the struct initialization. In Rust, struct field initializers must be separated by commas, so this will fail to compile if any field follows it; even as the last field, adding the trailing comma is idiomatic and keeps future diffs clean.
##########
spark-extension/src/main/java/org/apache/spark/sql/auron/AuronConf.java:
##########
@@ -139,6 +139,8 @@ public enum AuronConf {
// use microsecond precision when reading ORC timestamp columns
ORC_TIMESTAMP_USE_MICROSECOND("spark.auron.orc.timestamp.use.microsecond", false),
+ ORC_SCHEMA_ISCASE_SENSITIVE("spark.auron.orc.schema.iscasesensitive", true),
Review Comment:
The default value of true for ORC_SCHEMA_ISCASE_SENSITIVE appears
inconsistent with the PR's purpose of fixing Hive ORC file compatibility.
According to the PR description, this change is meant to "fix the issue of
reading null ORC files written by Hive" with uppercase fields. Setting
case-sensitive matching as the default (true) would prevent matching fields
with different cases, which contradicts the goal of this fix. The default
should likely be false to enable case-insensitive matching for Hive
compatibility, as configured in SparkAuronConfiguration.java line 245.
```suggestion
ORC_SCHEMA_ISCASE_SENSITIVE("spark.auron.orc.schema.iscasesensitive", false),
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -364,14 +369,30 @@ impl SchemaAdapter {
}
}
} else {
- for named_column in file_named_columns {
- if let Some((proj_idx, _)) =
+ if is_case_sensitive {
+ for named_column in file_named_columns {
+ if let Some((proj_idx, _)) =
self.projected_schema.fields().find(named_column.name())
- {
- field_mappings[proj_idx] = Some(projection.len());
- projection.push(named_column.data_type().column_index());
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
+ } else {
+ for named_column in file_named_columns {
+ // Case-insensitive field name matching
+ let named_column_name_lower = named_column.name().to_lowercase();
+ if let Some((proj_idx, _)) =
+ self.projected_schema.fields()
+ .iter()
+ .enumerate()
+ .find(|(_, f)| f.name().to_lowercase() == named_column_name_lower)
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
}
Review Comment:
Inconsistent indentation in the case-insensitive branch. The for loop at
line 382, the if-let statement at line 385, and the closing braces at lines
393-394 have incorrect indentation levels. They should be properly indented to
match the surrounding code structure and maintain consistency with the
case-sensitive branch above.
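Beyond formatting, the intent of the case-insensitive branch can be shown with a small, self-contained sketch. The function and field names below are hypothetical stand-ins for the orc_exec.rs types, not the PR's actual code:

```rust
// Hypothetical stand-in for the projected-schema lookup in orc_exec.rs:
// match a file column name against projected field names, ignoring case.
fn find_field_case_insensitive(projected: &[&str], file_column: &str) -> Option<usize> {
    // Lowercase the file column name once, then compare it against each
    // projected field name lowered on the fly (as the PR's branch does).
    let lower = file_column.to_lowercase();
    projected.iter().position(|f| f.to_lowercase() == lower)
}

fn main() {
    // Hive may write uppercase column names while Spark's projected
    // schema is lowercase; case-insensitive matching bridges the two.
    let projected = ["id", "user_name"];
    assert_eq!(find_field_case_insensitive(&projected, "USER_NAME"), Some(1));
    assert_eq!(find_field_case_insensitive(&projected, "id"), Some(0));
    assert_eq!(find_field_case_insensitive(&projected, "missing"), None);
    println!("ok");
}
```

With `is_case_sensitive = false`, the real code performs the equivalent enumerate-and-find over `self.projected_schema.fields()`, recording the matched projection index in `field_mappings`.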
##########
spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java:
##########
@@ -27,6 +27,7 @@
import org.apache.spark.internal.config.ConfigEntryWithDefault;
import scala.Option;
import scala.collection.immutable.List$;
+import sun.security.pkcs11.P11Util;
Review Comment:
This import statement appears to be unused and unrelated to the changes in
this PR. The P11Util class is from the internal sun.security.pkcs11 package and
is not referenced anywhere in this file. This import should be removed.
```suggestion
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -364,14 +369,30 @@ impl SchemaAdapter {
}
}
} else {
- for named_column in file_named_columns {
- if let Some((proj_idx, _)) =
+ if is_case_sensitive {
+ for named_column in file_named_columns {
+ if let Some((proj_idx, _)) =
self.projected_schema.fields().find(named_column.name())
- {
- field_mappings[proj_idx] = Some(projection.len());
- projection.push(named_column.data_type().column_index());
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
Review Comment:
Inconsistent indentation in the case-sensitive branch. The if-let statement
starting at line 374 and the closing brace at line 380 have incorrect
indentation levels. They should align with the for loop at line 373 to maintain
consistent code formatting throughout the function.
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -217,6 +219,7 @@ struct OrcOpener {
metrics: ExecutionPlanMetricsSet,
force_positional_evolution: bool,
use_microsecond_precision: bool,
+ is_case_sensitive: bool
Review Comment:
Missing comma after the is_case_sensitive field. In Rust, struct field definitions must be separated by commas; unless this is the final field before the closing brace, the missing comma is a compilation error, and a trailing comma is idiomatic in any case.
```suggestion
is_case_sensitive: bool,
```
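As a side note on the comma rule (struct and field names below are hypothetical stand-ins, not the PR's actual `OrcOpener`): Rust requires commas between fields, while the comma after the final field is optional but conventionally included, which is why the suggested trailing comma is both safe and idiomatic.

```rust
// Commas are required *between* struct fields; the trailing comma on the
// last field is optional, but adding it keeps later diffs to one line.
#[allow(dead_code)] // fields exist only to mirror the shape under review
struct OrcOpenerSketch {
    force_positional_evolution: bool,
    use_microsecond_precision: bool,
    is_case_sensitive: bool, // trailing comma: idiomatic and future-proof
}

fn main() {
    let opener = OrcOpenerSketch {
        force_positional_evolution: false,
        use_microsecond_precision: true,
        is_case_sensitive: false,
    };
    assert!(!opener.is_case_sensitive);
    assert!(opener.use_microsecond_precision);
    println!("ok");
}
```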
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]