Copilot commented on code in PR #1781:
URL: https://github.com/apache/auron/pull/1781#discussion_r2643189002
##########
spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java:
##########
@@ -237,6 +238,11 @@ public class SparkAuronConfiguration extends AuronConfiguration {
.description("use microsecond precision when reading ORC timestamp columns. ")
.booleanType()
.defaultValue(false);
+ public static final ConfigOption<Boolean> ORC_SCHEMA_ISCASE_SENSITIVE = ConfigOptions.key(
+ "auron.orc.schema.iscasesensitive")
+ .description("does matching ORC file schema distinguish between uppercase and lowercase ")
+ .booleanType()
+ .defaultValue(false);
Review Comment:
The default value for ORC_SCHEMA_ISCASE_SENSITIVE is inconsistent between
configuration files. In AuronConf.java line 142, it's set to true, but here
it's set to false. This inconsistency could lead to unexpected behavior
depending on which configuration source is used. The default values should
match across both configuration classes.
```suggestion
.defaultValue(true);
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -170,6 +171,7 @@ impl ExecutionPlan for OrcExec {
metrics: self.metrics.clone(),
force_positional_evolution,
use_microsecond_precision,
+ is_case_sensitive
Review Comment:
Missing comma after the is_case_sensitive field in the struct initialization. In Rust, struct field initializers must be separated by commas, so this will fail to compile if any field follows it; even as the last field, adding the trailing comma is idiomatic and keeps future diffs clean.
##########
spark-extension/src/main/java/org/apache/spark/sql/auron/AuronConf.java:
##########
@@ -139,6 +139,8 @@ public enum AuronConf {
// use microsecond precision when reading ORC timestamp columns
ORC_TIMESTAMP_USE_MICROSECOND("spark.auron.orc.timestamp.use.microsecond", false),
+ ORC_SCHEMA_ISCASE_SENSITIVE("spark.auron.orc.schema.iscasesensitive", true),
Review Comment:
The default value of true for ORC_SCHEMA_ISCASE_SENSITIVE appears
inconsistent with the PR's purpose of fixing Hive ORC file compatibility.
According to the PR description, this change is meant to "fix the issue of
reading null ORC files written by Hive" with uppercase fields. Setting
case-sensitive matching as the default (true) would prevent matching fields
with different cases, which contradicts the goal of this fix. The default
should likely be false to enable case-insensitive matching for Hive
compatibility, as configured in SparkAuronConfiguration.java line 245.
```suggestion
ORC_SCHEMA_ISCASE_SENSITIVE("spark.auron.orc.schema.iscasesensitive", false),
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -364,14 +369,30 @@ impl SchemaAdapter {
}
}
} else {
- for named_column in file_named_columns {
- if let Some((proj_idx, _)) =
+ if is_case_sensitive {
+ for named_column in file_named_columns {
+ if let Some((proj_idx, _)) =
self.projected_schema.fields().find(named_column.name())
- {
- field_mappings[proj_idx] = Some(projection.len());
- projection.push(named_column.data_type().column_index());
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
+ } else {
+ for named_column in file_named_columns {
+ // Case-insensitive field name matching
+ let named_column_name_lower = named_column.name().to_lowercase();
+ if let Some((proj_idx, _)) =
+ self.projected_schema.fields()
+ .iter()
+ .enumerate()
+ .find(|(_, f)| f.name().to_lowercase() == named_column_name_lower)
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
}
Review Comment:
Inconsistent indentation in the case-insensitive branch. The for loop at
line 382, the if-let statement at line 385, and the closing braces at lines
393-394 have incorrect indentation levels. They should be properly indented to
match the surrounding code structure and maintain consistency with the
case-sensitive branch above.
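Beyond formatting, the intent of the case-insensitive branch can be shown with a small, self-contained sketch. The function and field names below are hypothetical stand-ins for the orc_exec.rs types, not the PR's actual code:

```rust
// Hypothetical stand-in for the projected-schema lookup in orc_exec.rs:
// match a file column name against projected field names, ignoring case.
fn find_field_case_insensitive(projected: &[&str], file_column: &str) -> Option<usize> {
    // Lowercase the file column name once, then compare it against each
    // projected field name lowered on the fly (as the PR's branch does).
    let lower = file_column.to_lowercase();
    projected.iter().position(|f| f.to_lowercase() == lower)
}

fn main() {
    // Hive may write uppercase column names while Spark's projected
    // schema is lowercase; case-insensitive matching bridges the two.
    let projected = ["id", "user_name"];
    assert_eq!(find_field_case_insensitive(&projected, "USER_NAME"), Some(1));
    assert_eq!(find_field_case_insensitive(&projected, "id"), Some(0));
    assert_eq!(find_field_case_insensitive(&projected, "missing"), None);
    println!("ok");
}
```

With `is_case_sensitive = false`, the real code performs the equivalent enumerate-and-find over `self.projected_schema.fields()`, recording the matched projection index in `field_mappings`.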
##########
spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java:
##########
@@ -27,6 +27,7 @@
import org.apache.spark.internal.config.ConfigEntryWithDefault;
import scala.Option;
import scala.collection.immutable.List$;
+import sun.security.pkcs11.P11Util;
Review Comment:
This import statement appears to be unused and unrelated to the changes in
this PR. The P11Util class is from the internal sun.security.pkcs11 package and
is not referenced anywhere in this file. This import should be removed.
```suggestion
```
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -364,14 +369,30 @@ impl SchemaAdapter {
}
}
} else {
- for named_column in file_named_columns {
- if let Some((proj_idx, _)) =
+ if is_case_sensitive {
+ for named_column in file_named_columns {
+ if let Some((proj_idx, _)) =
self.projected_schema.fields().find(named_column.name())
- {
- field_mappings[proj_idx] = Some(projection.len());
- projection.push(named_column.data_type().column_index());
+ {
+ field_mappings[proj_idx] = Some(projection.len());
+ projection.push(named_column.data_type().column_index());
+ }
+ }
Review Comment:
Inconsistent indentation in the case-sensitive branch. The if-let statement
starting at line 374 and the closing brace at line 380 have incorrect
indentation levels. They should align with the for loop at line 373 to maintain
consistent code formatting throughout the function.
##########
native-engine/datafusion-ext-plans/src/orc_exec.rs:
##########
@@ -217,6 +219,7 @@ struct OrcOpener {
metrics: ExecutionPlanMetricsSet,
force_positional_evolution: bool,
use_microsecond_precision: bool,
+ is_case_sensitive: bool
Review Comment:
Missing comma after the is_case_sensitive field. In Rust, struct field definitions must be separated by commas; unless this is the final field before the closing brace, the missing comma is a compilation error, and a trailing comma is idiomatic in any case.
```suggestion
is_case_sensitive: bool,
```
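As a side note on the comma rule (struct and field names below are hypothetical stand-ins, not the PR's actual `OrcOpener`): Rust requires commas between fields, while the comma after the final field is optional but conventionally included, which is why the suggested trailing comma is both safe and idiomatic.

```rust
// Commas are required *between* struct fields; the trailing comma on the
// last field is optional, but adding it keeps later diffs to one line.
#[allow(dead_code)] // fields exist only to mirror the shape under review
struct OrcOpenerSketch {
    force_positional_evolution: bool,
    use_microsecond_precision: bool,
    is_case_sensitive: bool, // trailing comma: idiomatic and future-proof
}

fn main() {
    let opener = OrcOpenerSketch {
        force_positional_evolution: false,
        use_microsecond_precision: true,
        is_case_sensitive: false,
    };
    assert!(!opener.is_case_sensitive);
    assert!(opener.use_microsecond_precision);
    println!("ok");
}
```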
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]