[GitHub] [iceberg] pvary commented on a change in pull request #2053: Hive: Support case insensitive in hive query

GitBox Sat, 09 Jan 2021 19:54:25 -0800


pvary commented on a change in pull request #2053:
URL: https://github.com/apache/iceberg/pull/2053#discussion_r553898951




##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -82,10 +82,17 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     }
 
     String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration);
-    Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema;
+    Schema projectedSchema = tableSchema;
+
+    boolean caseSensitive = configuration.getBoolean(InputFormatConfig.CASE_SENSITIVE,

Review comment:
       I understand that we want to be able to configure the IcebergInputFormat
   as case sensitive or case insensitive, since it can be used by other MR jobs
   as well. But do we want to allow Hive users to shoot themselves in the foot
   by enabling case sensitivity?
   My first guess would be that we should not use the configuration here and
   just go with `false`, but if you have a specific use case in mind, I can be
   easily convinced 😄 
   

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
##########
@@ -35,21 +35,30 @@
 public final class IcebergRecordObjectInspector extends StructObjectInspector {
 
   private static final IcebergRecordObjectInspector EMPTY =
-          new IcebergRecordObjectInspector(Types.StructType.of(), Collections.emptyList());
+          new IcebergRecordObjectInspector(Types.StructType.of(), Collections.emptyList(), true);
 
   private final List<IcebergRecordStructField> structFields;
+  private final List<IcebergRecordStructField> structFieldsInLowercase;

Review comment:
       Why do we keep both of the fields? Shouldn't we just keep the currently 
requested version?
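To illustrate keeping only the requested version: a hypothetical `StructFieldIndex` (plain strings stand in for `IcebergRecordStructField`) normalizes the names once, according to the flag, and applies the same normalization to lookups, so a single list is enough.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: only one list of fields is kept, in the form that was requested.
final class StructFieldIndex {
  private final boolean caseSensitive;
  private final List<String> fieldNames;         // names as exposed to the caller
  private final Map<String, Integer> positions;  // normalized name -> position

  StructFieldIndex(List<String> originalNames, boolean caseSensitive) {
    this.caseSensitive = caseSensitive;
    List<String> names = new ArrayList<>(originalNames.size());
    Map<String, Integer> index = new HashMap<>();
    for (int pos = 0; pos < originalNames.size(); pos++) {
      String name = normalize(originalNames.get(pos));
      names.add(name);
      index.put(name, pos); // names differing only in case collide; the later one wins
    }

    this.fieldNames = Collections.unmodifiableList(names);
    this.positions = Collections.unmodifiableMap(index);
  }

  private String normalize(String name) {
    return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
  }

  List<String> fieldNames() {
    return fieldNames;
  }

  /** Returns the position of the requested field, or -1 if there is no such field. */
  int positionOf(String requestedName) {
    Integer pos = positions.get(normalize(requestedName));
    return pos == null ? -1 : pos;
  }
}
```

In case-insensitive mode, two fields that differ only in case (for example `a` and `A`) would collide; this sketch simply lets the later one win the lookup.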

##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -150,6 +150,27 @@ public void testScanTable() throws IOException {
     Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
   }
 
+  @Test

Review comment:
       I think these queries do not use the execution engine itself, so we do
   not have to put these tests in this class.
   It is OK to have them only in TestHiveIcebergStorageHandlerLocalScan.java.

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -80,12 +80,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
         tableSchema = hiveSchemaOrThrow(serDeProperties, e);
       }
     }
+    configuration.set(InputFormatConfig.CASE_SENSITIVE, "false");

Review comment:
       `configuration.setBoolean(InputFormatConfig.CASE_SENSITIVE, false);`
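One reason to prefer the typed setter: with a string-backed configuration, nothing rejects a mistyped literal. The sketch uses a hypothetical `StringConf` class, loosely modelled on the `set`/`setBoolean`/`getBoolean` methods of Hadoop's `Configuration` but not its implementation; in this sketch an unrecognized value makes `getBoolean` silently fall back to its default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a string-backed configuration; not Hadoop's implementation.
final class StringConf {
  private final Map<String, String> values = new HashMap<>();

  void set(String key, String value) {
    values.put(key, value);
  }

  /** Typed setter: always stores the canonical "true" or "false" text. */
  void setBoolean(String key, boolean value) {
    set(key, Boolean.toString(value));
  }

  /** Returns the parsed flag, or the default when the value is missing or unrecognized. */
  boolean getBoolean(String key, boolean defaultValue) {
    String value = values.get(key);
    if (value == null) {
      return defaultValue;
    }

    String trimmed = value.trim();
    if ("true".equalsIgnoreCase(trimmed)) {
      return true;
    } else if ("false".equalsIgnoreCase(trimmed)) {
      return false;
    }

    return defaultValue;
  }
}
```

With `setBoolean` the stored text is always well-formed; a hand-written `"flase"` passed to `set` would instead read back as the default.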

##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
##########
@@ -48,7 +49,15 @@ public IcebergRecordObjectInspector(Types.StructType structType, List<ObjectInsp
 
     for (Types.NestedField field : structType.fields()) {
       ObjectInspector oi = objectInspectors.get(position);
-      IcebergRecordStructField structField = new IcebergRecordStructField(field, oi, position);
+
+      IcebergRecordStructField structField;
+      if (caseSensitive) {
+        structField = new IcebergRecordStructField(field, oi, position);
+      } else {
+        Types.NestedField fieldInLowercase = Types.NestedField.of(field.fieldId(), field.isOptional(),
+                field.name().toLowerCase(), field.type(), field.doc());
+        structField = new IcebergRecordStructField(fieldInLowercase, oi, position);
+      }

Review comment:
       nit: We tend to leave an empty line after `if` blocks.
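A simplified sketch of the lower-casing branch, with an empty line after the `if`/`else` block as suggested. `FieldSketch` is a hypothetical stand-in for `Types.NestedField` (the type is a plain string). It also uses `Locale.ROOT`, which is an addition not in the PR: a bare `toLowerCase()` depends on the JVM default locale, and under a Turkish locale `"ID".toLowerCase()` yields `"ıd"` with a dotless i.

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for Types.NestedField.
final class FieldSketch {
  final int id;
  final boolean optional;
  final String name;
  final String type;
  final String doc;

  FieldSketch(int id, boolean optional, String name, String type, String doc) {
    this.id = id;
    this.optional = optional;
    this.name = name;
    this.type = type;
    this.doc = doc;
  }

  /** Returns the field unchanged when case-sensitive, otherwise a copy with a lower-cased name. */
  static FieldSketch forInspector(FieldSketch field, boolean caseSensitive) {
    FieldSketch result;
    if (caseSensitive) {
      result = field;
    } else {
      // Keep id, optionality, type and doc; only the name changes.
      // Locale.ROOT makes the result independent of the JVM default locale.
      result = new FieldSketch(field.id, field.optional,
          field.name.toLowerCase(Locale.ROOT), field.type, field.doc);
    }

    return result;
  }
}
```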

##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java
##########
@@ -188,9 +189,10 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) {
       this.encryptionManager = ((IcebergSplit) split).encryptionManager();
       this.tasks = task.files().iterator();
       this.tableSchema = InputFormatConfig.tableSchema(conf);
-      this.expectedSchema = readSchema(conf, tableSchema);
+      this.caseSensitive = conf.getBoolean(InputFormatConfig.CASE_SENSITIVE,
+              InputFormatConfig.CASE_SENSITIVE_DEFAULT);

Review comment:
       nit: This can be a single line

##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java
##########
@@ -373,15 +375,18 @@ public void close() throws IOException {
       }
     }
 
-    private static Schema readSchema(Configuration conf, Schema tableSchema) {
+    private static Schema readSchema(Configuration conf, Schema tableSchema, boolean caseSensitive) {
       Schema readSchema = InputFormatConfig.readSchema(conf);
 
       if (readSchema != null) {
         return readSchema;
       }
 
       String[] selectedColumns = InputFormatConfig.selectedColumns(conf);
-      return selectedColumns != null ? tableSchema.select(selectedColumns) : tableSchema;
+      if (selectedColumns == null) {
+        return tableSchema;
+      }
+      return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns);

Review comment:
       nit: Extra empty line after the `if` block
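The decision order of `readSchema`, sketched with lists of column names standing in for Iceberg's `Schema`, and with an empty line after each `if` block. `ReadSchemaSketch` and its `select` helper are hypothetical; they only mimic the case-sensitive/insensitive choice between `select` and `caseInsensitiveSelect` and are not those methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: schemas are modelled as lists of column names.
final class ReadSchemaSketch {
  private ReadSchemaSketch() {}

  static List<String> readSchema(List<String> explicitReadSchema, List<String> tableSchema,
                                 String[] selectedColumns, boolean caseSensitive) {
    // 1. An explicitly configured read schema always wins.
    if (explicitReadSchema != null) {
      return explicitReadSchema;
    }

    // 2. No projection requested: read the full table schema.
    if (selectedColumns == null) {
      return tableSchema;
    }

    // 3. Project the selected columns, honoring the case-sensitivity flag.
    return select(tableSchema, selectedColumns, caseSensitive);
  }

  /** Keeps the table columns that match a selected name, in table order. */
  private static List<String> select(List<String> tableSchema, String[] selected, boolean caseSensitive) {
    List<String> result = new ArrayList<>();
    for (String column : tableSchema) {
      for (String name : selected) {
        boolean matches = caseSensitive ? column.equals(name) : column.equalsIgnoreCase(name);
        if (matches) {
          result.add(column);
          break;
        }
      }
    }

    return result;
  }
}
```

This helper silently skips selected names that match no column; that is a simplification of the sketch.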




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.