pvary commented on a change in pull request #2053:
URL: https://github.com/apache/iceberg/pull/2053#discussion_r553898951
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -82,10 +82,17 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
}
String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration);
- Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema;
+ Schema projectedSchema = tableSchema;
+
+ boolean caseSensitive = configuration.getBoolean(InputFormatConfig.CASE_SENSITIVE,
Review comment:
I understand that we want the IcebergInputFormat to be configurable as case
sensitive or case insensitive, since it can be used by other MR jobs as well.
Do we want to allow Hive users to shoot themselves in the foot by enabling
case sensitivity?
My first guess would be that we should not use the configuration here and just
go with `false`, but if you have a specific use case in mind I can be
easily convinced 😄
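To make the suggestion concrete, here is a minimal sketch of the split I have in mind. It is only an illustration: a plain `Map` stands in for the Hadoop `Configuration`, and the class name, method names and the `example.case.sensitive` key are all hypothetical, not the real `InputFormatConfig` values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the generic MR path honors the configured flag,
// while the Hive path always resolves columns case-insensitively.
final class CaseSensitivitySketch {
  // Made-up key standing in for InputFormatConfig.CASE_SENSITIVE.
  static final String CASE_SENSITIVE_KEY = "example.case.sensitive";

  // Generic MR jobs: use the configured value, or the given default if unset.
  static boolean forMapReduce(Map<String, String> conf, boolean defaultValue) {
    String value = conf.get(CASE_SENSITIVE_KEY);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  // Hive: ignore the configuration and always answer false.
  static boolean forHive(Map<String, String> conf) {
    return false;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(CASE_SENSITIVE_KEY, "true");
    System.out.println("MR:   " + forMapReduce(conf, false));
    System.out.println("Hive: " + forHive(conf));
  }
}
```

With this shape a Hive user cannot turn case sensitivity on by accident, while other MR jobs keep the knob.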
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
##########
@@ -35,21 +35,30 @@
public final class IcebergRecordObjectInspector extends StructObjectInspector {
private static final IcebergRecordObjectInspector EMPTY =
- new IcebergRecordObjectInspector(Types.StructType.of(), Collections.emptyList());
+ new IcebergRecordObjectInspector(Types.StructType.of(), Collections.emptyList(), true);
private final List<IcebergRecordStructField> structFields;
+ private final List<IcebergRecordStructField> structFieldsInLowercase;
Review comment:
Why do we keep both of the fields? Shouldn't we just keep the currently
requested version?
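A rough sketch of what I mean, assuming the constructor already knows whether the inspector is case sensitive. The class below is hypothetical and uses plain strings instead of `IcebergRecordStructField`, just to show that one list built in the requested form is enough:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the object inspector: it keeps a single list of
// field names, already normalized according to the caseSensitive flag.
final class SingleFieldListSketch {
  private final List<String> fieldNames;

  SingleFieldListSketch(List<String> originalNames, boolean caseSensitive) {
    List<String> names = new ArrayList<>(originalNames.size());
    for (String name : originalNames) {
      // Lowercase once at construction time when case-insensitive.
      names.add(caseSensitive ? name : name.toLowerCase(Locale.ROOT));
    }
    this.fieldNames = Collections.unmodifiableList(names);
  }

  List<String> fieldNames() {
    return fieldNames;
  }
}
```

The trade-off is that the original casing is no longer available from the inspector when `caseSensitive` is false, so this only works if nothing downstream needs it.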
##########
File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##########
@@ -150,6 +150,27 @@ public void testScanTable() throws IOException {
Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
}
+ @Test
Review comment:
I think these queries do not use the execution engine itself, so we do
not have to put these tests in this class.
It is OK to have them only in TestHiveIcebergStorageHandlerLocalScan.java
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -80,12 +80,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
tableSchema = hiveSchemaOrThrow(serDeProperties, e);
}
}
+ configuration.set(InputFormatConfig.CASE_SENSITIVE, "false");
Review comment:
`configuration.setBoolean(InputFormatConfig.CASE_SENSITIVE, false);`
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -80,12 +80,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
tableSchema = hiveSchemaOrThrow(serDeProperties, e);
}
}
+ configuration.set(InputFormatConfig.CASE_SENSITIVE, "false");
Review comment:
`configuration.setBoolean(InputFormatConfig.CASE_SENSITIVE, false);`
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
##########
@@ -48,7 +49,15 @@ public IcebergRecordObjectInspector(Types.StructType structType, List<ObjectInsp
for (Types.NestedField field : structType.fields()) {
ObjectInspector oi = objectInspectors.get(position);
- IcebergRecordStructField structField = new IcebergRecordStructField(field, oi, position);
+
+ IcebergRecordStructField structField;
+ if (caseSensitive) {
+ structField = new IcebergRecordStructField(field, oi, position);
+ } else {
+ Types.NestedField fieldInLowercase = Types.NestedField.of(field.fieldId(), field.isOptional(),
+ field.name().toLowerCase(), field.type(), field.doc());
+ structField = new IcebergRecordStructField(fieldInLowercase, oi, position);
+ }
Review comment:
nit: We tend to leave a blank line after if blocks
##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java
##########
@@ -188,9 +189,10 @@ public void initialize(InputSplit split, TaskAttemptContext newContext) {
this.encryptionManager = ((IcebergSplit) split).encryptionManager();
this.tasks = task.files().iterator();
this.tableSchema = InputFormatConfig.tableSchema(conf);
- this.expectedSchema = readSchema(conf, tableSchema);
+ this.caseSensitive = conf.getBoolean(InputFormatConfig.CASE_SENSITIVE,
+ InputFormatConfig.CASE_SENSITIVE_DEFAULT);
Review comment:
nit: This can be a single line
##########
File path: mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java
##########
@@ -373,15 +375,18 @@ public void close() throws IOException {
}
}
- private static Schema readSchema(Configuration conf, Schema tableSchema) {
+ private static Schema readSchema(Configuration conf, Schema tableSchema, boolean caseSensitive) {
Schema readSchema = InputFormatConfig.readSchema(conf);
if (readSchema != null) {
return readSchema;
}
String[] selectedColumns = InputFormatConfig.selectedColumns(conf);
- return selectedColumns != null ? tableSchema.select(selectedColumns) : tableSchema;
+ if (selectedColumns == null) {
+ return tableSchema;
+ }
+ return caseSensitive ? tableSchema.select(selectedColumns) : tableSchema.caseInsensitiveSelect(selectedColumns);
Review comment:
nit: Extra line
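For readers following along, the branch in `readSchema` can be illustrated without Iceberg on the classpath. The sketch below is an assumption-laden stand-in: column names are plain strings, `ColumnSelectSketch` is a hypothetical class, and it only mimics the difference between exact and case-insensitive name matching, not the real `Schema.select` or `Schema.caseInsensitiveSelect` implementations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the projection step: keep the table columns whose
// names match a selected column, comparing exactly or ignoring case.
final class ColumnSelectSketch {
  static List<String> select(List<String> tableColumns, List<String> selected, boolean caseSensitive) {
    if (selected == null) {
      // No projection requested: fall back to the full table schema.
      return tableColumns;
    }

    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      for (String wanted : selected) {
        boolean matches = caseSensitive ? column.equals(wanted) : column.equalsIgnoreCase(wanted);
        if (matches) {
          // Keep the table's own spelling and the table's column order.
          result.add(column);
          break;
        }
      }
    }
    return result;
  }
}
```

For example, selecting `NAME` and `id` from a table with columns `id`, `Name`, `ts` yields `id` and `Name` when `caseSensitive` is false, and only `id` when it is true.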