[ 
https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216728#comment-17216728
 ] 

liwei commented on HUDI-303:
----------------------------

I do not think this needs to be fixed, because Hive metastore column names are 
case insensitive. If we do not lowercase the Avro field names, the Hive 
metastore schema will not match the Avro schema. For example, the column names 
read from hive_metastoreConstants.META_TABLE_COLUMNS are case insensitive:

Map<String, Field> schemaFieldsMap = HoodieRealtimeRecordReaderUtils.getNameToFieldMap(writerSchema);
hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);

// Get all column names of hive table
String hiveColumnString = jobConf.get(hive_metastoreConstants.META_TABLE_COLUMNS);
LOG.info("Hive Columns : " + hiveColumnString);
String[] hiveColumns = hiveColumnString.split(",");
List<Field> hiveSchemaFields = new ArrayList<>();

for (String columnName : hiveColumns) {
  Field field = schemaFieldsMap.get(columnName.toLowerCase());

  if (field != null) {
    hiveSchemaFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
  } else {
    // Hive has some extra virtual columns like BLOCK__OFFSET__INSIDE__FILE which do not exist in table schema.
    // They will get skipped as they won't be found in the original schema.
    LOG.debug("Skipping Hive Column => " + columnName);
  }
}
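
To make the matching problem concrete, here is a minimal, self-contained sketch that uses plain strings instead of Avro Field objects. The class name HiveColumnMatchSketch, the method matchColumns, and the sample field names are hypothetical and not part of Hudi. The sketch assumes the name-to-field map is keyed by lowercased Avro field names and that Hive reports the table columns in lower case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: plain strings stand in for Avro fields.
class HiveColumnMatchSketch {

  // Returns the Avro field names that match the Hive columns, in Hive column order.
  // When lowerCase is true, both the map keys and the lookups are lowercased
  // (assumed to mirror the snippet above); when false, names are compared as-is.
  static List<String> matchColumns(String hiveColumnString,
                                   List<String> avroFieldNames,
                                   boolean lowerCase) {
    Map<String, String> fieldsMap = new LinkedHashMap<>();
    for (String name : avroFieldNames) {
      String key = lowerCase ? name.toLowerCase(Locale.ROOT) : name;
      fieldsMap.put(key, name);
    }

    List<String> matched = new ArrayList<>();
    for (String columnName : hiveColumnString.split(",")) {
      String key = lowerCase ? columnName.toLowerCase(Locale.ROOT) : columnName;
      String field = fieldsMap.get(key);
      if (field != null) {
        matched.add(field);
      }
      // Columns without a match (for example Hive virtual columns) are skipped.
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> avroFields = Arrays.asList("userId", "Name", "ts");
    String hiveColumns = "userid,name,ts,BLOCK__OFFSET__INSIDE__FILE";

    // With lowercasing, every table column is found.
    System.out.println(matchColumns(hiveColumns, avroFields, true));
    // Without lowercasing, only the column that was already lower case is found.
    System.out.println(matchColumns(hiveColumns, avroFields, false));
  }
}
```

In this sketch, two Avro fields that differ only in case would map to the same lowercased key, which is the situation the issue proposes to test.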

> Avro schema case sensitivity testing
> ------------------------------------
>
>                 Key: HUDI-303
>                 URL: https://issues.apache.org/jira/browse/HUDI-303
>             Project: Apache Hudi
>          Issue Type: Test
>          Components: Spark Integration
>            Reporter: Udit Mehrotra
>            Assignee: Udit Mehrotra
>            Priority: Minor
>              Labels: bug-bash-0.6.0
>
> As a fallout of [PR 956|https://github.com/apache/incubator-hudi/pull/956] we 
> would like to understand how Avro behaves with case sensitive column names.
> Couple of action items:
>  * Test with different field names just differing in case.
>  * *AbstractRealtimeRecordReader* is one of the classes where we are 
> converting Avro Schema field names to lower case, to be able to verify them 
> against column names from Hive. We can consider removing the *lowercase* 
> conversion there if we verify it does not break anything.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
