umehrot2 commented on a change in pull request #956: [HUDI-298] Fix issue with
incorrect column mapping causing bad data during on-the-fly merge of Real
Time tables
URL: https://github.com/apache/incubator-hudi/pull/956#discussion_r335159042
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
##########
@@ -328,13 +333,40 @@ private void init() throws IOException {
 writerSchema = addPartitionFields(writerSchema, partitioningFields);
 List<String> projectionFields = orderFields(jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR),
     jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR), partitioningFields);
+
+ Map<String, Field> schemaFieldsMap = getNameToFieldMap(writerSchema);
+ hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);
 // TODO(vc): In the future, the reader schema should be updated based on log files & be able
 // to null out fields not present before
- readerSchema = generateProjectionSchema(writerSchema, projectionFields);
+
+ readerSchema = generateProjectionSchema(writerSchema, schemaFieldsMap, projectionFields);
 LOG.info(String.format("About to read compacted logs %s for base split %s, projecting cols %s",
     split.getDeltaFilePaths(), split.getPath(), projectionFields));
 }
+
+ private Schema constructHiveOrderedSchema(Schema writerSchema, Map<String, Field> schemaFieldsMap) {
+   String hiveColumnString = jobConf.get("columns");
Review comment:
Sure, I can make it a constant with some doc.
`columns` is different because it holds the names of all the columns in the
Hive table, whereas `ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR` holds
only the columns that need to be read/selected as part of the query being
run.
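To illustrate the distinction (this is a standalone sketch, not part of the patch; the helper name and column names are hypothetical): `columns` lists every column of the Hive table in table order, while the projection conf lists only the queried subset, possibly in query order. Resolving the projected columns by name against the full table order, rather than by position, is what avoids the bad-data mismatch this PR fixes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ColumnConfSketch {

    // Hypothetical helper: given the full Hive table columns (the "columns"
    // property) and the projected columns (READ_COLUMN_NAMES_CONF_STR style),
    // return the projected columns reordered to match the table's column order.
    static List<String> orderProjection(String allColumnsCsv, String projectedCsv) {
        List<String> all = Arrays.asList(allColumnsCsv.split(","));
        Set<String> projected = new HashSet<>(Arrays.asList(projectedCsv.split(",")));
        List<String> ordered = new ArrayList<>();
        for (String col : all) {            // iterate in full-table order
            if (projected.contains(col)) {
                ordered.add(col);           // keep only the queried columns
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // "columns" style value: every column of the Hive table, in table order
        String columns = "_hoodie_commit_time,key,ts,value";
        // projection style value: only the queried subset, in query order
        String readColumns = "value,key";
        System.out.println(orderProjection(columns, readColumns)); // [key, value]
    }
}
```

The lookup is name-based, so it stays correct even when the query selects columns in a different order than the table defines them.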
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services