umehrot2 commented on a change in pull request #956: [HUDI-298] Fix issue with incorrect column mapping causing bad data during on-the-fly merge of Real Time tables
URL: https://github.com/apache/incubator-hudi/pull/956#discussion_r335159042
 
 

 ##########
 File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
 ##########
 @@ -328,13 +333,40 @@ private void init() throws IOException {
     writerSchema = addPartitionFields(writerSchema, partitioningFields);
     List<String> projectionFields = orderFields(jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR),
         jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR), partitioningFields);
+
+    Map<String, Field> schemaFieldsMap = getNameToFieldMap(writerSchema);
+    hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);
     // TODO(vc): In the future, the reader schema should be updated based on log files & be able
     // to null out fields not present before
-    readerSchema = generateProjectionSchema(writerSchema, projectionFields);
+
+    readerSchema = generateProjectionSchema(writerSchema, schemaFieldsMap, projectionFields);
     LOG.info(String.format("About to read compacted logs %s for base split %s, projecting cols %s",
         split.getDeltaFilePaths(), split.getPath(), projectionFields));
   }
 
+  private Schema constructHiveOrderedSchema(Schema writerSchema, Map<String, Field> schemaFieldsMap) {
+    String hiveColumnString = jobConf.get("columns");
 
 Review comment:
   Sure, I can make it a constant and add some documentation.
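A minimal sketch of what such a constant could look like. The class and constant names (`RealtimeReaderConstants`, `HIVE_TABLE_COLUMNS_CONF`) are hypothetical, and a plain `Map` stands in for Hadoop's `JobConf` so the snippet runs on its own. Hive itself already defines this key as `serdeConstants.LIST_COLUMNS` in `org.apache.hadoop.hive.serde`, which might be reused instead of a new literal if that class is available to the module.

```java
import java.util.HashMap;
import java.util.Map;

public class RealtimeReaderConstants {

  /**
   * JobConf key under which Hive publishes the comma-separated names of all columns of the
   * queried table. Unlike ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, it is not narrowed
   * down to the columns a particular query selects.
   */
  public static final String HIVE_TABLE_COLUMNS_CONF = "columns";

  // Stand-in for jobConf.get(HIVE_TABLE_COLUMNS_CONF); a plain Map replaces Hadoop's JobConf here.
  static String hiveColumnString(Map<String, String> conf) {
    return conf.get(HIVE_TABLE_COLUMNS_CONF);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(HIVE_TABLE_COLUMNS_CONF, "_hoodie_commit_time,rider,fare");
    System.out.println(hiveColumnString(conf)); // _hoodie_commit_time,rider,fare
  }
}
```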
   
   `columns` is different: it holds the names of all the columns in the Hive table, whereas `ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR` holds only the columns that need to be read (selected) by the query being run.
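To illustrate the difference, here is a rough sketch, assuming both config values are comma-separated column names. It splits the full `columns` list and uses it to reorder writer-schema fields by name, which is roughly what `constructHiveOrderedSchema` appears to be for. A plain name-to-type `LinkedHashMap` stands in for Avro's `Schema`/`Schema.Field`, the class and method names are hypothetical, and throwing on a column missing from the writer schema is an assumption, not the PR's confirmed behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiveColumnOrderSketch {

  // Splits a comma-separated column list (assumed format of both "columns" and
  // READ_COLUMN_NAMES_CONF_STR) into trimmed, non-empty names.
  static List<String> splitColumns(String csv) {
    List<String> names = new ArrayList<>();
    if (csv == null || csv.isEmpty()) {
      return names;
    }
    for (String name : csv.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  // Hypothetical stand-in for constructHiveOrderedSchema: returns the writer fields
  // (name -> type) re-ordered to follow the Hive table's full column list.
  // Assumption: a Hive column missing from the writer schema is treated as an error.
  static Map<String, String> hiveOrdered(Map<String, String> writerFields, List<String> hiveColumns) {
    Map<String, String> ordered = new LinkedHashMap<>();
    for (String column : hiveColumns) {
      String type = writerFields.get(column);
      if (type == null) {
        throw new IllegalArgumentException("Column " + column + " not found in writer schema");
      }
      ordered.put(column, type);
    }
    return ordered;
  }

  public static void main(String[] args) {
    // Writer-schema order, with a partition field appended at the end.
    Map<String, String> writerFields = new LinkedHashMap<>();
    writerFields.put("_hoodie_commit_time", "string");
    writerFields.put("rider", "string");
    writerFields.put("fare", "double");
    writerFields.put("datestr", "string");

    // "columns": every column of the Hive table (assumed to be in table order).
    List<String> allColumns = splitColumns("_hoodie_commit_time,fare,rider,datestr");
    // READ_COLUMN_NAMES_CONF_STR: only what the query selects, e.g. SELECT fare ...
    List<String> projected = splitColumns("fare");

    System.out.println(hiveOrdered(writerFields, allColumns).keySet()); // [_hoodie_commit_time, fare, rider, datestr]
    System.out.println(projected); // [fare]
  }
}
```

Looking fields up by name rather than by position is the point of the sketch: the Hive table order and the writer-schema order need not agree, so the same index can refer to different columns in each.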

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
