alexeykudinkin commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858121011


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -324,7 +326,14 @@ object HoodieSparkUtils extends SparkAdapterSupport {
       val name2Fields = tableAvroSchema.getFields.asScala.map(f => f.name() -> f).toMap
       // Here have to create a new Schema.Field object
       // to prevent throwing exceptions like "org.apache.avro.AvroRuntimeException: Field already used".
-      val requiredFields = requiredColumns.map(c => name2Fields(c))
+      val requiredFields = requiredColumns.filter(c => {

Review Comment:
   We should not relax this here, because `requiredColumns` will also contain query columns.
   
   Instead, we should provide `HoodieMergeOnReadRDD` with two Parquet readers:
   
   1. One primed for merging (i.e., with a schema containing the record key and precombine key)
   2. One primed for NO merging (i.e., whose schema could be essentially empty)
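
   The two-reader split could be sketched roughly as follows. This is a minimal, self-contained illustration only: the field names `_row_key` and `ts`, the object name `TwoReaderSketch`, and the column-list representation are all hypothetical stand-ins, not the actual Hudi implementation, which would build Avro/Parquet schemas rather than plain column lists.

   ```scala
   // Hypothetical sketch: two column-projection strategies, one per reader.
   object TwoReaderSketch {
     // Hypothetical names for the mandatory merge columns.
     val recordKeyField: String = "_row_key"
     val preCombineField: String = "ts"

     // Reader primed for merging: the projected schema must retain the
     // record-key and precombine-key columns on top of the query columns.
     def mergingReaderColumns(requiredColumns: Seq[String]): Seq[String] =
       (requiredColumns ++ Seq(recordKeyField, preCombineField)).distinct

     // Reader primed for NO merging: only the query columns are needed,
     // so the projected schema could be essentially empty.
     def nonMergingReaderColumns(requiredColumns: Seq[String]): Seq[String] =
       requiredColumns
   }
   ```

   With this split, the merging reader never loses the meta columns to projection pruning, while the non-merging reader avoids reading columns the query did not ask for.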


