umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema

 File path: 
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
     Assert.assertTrue("column pii_col doesn't show up", piiPresent);
+  @Test
+  public void testDefaultValue() {
+    GenericRecord rec = new GenericData.Record(new 
+    rec.put("_row_key", "key1");
+    rec.put("non_pii_col", "val1");
+    rec.put("pii_col", "val2");
+    rec.put("timestamp", 3.5);
 Review comment:
   My bad I was thinking only from `DataSource's HoodieSparkSqlWriter` writer 
point of view, where the schema is determined automatically from the 
`DataFrame` and converted to avro schema. Missed that `DeltaStreamer` uses the 
`schema provider` which the users can pass it directly to the 
`HoodieWriteClient`. Thanks for details !
   I have a question for the schema evolution example you provided. The 
`rewriteRecord()` you are testing here uses the schema from the old record, and 
re-writes by setting only the fields found in the old schema. So if you rewrite 
R1 and R2 record, there schema will not have the new `col1` field right ? 
Hence, your code of populating default values will not get executed because 
`col1` is not present in the old schema fields.
   It seems this test case works because you are not evolving the schema here. 
Your old and new record both have the same schema. But if your old record 
schema is different I think you will run into the same issue. Am I missing 
something here ?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services

Reply via email to