nsivabalan commented on code in PR #5294:
URL: https://github.com/apache/hudi/pull/5294#discussion_r847880037


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -133,6 +133,11 @@ public class HoodieWriteConfig extends HoodieConfig {
       .withDocumentation("Easily configure one of the built-in key generators, instead of specifying the key generator class."
           + "Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE");
 
+  public static final ConfigProperty<Boolean> DROP_PARTITION_COLUMNS = ConfigProperty

Review Comment:
   We already have a table config, `hoodie.datasource.write.drop.partition.columns`. Can't we re-use that?
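
   A sketch of what re-use might look like, assuming the `ConfigProperty` builder used in the diff above; the key string comes from the comment, while the default value and documentation text are illustrative, not the actual Hudi definition:

   ```java
   // Hypothetical: instead of introducing a new config key, point the write-config
   // constant at the existing datasource key so both code paths read the same setting.
   public static final ConfigProperty<Boolean> DROP_PARTITION_COLUMNS = ConfigProperty
       .key("hoodie.datasource.write.drop.partition.columns")
       .defaultValue(false)  // illustrative default
       .withDocumentation("When enabled, the configured partition columns are not "
           + "written to the data files.");
   ```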



##########
hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java:
##########
@@ -227,6 +228,20 @@ public void testAddingAndRemovingMetadataFields() {
     assertEquals(NUM_FIELDS_IN_EXAMPLE_SCHEMA, schemaWithoutMetaCols.getFields().size());
   }
 
+  @Test
+  public void testRemoveFields() {
+    GenericRecord rec = new GenericData.Record(new Schema.Parser().parse(EXAMPLE_SCHEMA));
+    rec.put("_row_key", "key1");
+    rec.put("non_pii_col", "val1");
+    rec.put("pii_col", "val2");
+    rec.put("timestamp", 3.5);
+    GenericRecord rec1 = HoodieAvroUtils.removeFields(rec, Arrays.asList("pii_col"));
+    assertEquals("key1", rec1.get("_row_key"));
+    assertEquals("val1", rec1.get("non_pii_col"));
+    assertEquals(3.5, rec1.get("timestamp"));
+    assertNull(rec1.get("pii_col"));

Review Comment:
   Can we assert the schema of rec1 as well? Essentially, the other fields should not have moved around: fields positioned after the removed field will have their position shifted down by 1, but otherwise the relative order should be unchanged.
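
   A minimal, self-contained sketch of the invariant being asked for, with a plain list of field names standing in for an Avro schema (`FieldDropCheck` and `removeField` are hypothetical stand-ins, not Hudi or Avro APIs): removing one field should leave the remaining field order intact, with later fields shifted down by exactly one position.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   public class FieldDropCheck {

     // Hypothetical stand-in for the schema-level effect of HoodieAvroUtils.removeFields:
     // drop the named field and keep the relative order of the rest.
     static List<String> removeField(List<String> fieldNames, String toRemove) {
       List<String> out = new ArrayList<>(fieldNames);
       out.remove(toRemove);
       return out;
     }

     public static void main(String[] args) {
       List<String> original = Arrays.asList("_row_key", "non_pii_col", "pii_col", "timestamp");
       List<String> trimmed = removeField(original, "pii_col");

       // Remaining fields keep their relative order.
       if (!trimmed.equals(Arrays.asList("_row_key", "non_pii_col", "timestamp"))) {
         throw new AssertionError("field order changed: " + trimmed);
       }
       // Fields after the removed one shift down by exactly one position.
       if (trimmed.indexOf("timestamp") != original.indexOf("timestamp") - 1) {
         throw new AssertionError("unexpected position shift");
       }
     }
   }
   ```

   An equivalent assertion on the real `rec1.getSchema()` would compare the field-name list and each field's position against the original schema minus `pii_col`.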



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -727,6 +734,13 @@ public void setupWriteClient() throws IOException {
 
   private void reInitWriteClient(Schema sourceSchema, Schema targetSchema) throws IOException {
     LOG.info("Setting up new Hoodie Write Client");
+    final boolean isDropPartitionColumns = props.getBoolean(DROP_PARTITION_COLUMNS.key());
+    if (isDropPartitionColumns) {
+      String partitionColumns = HoodieSparkUtils.getPartitionColumns(keyGenerator, props);

Review Comment:
   We already compute these once, around line 480. Is it possible to re-use them?
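
   The re-use could be done by memoizing the computed value. This is a generic, stdlib-only sketch of that pattern (`PartitionColumnsCache` and the `Supplier` are illustrative, not DeltaSync's actual fields): the expensive lookup runs at most once, and every later caller gets the cached list.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.function.Supplier;

   public class PartitionColumnsCache {

     private List<String> cached; // computed at most once, then re-used

     // expensiveLookup stands in for HoodieSparkUtils.getPartitionColumns(keyGenerator, props),
     // which returns a comma-separated string of partition column names.
     List<String> get(Supplier<String> expensiveLookup) {
       if (cached == null) {
         cached = Arrays.asList(expensiveLookup.get().split(","));
       }
       return cached;
     }
   }
   ```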



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -477,14 +479,19 @@ private Pair<SchemaProvider, Pair<String, JavaRDD<HoodieRecord>>> fetchFromSourc
     }
 
     boolean shouldCombine = cfg.filterDupes || cfg.operation.equals(WriteOperationType.UPSERT);
+    boolean isDropPartitionColumns = props.getBoolean(DROP_PARTITION_COLUMNS.key());
+    String partitionColumns = HoodieSparkUtils.getPartitionColumns(keyGenerator, props);
+    List<String> listOfPartitionColumns = Arrays.asList(partitionColumns.split(","));
     JavaRDD<GenericRecord> avroRDD = avroRDDOptional.get();
+    HoodieKey key = keyGenerator.getKey(avroRDD.first());

Review Comment:
   Not sure this is right. We have to move this inside map(): for each genRec, we first have to run it through the key generator and only then modify the genRec if need be (drop the partition columns). As written, keyGenerator.getKey(avroRDD.first()) computes a key for only the first record.
   Also, can we move lines 482 to 494 into a separate method and keep this method cleaner/leaner?
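
   To make the ordering concrete, here is a minimal stdlib-only sketch of the per-record flow the comment describes (a `Map` stands in for `GenericRecord`, and the toy key function stands in for `keyGenerator.getKey`; none of this is the actual PR code). In the real code this would run inside `avroRDD.map(...)`, so every record, not just the first, gets its key computed before its partition columns are dropped.

   ```java
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class KeyThenDrop {

     // Hypothetical stand-in for a (HoodieKey, GenericRecord) pair.
     static class KeyedRecord {
       final String key;
       final Map<String, Object> record;
       KeyedRecord(String key, Map<String, Object> record) {
         this.key = key;
         this.record = record;
       }
     }

     static KeyedRecord keyThenDrop(Map<String, Object> genRec, List<String> partitionCols, boolean drop) {
       // 1. Build the key while the record still has every field; the key
       //    generator may need the partition columns.
       String key = genRec.get("partition") + "/" + genRec.get("_row_key");
       // 2. Only afterwards drop the partition columns, if configured.
       Map<String, Object> out = new HashMap<>(genRec);
       if (drop) {
         partitionCols.forEach(out::remove);
       }
       return new KeyedRecord(key, out);
     }
   }
   ```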



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
