[ https://issues.apache.org/jira/browse/HUDI-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382452#comment-17382452 ]

ASF GitHub Bot commented on HUDI-1860:
--------------------------------------

nsivabalan commented on a change in pull request #3184:
URL: https://github.com/apache/hudi/pull/3184#discussion_r671607820



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -475,8 +485,8 @@ public void refreshTimeline() throws IOException {
         LOG.warn("Some records failed to be merged but forcing commit since commitOnErrors set. Errors/Total="
             + totalErrorRecords + "/" + totalRecords);
       }
-
-      boolean success = writeClient.commit(instantTime, writeStatusRDD, Option.of(checkpointCommitMetadata));
+      String commitActionType = CommitUtils.getCommitActionType(cfg.operation, HoodieTableType.valueOf(cfg.tableType));

Review comment:
       I meant for this to be fixed in this PR itself: just a one-line Javadoc 
for each method. @codope: can you coordinate with Samrat and fix the docs?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1695,6 +1695,54 @@ public void testJdbcSourceIncrementalFetchInContinuousMode() {
     }
   }
 
+  @Test
+  public void testInsertOverwrite() throws Exception {
+    String tableBasePath = dfsBasePath + "/insert_overwrite";
+    // Initial insert
+    HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(tableBasePath, WriteOperationType.INSERT);
+    new HoodieDeltaStreamer(cfg, jsc).sync();
+    TestHelpers.assertRecordCount(1000, tableBasePath + "/*/*.parquet", sqlContext);
+    TestHelpers.assertDistanceCount(1000, tableBasePath + "/*/*.parquet", sqlContext);
+    TestHelpers.assertCommitMetadata("00000", tableBasePath, dfs, 1);
+    // No new data => no commits.
+    cfg.sourceLimit = 0;
+    new HoodieDeltaStreamer(cfg, jsc).sync();
+    TestHelpers.assertRecordCount(1000, tableBasePath + "/*/*.parquet", sqlContext);
+    TestHelpers.assertDistanceCount(1000, tableBasePath + "/*/*.parquet", sqlContext);
+    TestHelpers.assertCommitMetadata("00000", tableBasePath, dfs, 1);
+    // insert overwrite
+    cfg.sourceLimit = 1000;
+    cfg.operation = WriteOperationType.INSERT_OVERWRITE;

Review comment:
       :) I get it. I guess that test also needs some fixing. 
   Basically, we want to verify that insert_overwrite does not overwrite 
mismatched partitions, so it is better to cover that in the tests. If you feel 
it might take a lot of time to get this in, that's OK. Do file a ticket, and 
one of us from the community will follow up. 
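
   To make the expected behaviour concrete, here is a minimal, self-contained 
sketch of the property such a test would assert. It does not use any Hudi API: 
the class InsertOverwriteSketch and its insertOverwrite method are hypothetical, 
the table is modelled as a plain map from partition path to record keys, and 
"mismatched partitions" is assumed to mean partitions that receive no records 
in the incoming batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Hudi code: a table is a map of partition path -> record keys.
final class InsertOverwriteSketch {

    // Returns a new table in which every partition present in the incoming batch
    // is replaced wholesale, and every other partition is carried over unchanged.
    static Map<String, List<String>> insertOverwrite(
            Map<String, List<String>> table,
            Map<String, List<String>> incoming) {
        Map<String, List<String>> result = new TreeMap<>();
        // Start from a copy of the existing partitions.
        for (Map.Entry<String, List<String>> e : table.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        // Overwrite only the partitions that the incoming batch touches.
        for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new TreeMap<>();
        table.put("2021/07/01", Arrays.asList("k1", "k2"));
        table.put("2021/07/02", Arrays.asList("k3"));

        // The incoming batch only has records for the first partition.
        Map<String, List<String>> incoming = new TreeMap<>();
        incoming.put("2021/07/01", Arrays.asList("k9"));

        Map<String, List<String>> after = insertOverwrite(table, incoming);
        // The first partition is replaced; the second is left untouched.
        System.out.println(after); // {2021/07/01=[k9], 2021/07/02=[k3]}
    }
}
```

   In the real test, the same check would presumably be expressed by writing 
to a subset of partitions with INSERT_OVERWRITE and asserting that record 
counts in the remaining partitions are unchanged.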




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Add INSERT_OVERWRITE support to DeltaStreamer
> ---------------------------------------------
>
>                 Key: HUDI-1860
>                 URL: https://issues.apache.org/jira/browse/HUDI-1860
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Sagar Sumit
>            Assignee: Samrat Deb
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As discussed in [this 
> RFC|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller],
>  having full fetch mode use insert_overwrite to write during sync would be 
> better, as it can handle schema changes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
