vinothchandar commented on a change in pull request #1289: [HUDI-92] Provide
reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#discussion_r375667190
##########
File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
##########
@@ -586,6 +586,7 @@ public boolean savepoint(String commitTime, String user,
String comment) {
HoodieTimeline.compareTimestamps(commitTime, lastCommitRetained,
HoodieTimeline.GREATER_OR_EQUAL),
"Could not savepoint commit " + commitTime + " as this is beyond the
lookup window " + lastCommitRetained);
+ jsc.setJobGroup(this.getClass().getSimpleName(), "Collecting latest
files in partition");
Review comment:
In general, let's provide some higher-level context about which action is being
performed, i.e. savepoints, compaction, rollbacks, etc. In that spirit, change
this to `Collecting latest files for savepoint`?
Also, I wonder if we can include the `commitTime` in the detail, i.e. `Collecting
latest files for savepoint 20200205010000`. That way, you can go to past runs on
the Spark history server and relate them to commits in Hudi. Even better, if
someone is running the DeltaStreamer in continuous mode, they can then see
activity for commits over time.
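A rough sketch of what this could look like. The `jobGroupDescription` helper
here is hypothetical (not part of the PR); it just shows building a description
string that names the high-level action and the commit time, which would then be
passed to `JavaSparkContext.setJobGroup(groupId, description)`:

```java
public class JobGroupNaming {

    // Hypothetical helper: builds a Spark job group description that names the
    // high-level action (savepoint, compaction, rollback, ...) and the commit
    // time, so past runs on the Spark history server can be related back to
    // Hudi commits.
    static String jobGroupDescription(String action, String commitTime) {
        return "Collecting latest files for " + action + " " + commitTime;
    }

    public static void main(String[] args) {
        // In HoodieWriteClient this would be used roughly as:
        //   jsc.setJobGroup(this.getClass().getSimpleName(),
        //       jobGroupDescription("savepoint", commitTime));
        System.out.println(jobGroupDescription("savepoint", "20200205010000"));
    }
}
```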
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services