vinothchandar commented on a change in pull request #2322:
URL: https://github.com/apache/hudi/pull/2322#discussion_r543092948
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -403,6 +403,7 @@ public void finalizeWrite(HoodieEngineContext context, String instantTs, List<Ho
private void deleteInvalidFilesByPartitions(HoodieEngineContext context,
                                            Map<String, List<Pair<String, String>>> invalidFilesByPartition) {
  // Now delete partially written files
+ context.setJobStatus(this.getClass().getSimpleName(), "Delete invalid files by partitions");
Review comment:
Change message to: "Delete invalid files generated during the write operation"?
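The review comments above and below all target the same pattern: before triggering a distributed operation, the engine context records which class is doing the work and a human-readable description, so the cluster UI shows what each job is for. The sketch below illustrates that pattern with plain-Java stand-ins (`EngineContext` and `RollbackAction` here are hypothetical names, not Hudi's actual classes); a Spark-backed context would forward to `JavaSparkContext.setJobGroup` instead of just storing the strings.

```java
// Minimal sketch of the job-status pattern under review. All class names here
// are illustrative stand-ins; only the setJobStatus(group, description) shape
// mirrors the code in the diff hunks.
public class JobStatusSketch {

  /** Stand-in for an engine context: remembers the last job group/description. */
  static class EngineContext {
    String activeModule;
    String activityDescription;

    void setJobStatus(String activeModule, String activityDescription) {
      this.activeModule = activeModule;
      this.activityDescription = activityDescription;
      // A Spark-backed context would call
      // javaSparkContext.setJobGroup(activeModule, activityDescription) here,
      // making the description visible in the Spark UI.
    }
  }

  /** Stand-in for an action executor that tags its job before running. */
  static class RollbackAction {
    String run(EngineContext context) {
      // Descriptive, user-facing phrasing, as the reviewer suggests:
      context.setJobStatus(this.getClass().getSimpleName(), "Rolling back using marker files");
      return context.activeModule + ": " + context.activityDescription;
    }
  }

  public static void main(String[] args) {
    System.out.println(new RollbackAction().run(new EngineContext()));
  }
}
```

The reviewer's naming suggestions all follow from this: since the description surfaces in a monitoring UI, a present-progressive, user-facing phrase ("Rolling back using marker files") reads better than an internal one ("Marker files rollback").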
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/MarkerFiles.java
##########
@@ -135,6 +135,7 @@ public boolean doesMarkerDirExist() throws IOException {
if (subDirectories.size() > 0) {
parallelism = Math.min(subDirectories.size(), parallelism);
SerializableConfiguration serializedConf = new SerializableConfiguration(fs.getConf());
+ context.setJobStatus(this.getClass().getSimpleName(), "MarkerFiles created and merged data paths");
Review comment:
Change to: `Obtaining marker files for all created, merged paths`
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/SparkHoodieBloomIndex.java
##########
@@ -137,12 +137,14 @@ public SparkHoodieBloomIndex(HoodieWriteConfig config) {
*/
private Map<String, Long> computeComparisonsPerFileGroup(final Map<String, Long> recordsPerPartition,
                                                         final Map<String, List<BloomIndexFileInfo>> partitionToFileInfo,
-                                                        JavaPairRDD<String, String> partitionRecordKeyPairRDD) {
+                                                        JavaPairRDD<String, String> partitionRecordKeyPairRDD,
+                                                        final HoodieEngineContext context) {
  Map<String, Long> fileToComparisons;
  if (config.getBloomIndexPruneByRanges()) {
    // we will just try exploding the input and then count to determine comparisons
    // FIX(vc): Only do sampling here and extrapolate?
+   context.setJobStatus(this.getClass().getSimpleName(), "Explode recordRDD with file comparisons");
Review comment:
Change to: `Compute all comparisons needed between records and files`
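For context on what this job computes: each record key in a partition must be checked against the bloom filter of every candidate file for that partition, so the comparison count per file is roughly (records in partition) × (candidate files). A simplified, Spark-free sketch of that counting (method and variable names here are hypothetical stand-ins, not Hudi's actual implementation):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the comparison counting the renamed job performs: without
// range pruning, every record in a partition costs one bloom-filter probe per
// candidate file in that partition. Names are illustrative, not Hudi's.
public class ComparisonCount {

  static Map<String, Long> comparisonsPerFile(Map<String, List<String>> partitionToFiles,
                                              Map<String, Long> recordsPerPartition) {
    Map<String, Long> fileToComparisons = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : partitionToFiles.entrySet()) {
      long records = recordsPerPartition.getOrDefault(entry.getKey(), 0L);
      // Each file accrues one comparison per record in its partition.
      for (String file : entry.getValue()) {
        fileToComparisons.merge(file, records, Long::sum);
      }
    }
    return fileToComparisons;
  }

  public static void main(String[] args) {
    Map<String, List<String>> partitionToFiles = new HashMap<>();
    partitionToFiles.put("2020/12/01", Arrays.asList("f1", "f2"));
    Map<String, Long> recordsPerPartition = new HashMap<>();
    recordsPerPartition.put("2020/12/01", 100L);
    System.out.println(comparisonsPerFile(partitionToFiles, recordsPerPartition));
  }
}
```

Range pruning (the `getBloomIndexPruneByRanges` branch in the diff) shrinks the candidate file list per record before this counting, which is why the exploded RDD job being renamed exists at all.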
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java
##########
@@ -101,6 +101,7 @@ public BaseSparkCommitActionExecutor(HoodieEngineContext context,
WorkloadProfile profile = null;
if (isWorkloadProfileNeeded()) {
+ context.setJobStatus(this.getClass().getSimpleName(), "Build workload profile");
Review comment:
Change to: `Building workload profile`
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/SparkMarkerBasedRollbackStrategy.java
##########
@@ -52,6 +52,7 @@ public SparkMarkerBasedRollbackStrategy(HoodieTable<T, JavaRDD<HoodieRecord<T>>,
MarkerFiles markerFiles = new MarkerFiles(table, instantToRollback.getTimestamp());
List<String> markerFilePaths = markerFiles.allMarkerFilePaths();
int parallelism = Math.max(Math.min(markerFilePaths.size(), config.getRollbackParallelism()), 1);
+ jsc.setJobGroup(this.getClass().getSimpleName(), "Marker files rollback");
Review comment:
Change to: `Rolling back using marker files`
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/SparkRunCompactionActionExecutor.java
##########
@@ -76,6 +76,7 @@ public SparkRunCompactionActionExecutor(HoodieSparkEngineContext context,
JavaRDD<WriteStatus> statuses = compactor.compact(context, compactionPlan, table, config, instantTime);
statuses.persist(SparkMemoryUtils.getWriteStatusStorageLevel(config.getProps()));
+ context.setJobStatus(this.getClass().getSimpleName(), "Collect compaction metadata status");
Review comment:
Change to: `Preparing compaction metadata`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]