nsivabalan commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r802192950
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -70,9 +71,13 @@
*/
private int totalBuckets = 0;
/**
- * Stat for the current workload. Helps in determining inserts, upserts etc.
+ * Stat for the input workload. Describe the workload before being assigned
buckets.
*/
- private WorkloadProfile profile;
+ private WorkloadProfile inputProfile;
+ /**
+ * Stat for the execution workload. Describe the workload after being
assigned buckets.
+ */
+ private HashMap<String, WorkloadStat> executionProfile = new HashMap<>();
Review comment:
as suggested above, can we try to move this within WorkloadProfile only.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
.withOperationField(config.allowOperationMetadataField())
.withPartition(operation.getPartitionPath())
.build();
- if (!scanner.iterator().hasNext()) {
- scanner.close();
- return new ArrayList<>();
- }
Option<HoodieBaseFile> oldDataFileOpt =
operation.getBaseFile(metaClient.getBasePath(),
operation.getPartitionPath());
+ // Considering following scenario: if all log blocks in this fileSlice is
rollback, it returns an empty scanner.
+ // But in this case, we need to give it a base file. Otherwise, it will
lose base file in following fileSlice.
+ if (!scanner.iterator().hasNext()) {
+ if (!oldDataFileOpt.isPresent()) {
+ scanner.close();
+ return new ArrayList<>();
+ } else {
+ // TODO: we may directly rename original parquet file if there is not
evolution/devolution of schema
Review comment:
yes. it's not feasible to know upfront if all log blocks are invalid
while planning compaction.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/Partitioner.java
##########
@@ -18,10 +18,14 @@
package org.apache.hudi.table.action.commit;
+import org.apache.hudi.table.WorkloadProfile;
+
import java.io.Serializable;
public interface Partitioner extends Serializable {
int getNumPartitions();
int getPartition(Object key);
+
+ WorkloadProfile getExecutionWorkloadProfile();
}
Review comment:
wondering if we can augment existing WorkloadProfile only to add another
instance of HashMap<String, WorkloadStat> partitionPathStatMap.
```
/**
* Input workload profile.
*/
protected final HashMap<String, WorkloadStat> inputPartitionPathStatMap;
/**
* Execution workload profile.
*/
protected final HashMap<String, WorkloadStat> outputPartitionPathStatMap;
```
Let the UpsertPartitioner update the execution profile (a.k.a
outputPartitionPathStatMap ) in the passed in workload profile itself.
And so CommitActionExecutors does not need to poll partitionor for execution
profile since they already have an handle to WorkloadProfile.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
.withOperationField(config.allowOperationMetadataField())
.withPartition(operation.getPartitionPath())
.build();
- if (!scanner.iterator().hasNext()) {
- scanner.close();
- return new ArrayList<>();
- }
Option<HoodieBaseFile> oldDataFileOpt =
operation.getBaseFile(metaClient.getBasePath(),
operation.getPartitionPath());
+ // Considering following scenario: if all log blocks in this fileSlice is
rollback, it returns an empty scanner.
+ // But in this case, we need to give it a base file. Otherwise, it will
lose base file in following fileSlice.
+ if (!scanner.iterator().hasNext()) {
+ if (!oldDataFileOpt.isPresent()) {
+ scanner.close();
+ return new ArrayList<>();
+ } else {
+ // TODO: we may directly rename original parquet file if there is not
evolution/devolution of schema
Review comment:
yes, your understanding is right.
what i meant by "do these need to be cleaned up " is, why have commented out
code from L197 to L203? do you want to remove them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]