yihua commented on code in PR #12323:
URL: https://github.com/apache/hudi/pull/12323#discussion_r1857338357
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -443,12 +443,6 @@ private Dataset<Row>
readRecordsForGroupAsRow(JavaSparkContext jsc,
.toArray(StoragePath[]::new);
HashMap<String, String> params = new HashMap<>();
- if (hasLogFiles) {
- params.put("hoodie.datasource.query.type", "snapshot");
- } else {
- params.put("hoodie.datasource.query.type", "read_optimized");
- }
Review Comment:
By default, `hoodie.datasource.query.type` is set to `snapshot`, and the new
`HadoopFSRelation`-based reader logic in Spark ensures there is no
performance degradation for base-file-only cases in MOR, so explicitly setting
`params.put("hoodie.datasource.query.type", "read_optimized")` should not be
needed either. Could you point out what errors are thrown if these lines are
removed? It would be good to record and understand the errors to make sure
there is no other related issue.
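
To illustrate the point about the default: when no query type is put into `params`, a downstream lookup falls back to `snapshot`, which is why the deleted `if/else` block is redundant. The sketch below is a minimal, self-contained illustration of that fallback behavior; `resolveQueryType` is a hypothetical helper, not an actual Hudi method:

```java
import java.util.HashMap;
import java.util.Map;

public class QueryTypeDefault {

  // Hypothetical helper mirroring the default-resolution behavior described
  // above: if the caller never sets "hoodie.datasource.query.type",
  // the lookup falls back to "snapshot".
  static String resolveQueryType(Map<String, String> params) {
    return params.getOrDefault("hoodie.datasource.query.type", "snapshot");
  }

  public static void main(String[] args) {
    // Mirrors the diff after the removal: params is built with no
    // explicit query-type entry.
    Map<String, String> params = new HashMap<>();
    System.out.println(resolveQueryType(params)); // falls back to "snapshot"

    // An explicit setting still takes precedence over the default.
    params.put("hoodie.datasource.query.type", "read_optimized");
    System.out.println(resolveQueryType(params));
  }
}
```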
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]