yihua commented on code in PR #12323:
URL: https://github.com/apache/hudi/pull/12323#discussion_r1857338357
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -443,12 +443,6 @@ private Dataset<Row>
readRecordsForGroupAsRow(JavaSparkContext jsc,
.toArray(StoragePath[]::new);
HashMap<String, String> params = new HashMap<>();
- if (hasLogFiles) {
- params.put("hoodie.datasource.query.type", "snapshot");
- } else {
- params.put("hoodie.datasource.query.type", "read_optimized");
- }
Review Comment:
By default, `hoodie.datasource.query.type` is set to `snapshot`, and the new
`HadoopFSRelation`-based reader logic in Spark ensures there is no
performance degradation for base-file-only cases in MOR, so explicitly setting
`params.put("hoodie.datasource.query.type", "read_optimized")` should not be
needed either. Could you point out what errors are thrown if these lines are
removed? It would be good to record and understand the errors to make sure
there is no other related issue.
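
To illustrate the point about the default: when no query type is put into `params`, a downstream lookup falls back to `snapshot`, which is why the deleted `if/else` block is redundant. The sketch below is a minimal, self-contained illustration of that fallback behavior; `resolveQueryType` is a hypothetical helper, not an actual Hudi method:

```java
import java.util.HashMap;
import java.util.Map;

public class QueryTypeDefault {

  // Hypothetical helper mirroring the default-resolution behavior described
  // above: if the caller never sets "hoodie.datasource.query.type",
  // the lookup falls back to "snapshot".
  static String resolveQueryType(Map<String, String> params) {
    return params.getOrDefault("hoodie.datasource.query.type", "snapshot");
  }

  public static void main(String[] args) {
    // Mirrors the diff after the removal: params is built with no
    // explicit query-type entry.
    Map<String, String> params = new HashMap<>();
    System.out.println(resolveQueryType(params)); // falls back to "snapshot"

    // An explicit setting still takes precedence over the default.
    params.put("hoodie.datasource.query.type", "read_optimized");
    System.out.println(resolveQueryType(params));
  }
}
```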
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]