voonhous commented on code in PR #6856:
URL: https://github.com/apache/hudi/pull/6856#discussion_r1095441321
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -297,19 +299,22 @@ private FlinkOptions() {
       .key("read.streaming.skip_compaction")
       .booleanType()
       .defaultValue(false)// default read as batch
-      .withDescription("Whether to skip compaction instants for streaming read,\n"
-          + "there are two cases that this option can be used to avoid reading duplicates:\n"
-          + "1) you are definitely sure that the consumer reads faster than any compaction instants, "
-          + "usually with delta time compaction strategy that is long enough, for e.g, one week;\n"
+      .withDescription("Whether to skip compaction instants and avoid reading compacted base files for streaming read to improve read performance.\n"
+          + "There are two cases that this option can be used to avoid reading duplicates:\n"
+          + "1) you are definitely sure that the consumer reads [faster than/completes before] any compaction instants "
+          + "when " + HoodieCompactionConfig.PRESERVE_COMMIT_METADATA.key() + " is set to false.\n"
           + "2) changelog mode is enabled, this option is a solution to keep data integrity");
Review Comment:
What about this:
```text
Whether to skip compaction instants and avoid reading compacted base files.
When performing streaming reads, setting this to true will help to improve read performance.
Please set this configuration to true to avoid reading duplicates if:
1) HoodieCompactionConfig.PRESERVE_COMMIT_METADATA.key() is set to false
2) changelog mode is enabled, this option is a solution to keep data integrity
```
Is this better?
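
For context, a minimal sketch of how a user would enable the option being documented, in a Flink SQL DDL. The table name, schema, and path below are illustrative only, not from this PR; the `'read.streaming.skip_compaction'` and `'read.streaming.enabled'` keys are the real Hudi Flink options:

```sql
-- Hypothetical MOR table used for a streaming read; names and path are made up.
CREATE TABLE hudi_orders (
  order_id BIGINT,
  amount DOUBLE,
  ts TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_orders',
  'table.type' = 'MERGE_ON_READ',
  'read.streaming.enabled' = 'true',
  -- the option whose description is being reworded in this review
  'read.streaming.skip_compaction' = 'true'
);
```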
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]