Re: [PR] [WIP] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

via GitHub Wed, 08 Jan 2025 20:32:28 -0800


geserdugarov commented on code in PR #12545:
URL: https://github.com/apache/hudi/pull/12545#discussion_r1908142484



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##########
@@ -207,11 +207,30 @@ public static DataStream<Object> append(
       Configuration conf,
       RowType rowType,
       DataStream<RowData> dataStream) {
-    WriteOperatorFactory<RowData> operatorFactory = AppendWriteOperator.getFactory(conf, rowType);
+    boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf);
+    if (isBucketIndex) {

Review Comment:
   @danny0405, I've checked the behavior for MOR tables with the bucket index set. My bad, I missed this case. For inserts into a MOR table, new parquet files are created in the buckets on each insert. This is similar to what I tried to implement in this PR. Therefore, **users should use a MOR table to insert data with the bucket index**.
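A minimal sketch of the writer options for that setup, assuming the Hudi Flink option keys `table.type`, `index.type`, `hoodie.bucket.index.num.buckets` and `write.operation` (worth double-checking against `FlinkOptions`); the class and method names are hypothetical and only for illustration.

```java
import java.util.Map;

/** Hypothetical helper: Flink writer options for inserting into a MOR table with the bucket index. */
public final class MorBucketInsertOptions {

  private MorBucketInsertOptions() {
  }

  /** Builds the assumed option map; numBuckets is the fixed number of buckets per partition. */
  static Map<String, String> options(int numBuckets) {
    return Map.of(
        "table.type", "MERGE_ON_READ",  // MOR instead of the default COW
        "index.type", "BUCKET",         // simple bucket index
        "hoodie.bucket.index.num.buckets", String.valueOf(numBuckets),
        "write.operation", "insert");   // plain inserts, not upserts
  }

  public static void main(String[] args) {
    // Print the options, e.g. to paste into a Flink SQL WITH (...) clause.
    options(4).forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```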
   
   However, I'm worried that I can currently set the bucket index for a COW table and insert data, but **the data will silently be written to parquet files without buckets**. Maybe we should restrict this operation and throw an exception with the message:
   "Bucket index is not supported for inserts into COW table. Please use MOR table or upsert operation."
   Or, at the very least, we could log a corresponding warning.
   
   What do you think? Is it better to throw an exception or to log a warning?
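A minimal sketch of such a check, assuming the Flink option keys `table.type`, `index.type` and `write.operation`, and assuming their defaults are a COW table and the upsert operation. `BucketIndexAppendCheck`, `isUnsupported` and `validate` are hypothetical names, not part of Hudi; the sketch only shows the two options side by side: failing fast, or logging a warning and continuing.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical check: COW + bucket index + insert would silently write data without buckets. */
public final class BucketIndexAppendCheck {

  private static final Logger LOG = Logger.getLogger(BucketIndexAppendCheck.class.getName());

  // Assumed Flink option keys; verify against FlinkOptions before relying on them.
  static final String TABLE_TYPE = "table.type";
  static final String INDEX_TYPE = "index.type";
  static final String OPERATION = "write.operation";

  static final String MESSAGE = "Bucket index is not supported for inserts into COW table. "
      + "Please use MOR table or upsert operation.";

  private BucketIndexAppendCheck() {
  }

  /** Returns true for an insert into a COW table that has the bucket index configured. */
  static boolean isUnsupported(Map<String, String> conf) {
    // Defaults assumed to match Hudi's Flink defaults (COW table, upsert operation).
    boolean cow = "COPY_ON_WRITE".equalsIgnoreCase(conf.getOrDefault(TABLE_TYPE, "COPY_ON_WRITE"));
    boolean bucket = "BUCKET".equalsIgnoreCase(conf.get(INDEX_TYPE));
    boolean insert = "insert".equalsIgnoreCase(conf.getOrDefault(OPERATION, "upsert"));
    return cow && bucket && insert;
  }

  /** failOnUnsupported = true throws an exception; false only logs a warning. */
  static void validate(Map<String, String> conf, boolean failOnUnsupported) {
    if (!isUnsupported(conf)) {
      return;
    }
    if (failOnUnsupported) {
      throw new IllegalArgumentException(MESSAGE);
    }
    LOG.warning(MESSAGE);
  }

  public static void main(String[] args) {
    Map<String, String> cowInsert = Map.of(TABLE_TYPE, "COPY_ON_WRITE", INDEX_TYPE, "BUCKET", OPERATION, "insert");
    Map<String, String> morInsert = Map.of(TABLE_TYPE, "MERGE_ON_READ", INDEX_TYPE, "BUCKET", OPERATION, "insert");

    validate(morInsert, true);   // MOR is fine: no exception, no warning
    validate(cowInsert, false);  // COW: only a warning is logged
    try {
      validate(cowInsert, true); // COW: rejected
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Keeping both behaviors behind one flag would make it easy to start with a warning and switch to an exception later, if that turns out to be preferred.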



