geserdugarov commented on code in PR #12545:
URL: https://github.com/apache/hudi/pull/12545#discussion_r1908142484
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##########
@@ -207,11 +207,30 @@ public static DataStream<Object> append(
Configuration conf,
RowType rowType,
DataStream<RowData> dataStream) {
- WriteOperatorFactory<RowData> operatorFactory =
AppendWriteOperator.getFactory(conf, rowType);
+ boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf);
+ if (isBucketIndex) {
Review Comment:
@danny0405, I've checked the behavior for a MOR table with the bucket index
set. My bad, I missed this case. For inserts into a MOR table, new parquet
files are created in the buckets on each insert, which is similar to what I
tried to implement in this PR. Therefore **users should use a MOR table to
insert data with the bucket index**.
However, I'm worried that it is currently possible to set the bucket index for
a COW table and insert data, and **the data will silently be written to
parquet files without bucketing**.
Maybe we should restrict this operation and throw an exception with the message:
"Bucket index is not supported for inserts into COW table. Please, use MOR
table or upsert operation."
Or we could at least log a corresponding warning.
What do you think? Is it better to throw an exception or to log a warning?
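
To make the "throw an exception" option concrete, here is a minimal standalone sketch of the check. It is not code from this PR: the class `BucketIndexInsertCheck`, its methods, and the plain `Map` standing in for the Flink `Configuration` are hypothetical. The option keys and values are also assumptions for illustration and should be checked against the actual Hudi options before reuse.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hudi.
final class BucketIndexInsertCheck {

    // Assumed option keys; a plain Map stands in for the real configuration.
    static final String INDEX_TYPE = "index.type";
    static final String TABLE_TYPE = "table.type";
    static final String OPERATION = "write.operation";

    // The message proposed in this comment.
    static final String MESSAGE =
        "Bucket index is not supported for inserts into COW table. "
            + "Please, use MOR table or upsert operation.";

    private BucketIndexInsertCheck() {
    }

    /** Returns true for the combination where the bucket index would be ignored silently. */
    static boolean isUnsupported(Map<String, String> conf) {
        return "BUCKET".equalsIgnoreCase(conf.get(INDEX_TYPE))
            && "COPY_ON_WRITE".equalsIgnoreCase(conf.get(TABLE_TYPE))
            && "insert".equalsIgnoreCase(conf.get(OPERATION));
    }

    /** Strict variant: fail fast instead of writing non-bucketed files. */
    static void validate(Map<String, String> conf) {
        if (isUnsupported(conf)) {
            throw new IllegalArgumentException(MESSAGE);
        }
    }
}
```

The "log a warning" option would keep `isUnsupported` as is and replace the `throw` in `validate` with a logger call, so the write still proceeds but the user is told that no bucketing is applied.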