geserdugarov commented on code in PR #12545:
URL: https://github.com/apache/hudi/pull/12545#discussion_r1908142484
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##########
@@ -207,11 +207,30 @@ public static DataStream<Object> append(
Configuration conf,
RowType rowType,
DataStream<RowData> dataStream) {
- WriteOperatorFactory<RowData> operatorFactory =
AppendWriteOperator.getFactory(conf, rowType);
+ boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf);
+ if (isBucketIndex) {
Review Comment:
@danny0405, my bad, I didn't check the MOR behavior: new parquet files are
created in the buckets on each insert. Therefore **users should use a MOR table
to insert data with a bucket index**, and the proposed changes are not needed.
However, I'm worried that I can currently set a bucket index for a COW table
and insert data, but **the data will be written to parquet files, silently
ignoring the buckets**. Maybe we should restrict this operation and throw an
exception with the message:
"Bucket index is not supported for inserts into COW table. Please, use MOR
table or upsert operation."
Or we could at least log a corresponding warning.
What do you think? Is it better to throw an exception or to log a
warning?
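
To make the two options concrete, here is a minimal, self-contained sketch of such a check. It is not Hudi code: the enums, the class name `BucketIndexInsertCheck`, and the `strict` flag are hypothetical stand-ins for values that would really be read from the Flink `Configuration`, and where the check would live in the pipeline is left open.

```java
// Hypothetical sketch, not part of Hudi: all names below are illustrative.
final class BucketIndexInsertCheck {
  // Stand-ins for the table type, write operation, and index type options.
  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }
  enum Operation { INSERT, UPSERT }
  enum IndexType { BUCKET, OTHER }

  static final String MESSAGE =
      "Bucket index is not supported for inserts into COW table. "
          + "Please, use MOR table or upsert operation.";

  // True only for the combination discussed above: COW + insert + bucket index.
  static boolean isUnsupported(TableType tableType, Operation operation, IndexType indexType) {
    return indexType == IndexType.BUCKET
        && tableType == TableType.COPY_ON_WRITE
        && operation == Operation.INSERT;
  }

  // strict = true  -> fail fast with an exception (option 1).
  // strict = false -> only print a warning and continue (option 2).
  static void validate(TableType tableType, Operation operation, IndexType indexType,
                       boolean strict) {
    if (!isUnsupported(tableType, operation, indexType)) {
      return;
    }
    if (strict) {
      throw new IllegalArgumentException(MESSAGE);
    }
    // A real implementation would use the project's logger instead of stderr.
    System.err.println("WARN: " + MESSAGE);
  }
}
```

With `strict` set to `true`, a COW insert with a bucket index fails before any data is written. With `false`, the job continues and only the warning is emitted, so the buckets are still ignored.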
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.