Hi,

I am using Spark Structured Streaming with a foreachBatch sink to append
to an Iceberg table with two hidden partition columns.
I got the infamous error about the input DataFrame needing to be
clustered by partition:

*Incoming records violate the writer assumption that records are clustered
by spec and by partition within each spec. Either cluster the incoming
records or switch to fanout writers.*
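For context, the assumption this error refers to is that all rows belonging to one partition arrive contiguously in the incoming data. A rough illustration of the check in plain Python (the function and key names are made up for illustration, not Iceberg's actual code):

```python
def is_clustered(rows, key):
    """Return True if all rows sharing a partition key are contiguous,
    i.e. once we move past a key we never see it again."""
    seen, current = set(), object()  # sentinel for "no key yet"
    for row in rows:
        k = key(row)
        if k != current:
            if k in seen:  # a partition key reappeared -> not clustered
                return False
            seen.add(k)
            current = k
    return True

# Clustered: each key's rows are contiguous.
assert is_clustered([1, 1, 2, 2, 3], key=lambda r: r)
# Not clustered: key 1 reappears after key 2.
assert not is_clustered([1, 2, 1], key=lambda r: r)
```

A non-fanout writer keeps only one open file per partition, which is why it rejects input that fails this check; a fanout writer keeps files open for all partitions it has seen, so unordered input is fine at the cost of more memory.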

I tried setting "fanout-enabled" to "true" before calling foreachBatch, but
it didn't work at all. I got the same error.

I tried partitionedBy(days("date"), col("customerid")) and that didn't work
either.

Then I used the Spark SQL approach:

INSERT INTO {dest_schema_fqn}
SELECT * FROM {success_agg_tbl} ORDER BY date(date), tenant

and that worked.

I know of the following table-level configs:
write.spark.fanout.enabled - false
write.distribution-mode - none
but I have left them at their defaults, as I assume writer options will
override those settings.
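If per-write options turn out not to take effect in your pipeline, the table property is the other lever: it applies to every writer that touches the table, streaming appends included. A sketch (table name is a placeholder; `spark` would be your active SparkSession):

```python
# Hypothetical: enable fanout at the table level so all writers inherit it.
DDL = """
ALTER TABLE catalog.db.events
SET TBLPROPERTIES ('write.spark.fanout.enabled' = 'true')
"""
# spark.sql(DDL)  # run against your session
```

The trade-off is that fanout writers buffer open files for every partition seen in a task, so memory use grows with the number of partitions per micro-batch.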

So does the "fanout-enabled" option have any effect when used with
foreachBatch? (I'm new to Spark Streaming as well.)

Thanks
