Hi, I'm using Spark Structured Streaming with a `foreachBatch` sink to append to a dual hidden-partitioned Iceberg table, and I'm hitting the infamous error about the input DataFrame or partition needing to be clustered.
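Here's roughly what my setup looks like (a minimal sketch; the table name and checkpoint path are placeholders, and `spark` / `stream_df` are my existing session and streaming DataFrame, whose source details I've omitted):

```python
from pyspark.sql import DataFrame

def write_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Append each micro-batch to the dual hidden-partitioned Iceberg table.
    # "catalog.db.dest_table" is a placeholder for my real table.
    batch_df.writeTo("catalog.db.dest_table").append()

# stream_df is my incoming streaming DataFrame (source details omitted).
query = (
    stream_df.writeStream
    .option("fanout-enabled", "true")  # what I tried; the error still occurs
    .option("checkpointLocation", "/tmp/checkpoints/dest_table")  # placeholder
    .foreachBatch(write_batch)
    .start()
)
query.awaitTermination()
```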
The full error:

*Incoming records violate the writer assumption that records are clustered by spec and by partition within each spec. Either cluster the incoming records or switch to fanout writers.*

I tried setting `"fanout-enabled"` to `"true"` before calling `foreachBatch` (as in the sketch above), but it didn't work at all; I got the same error. I also tried `partitionedBy(days("date"), col("customerid"))`, and that didn't work either. Then I switched to a Spark SQL approach, and that worked:

```sql
INSERT INTO {dest_schema_fqn} SELECT * FROM {success_agg_tbl} ORDER BY date(date), tenant
```

I know of the following table-level configs:

- `write.spark.fanout.enabled` = `false` (the default)
- `write.distribution-mode` = `none` (the default)

but I have left them at their defaults, since I assumed the writer options would override those settings.

So, does the `"fanout-enabled"` option have any effect when used with `foreachBatch`? (I'm new to Spark Streaming as well.) Thanks!
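For reference, my understanding is that those table-level properties could also be set directly on the table via Spark SQL, something like this (a sketch I haven't actually tried; the table name is a placeholder):

```python
# Untried sketch: set fanout at the table level instead of per-write.
# "catalog.db.dest_table" is a placeholder for my actual table.
spark.sql("""
    ALTER TABLE catalog.db.dest_table
    SET TBLPROPERTIES ('write.spark.fanout.enabled' = 'true')
""")
```

I'd rather not flip this on the table itself if the per-write option is supposed to work, which is why I'm asking how the option interacts with `foreachBatch`.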