We generally don't recommend fanout writers because they create lots of
small data files. It also isn't clear why the table's partitioning isn't
causing Spark to distribute the data properly -- maybe you're using an old
Spark version?

In any case, you can distribute the data yourself to align with the table's
partitioning. You can either use `.repartition` with the DataFrame API, or
you can sort. I recommend sorting by adding an `ORDER BY` clause to your
SQL.
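
For example, here is a minimal PySpark sketch of both approaches inside
foreachBatch. The table name ("db.events") is hypothetical, and the column
names come from your messages below; adjust them to your schema:

    from pyspark.sql import functions as F

    def write_batch(batch_df, batch_id):
        # Cluster incoming records to match the table's partitioning,
        # (days(date), customerid). to_date approximates the days()
        # transform at day granularity.
        cols = [F.to_date("date"), F.col("customerid")]
        clustered = (batch_df
                     .repartition(*cols)
                     .sortWithinPartitions(*cols))
        # Or do a global sort, equivalent to ORDER BY in SQL:
        # clustered = batch_df.orderBy(*cols)
        clustered.writeTo("db.events").append()

    # Attach to your streaming dataframe (here called "stream_df"):
    stream_df.writeStream.foreachBatch(write_batch).start()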

Ryan

On Wed, Aug 30, 2023 at 10:36 AM Nirav Patel <nira...@gmail.com> wrote:

> Should I try the "fanout-enabled" option within the foreachBatch method
> where I do dataframe.write?
>
> On Wed, Aug 30, 2023 at 10:29 AM Nirav Patel <nira...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using Spark Structured Streaming with the foreachBatch sink to
>> append to an Iceberg table with two hidden partition fields.
>> I got this infamous error about the incoming dataframe needing to be
>> clustered by partition:
>>
>> *Incoming records violate the writer assumption that records are
>> clustered by spec and by partition within each spec. Either cluster the
>> incoming records or switch to fanout writers.*
>>
>> I tried setting "fanout-enabled" to "true" before calling foreachBatch,
>> but it didn't work at all. I got the same error.
>>
>> I tried partitionedBy(days("date"), col("customerid")) and that didn't
>> work either.
>>
>> Then I used the Spark SQL approach:
>>
>> INSERT INTO {dest_schema_fqn}
>> SELECT * FROM {success_agg_tbl} ORDER BY date(date), tenant
>>
>> and that worked.
>>
>> I know of the following table-level configs:
>> write.spark.fanout.enabled - false
>> write.distribution-mode - none
>> but I have left them at the defaults, as I assume the writer will
>> override those settings.
>>
>> So does the "fanout-enabled" option have any effect when used with
>> foreachBatch? (I'm new to Spark streaming as well.)
>>
>> thanks
>>
>
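
One more note on the table-level properties mentioned above: on a recent
Spark/Iceberg combination (Spark 3.2 or newer), you can also set the
table's distribution mode so Iceberg requests the right clustering at
write time, instead of repartitioning each batch by hand. A sketch, again
with a hypothetical table name:

    spark.sql("""
        ALTER TABLE db.events
        SET TBLPROPERTIES ('write.distribution-mode' = 'hash')
    """)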

-- 
Ryan Blue
Tabular
