[
https://issues.apache.org/jira/browse/SPARK-29248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938060#comment-16938060
]
Jungtaek Lim commented on SPARK-29248:
--------------------------------------
Assuming writer got the information of number of partitions, what to do from
writer if the number of partitions in upstream plan for writer is different
from writer's requirement?
In many cases writer just assumes that upstream gives rows with desired
partitions/sort order - needs of SPARK-23889. There might be some cases this
issue still holds valid, writer may know about partitioning function (without
such interface) and be able to map row-to-partition and send row to specific
partition. I just haven't seen actual use case of this. Maybe it would be
better if you could elaborate what you're implementing and what you will do
with number of partitions.
> Pass in number of partitions to BuildWriter
> -------------------------------------------
>
> Key: SPARK-29248
> URL: https://issues.apache.org/jira/browse/SPARK-29248
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Ximo Guanter
> Priority: Major
>
> When implementing a ScanBuilder, we require the implementor to provide the
> schema of the data and the number of partitions.
> However, when someone is implementing WriteBuilder we only pass them the
> schema, but not the number of partitions. This is an asymetrical developer
> experience. Passing in the number of partitions on the WriteBuilder would
> enable data sources to provision their write targets before starting to
> write. For example, it could be used to provision a Kafka topic with a
> specific number of partitions.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]