[ 
https://issues.apache.org/jira/browse/SPARK-24073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24073:
------------------------------------

    Assignee: Apache Spark

> DataSourceV2: Rename DataReaderFactory back to ReadTask.
> --------------------------------------------------------
>
>                 Key: SPARK-24073
>                 URL: https://issues.apache.org/jira/browse/SPARK-24073
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Just before 2.3.0, SPARK-23219 renamed ReadTask to DataReaderFactory. The 
> intent was to make the read and write API match (write side uses 
> DataWriterFactory), but the underlying problem is that the two classes are 
> not equivalent.
> ReadTask/DataReader function as Iterable/Iterator. ReadTask is specific to 
> a read task, in contrast to DataWriterFactory, where the same factory instance 
> is used in all write tasks. ReadTask's purpose is to manage the lifecycle of 
> DataReader with an explicit create operation to mirror the close operation. 
> This is no longer clear from the API, where DataReaderFactory appears to be 
> more generic than it is, and it isn't clear why a set of them is produced for 
> a read.
> We should rename DataReaderFactory back to ReadTask, which correctly conveys 
> the purpose and use of the class.
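
The Iterable/Iterator analogy in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: the interfaces below are simplified, hypothetical stand-ins loosely modeled on the names in the issue, not the actual Spark signatures, and RangeReadTask and ReadTaskSketch are invented for the example.

```java
import java.io.Closeable;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in: plays the role of Iterator and is closed when done.
interface DataReader<T> extends Closeable {
    boolean next();          // advance to the next record, false when exhausted
    T get();                 // current record, valid after next() returned true
    @Override void close();  // narrowed so callers need not handle IOException
}

// Hypothetical, simplified stand-in: plays the role of Iterable for ONE read task.
// The explicit create operation mirrors the reader's close operation.
interface ReadTask<T> extends Serializable {
    DataReader<T> createDataReader();
}

// Invented example task: reads the integers in [start, end).
final class RangeReadTask implements ReadTask<Integer> {
    private final int start;
    private final int end;

    RangeReadTask(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public DataReader<Integer> createDataReader() {
        return new DataReader<Integer>() {
            private int current = start - 1;

            @Override public boolean next() { current++; return current < end; }
            @Override public Integer get() { return current; }
            @Override public void close() { /* nothing to release in this sketch */ }
        };
    }
}

final class ReadTaskSketch {
    // One task yields one reader: create, iterate, close.
    static List<Integer> runTask(ReadTask<Integer> task) {
        List<Integer> out = new ArrayList<>();
        try (DataReader<Integer> reader = task.createDataReader()) {
            while (reader.next()) {
                out.add(reader.get());
            }
        }
        return out;
    }

    // A read produces a set of tasks, each covering its own slice of the data.
    static List<Integer> readAll(List<ReadTask<Integer>> tasks) {
        List<Integer> out = new ArrayList<>();
        for (ReadTask<Integer> task : tasks) {
            out.addAll(runTask(task));
        }
        return out;
    }
}
```

In this sketch each task instance is bound to its own slice of the input, which is why a read yields a set of them; a write-side factory, by contrast, could be a single instance shared by every write task.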



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

