Ryan Blue created SPARK-24073:
---------------------------------

             Summary: DataSourceV2: Rename DataReaderFactory back to ReadTask.
                 Key: SPARK-24073
                 URL: https://issues.apache.org/jira/browse/SPARK-24073
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Ryan Blue
             Fix For: 2.4.0


Just before 2.3.0, SPARK-23219 renamed ReadTask to DataReaderFactory. The 
intent was to make the read and write API match (write side uses 
DataWriterFactory), but the underlying problem is that the two classes are not 
equivalent.

ReadTask/DataReader function as Iterable/Iterator. A ReadTask is specific to a
read task, in contrast to DataWriterFactory, where the same factory instance is
used in all write tasks. ReadTask's purpose is to manage the lifecycle of
DataReader with an explicit create operation to mirror the close operation.
This is no longer clear from the API: DataReaderFactory appears to be more
generic than it is, and it isn't clear why a set of them is produced for a
read.
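
To illustrate the Iterable/Iterator relationship described above, here is a
minimal sketch in Java. The interfaces are simplified, hypothetical stand-ins
and not the actual Spark signatures; ListReadTask, planTasks and runTask are
invented names used only for this example. It assumes one ReadTask per
split, each creating its own reader that is closed when the task finishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadTaskSketch {

    /** Hypothetical, simplified stand-in for DataReader: an iterator with an explicit close. */
    interface DataReader<T> extends AutoCloseable {
        boolean next();   // advance to the next record; false when exhausted
        T get();          // current record, valid after next() returned true
        @Override
        void close();     // release resources; mirrors createDataReader()
    }

    /** Hypothetical stand-in for ReadTask: one instance per read task, like an Iterable. */
    interface ReadTask<T> {
        DataReader<T> createDataReader();
    }

    /** Illustrative task that reads from an in-memory list (an assumption for this sketch). */
    static final class ListReadTask implements ReadTask<String> {
        private final List<String> rows;

        ListReadTask(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public DataReader<String> createDataReader() {
            return new DataReader<String>() {
                private int index = -1;

                @Override
                public boolean next() {
                    index++;
                    return index < rows.size();
                }

                @Override
                public String get() {
                    return rows.get(index);
                }

                @Override
                public void close() {
                    // nothing to release for an in-memory list
                }
            };
        }
    }

    /** A read produces a set of tasks, one per split; each is specific to its split. */
    static List<ReadTask<String>> planTasks() {
        List<ReadTask<String>> tasks = new ArrayList<>();
        tasks.add(new ListReadTask(Arrays.asList("a", "b")));
        tasks.add(new ListReadTask(Arrays.asList("c")));
        return tasks;
    }

    /** Runs one task: create the reader, drain it, and close it. */
    static List<String> runTask(ReadTask<String> task) {
        List<String> out = new ArrayList<>();
        try (DataReader<String> reader = task.createDataReader()) {
            while (reader.next()) {
                out.add(reader.get());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (ReadTask<String> task : planTasks()) {
            System.out.println(runTask(task));
        }
    }
}
```

In this sketch each task owns the data for its split, so the set returned by
planTasks is not interchangeable the way a single shared writer factory
would be.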

We should rename DataReaderFactory back to ReadTask, which correctly conveys 
the purpose and use of the class.


