[ https://issues.apache.org/jira/browse/SPARK-24073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450443#comment-16450443 ]
Apache Spark commented on SPARK-24073:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/21145

> DataSourceV2: Rename DataReaderFactory back to ReadTask.
> --------------------------------------------------------
>
>                 Key: SPARK-24073
>                 URL: https://issues.apache.org/jira/browse/SPARK-24073
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Just before 2.3.0, SPARK-23219 renamed ReadTask to DataReaderFactory. The
> intent was to make the read and write APIs match (the write side uses
> DataWriterFactory), but the underlying problem is that the two classes are
> not equivalent.
> ReadTask/DataReader function as Iterable/Iterator. A ReadTask is specific to
> a single read task, in contrast to DataWriterFactory, where the same factory
> instance is used in all write tasks. ReadTask's purpose is to manage the
> lifecycle of a DataReader, with an explicit create operation to mirror the
> close operation.
> This is no longer clear from the API: DataReaderFactory appears to be more
> generic than it is, and it isn't clear why a set of them is produced for a
> read.
> We should rename DataReaderFactory back to ReadTask, which correctly conveys
> the purpose and use of the class.
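
The distinction described in the issue can be illustrated with a minimal sketch. The interfaces below are simplified, hypothetical stand-ins written for this note, not the actual Spark DataSourceV2 signatures; RangeReadTask, ListWriterFactory and ReadTaskSketch are invented names. The sketch assumes only what the description states: each ReadTask covers one slice of the input and creates the DataReader that is later closed, while a single DataWriterFactory instance serves every write task.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified interfaces (NOT the real Spark API).
// DataReader plays the Iterator role and must be closed when done.
interface DataReader<T> extends Closeable {
    boolean next() throws IOException;
    T get();
}

// ReadTask plays the Iterable role: one instance per read task (input slice).
interface ReadTask<T> {
    DataReader<T> createDataReader();
}

interface DataWriter<T> {
    void write(T record);
}

// In contrast, one factory instance is shared by all write tasks.
interface DataWriterFactory<T> {
    DataWriter<T> createDataWriter(int partitionId);
}

// A read task bound to one specific slice [start, end) of the data.
final class RangeReadTask implements ReadTask<Integer> {
    private final int start;
    private final int end;

    RangeReadTask(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public DataReader<Integer> createDataReader() {
        return new DataReader<Integer>() {
            private int current = start - 1;

            @Override public boolean next() { current++; return current < end; }
            @Override public Integer get() { return current; }
            @Override public void close() { /* nothing to release in this sketch */ }
        };
    }
}

// A single factory that hands out a writer for any partition.
final class ListWriterFactory implements DataWriterFactory<Integer> {
    final List<String> sink = new ArrayList<>();

    @Override
    public DataWriter<Integer> createDataWriter(int partitionId) {
        return record -> sink.add(partitionId + ":" + record);
    }
}

final class ReadTaskSketch {
    // The explicit create is paired with close via try-with-resources.
    static <T> List<T> runTask(ReadTask<T> task) {
        List<T> rows = new ArrayList<>();
        try (DataReader<T> reader = task.createDataReader()) {
            while (reader.next()) {
                rows.add(reader.get());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A read produces a set of tasks, one per slice.
        List<ReadTask<Integer>> tasks = new ArrayList<>();
        tasks.add(new RangeReadTask(0, 3));
        tasks.add(new RangeReadTask(3, 5));

        // The same writer factory instance is reused for every partition.
        ListWriterFactory factory = new ListWriterFactory();
        for (int p = 0; p < tasks.size(); p++) {
            DataWriter<Integer> writer = factory.createDataWriter(p);
            for (Integer row : runTask(tasks.get(p))) {
                writer.write(row);
            }
        }
        System.out.println(factory.sink);
    }
}
```

In this sketch a read yields several RangeReadTask objects because each one carries its own slice bounds, which is the reason the issue gives for a set of them being produced; the writer factory carries no per-task state and so can be shared.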