James Q. Arnold created SPARK-19582:
---------------------------------------
Summary: DataFrameReader conceptually inadequate
Key: SPARK-19582
URL: https://issues.apache.org/jira/browse/SPARK-19582
Project: Spark
Issue Type: Bug
Components: Java API
Affects Versions: 2.1.0
Reporter: James Q. Arnold
DataFrameReader assumes it "understands" all data sources (local file system,
object stores, jdbc, ...). This seems limiting in the long term, imposing both
development costs to accept new sources and dependency issues for existing
sources (how to coordinate the XX jar for internal use vs. the XX jar used by
the application). Unless I have missed how this can be done currently, an
application with an unsupported data source cannot create the required RDD for
distribution.
I recommend at least providing a text API for supplying data. Let the
application provide data as a String (or char[] or ...): not a path, but the
actual data. Alternatively, provide interfaces or abstract classes that the
application could implement so that it handles external data sources itself,
without forcing all that complication into the Spark implementation.
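As a rough illustration of the second suggestion, the sketch below shows the kind of contract I have in mind. Every name in it (ExternalTextSource, InMemoryTextSource, TextSourceReader) is hypothetical and made up for this report; none of it is an existing Spark API, and it uses only the Java standard library. The application implements the interface, and the framework only asks for partitions of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface, NOT part of Spark: the application implements it,
// so the framework never needs to understand where the data comes from.
interface ExternalTextSource {
    // Number of independent partitions the source can be read in.
    int numPartitions();

    // Lines belonging to one partition, for 0 <= index < numPartitions().
    Iterator<String> readPartition(int index);
}

// Hypothetical implementation backed by a String the application already
// holds: the actual data, not a path to it.
final class InMemoryTextSource implements ExternalTextSource {
    private final List<String> lines;
    private final int partitions;

    InMemoryTextSource(String data, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        this.lines = data.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(data.split("\n"));
        this.partitions = partitions;
    }

    @Override
    public int numPartitions() {
        return partitions;
    }

    @Override
    public Iterator<String> readPartition(int index) {
        // Round-robin assignment: line i belongs to partition i % partitions.
        List<String> out = new ArrayList<>();
        for (int i = index; i < lines.size(); i += partitions) {
            out.add(lines.get(i));
        }
        return out.iterator();
    }
}

// Stand-in for the framework side: it depends only on the interface.
final class TextSourceReader {
    static List<String> collect(ExternalTextSource source) {
        List<String> all = new ArrayList<>();
        for (int p = 0; p < source.numPartitions(); p++) {
            Iterator<String> it = source.readPartition(p);
            while (it.hasNext()) {
                all.add(it.next());
            }
        }
        return all;
    }
}
```

With something along these lines, supporting a new source would mean the application ships one more implementation of the interface, and any client jar that source needs would stay on the application's side rather than inside Spark.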
I don't have any code to submit, but JIRA seemed like the most appropriate place
to raise the issue.
Finally, if I have overlooked how this can be done with the current API, a new
example would be appreciated.