James Q. Arnold created SPARK-19582:
---------------------------------------

             Summary: DataFrameReader conceptually inadequate
                 Key: SPARK-19582
                 URL: https://issues.apache.org/jira/browse/SPARK-19582
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.0
            Reporter: James Q. Arnold


DataFrameReader assumes it "understands" all data sources (local file system, 
object stores, jdbc, ...).  This seems limiting in the long term, imposing both 
development costs to accept new sources and dependency issues for existing 
sources (how to coordinate the XX jar for internal use vs. the XX jar used by 
the application).  Unless I have missed how this can be done currently, an 
application with an unsupported data source cannot create the required RDD for 
distribution.

I recommend at least providing a text API for supplying data.  Let the 
application provide data as a String (or char[] or ...)---not a path, but the 
actual data.  Alternatively, provide interfaces or abstract classes the 
application could provide to let the application handle external data sources, 
without forcing all that complication into the Spark implementation.
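
To make the second suggestion concrete, here is a minimal sketch of the kind of 
contract an application might implement. Every name in it (ExternalTextSource, 
InMemoryTextSource, readPartition, readAll) is hypothetical and is not part of 
the Spark API. It only illustrates the shape of the idea: the application hands 
over the actual records plus a way to split them, and the framework never needs 
to know where they came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical contract, NOT a Spark interface: the application owns all
// knowledge of the external system and only exposes partitioned text records.
interface ExternalTextSource {
    // How many independent slices the data can be read in.
    int numPartitions();

    // The records of one slice; a framework could call this once per task.
    Iterator<String> readPartition(int index);
}

// Simplest possible implementation: the application already holds the data
// as Strings (not a path), which is the "text API" case from the proposal.
final class InMemoryTextSource implements ExternalTextSource {
    private final List<String> records;
    private final int partitions;

    InMemoryTextSource(List<String> records, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        this.records = new ArrayList<>(records); // defensive copy
        this.partitions = partitions;
    }

    @Override
    public int numPartitions() {
        return partitions;
    }

    @Override
    public Iterator<String> readPartition(int index) {
        if (index < 0 || index >= partitions) {
            throw new IndexOutOfBoundsException("partition " + index);
        }
        // Contiguous slices; long arithmetic avoids int overflow on large lists.
        int from = (int) ((long) index * records.size() / partitions);
        int to = (int) ((long) (index + 1) * records.size() / partitions);
        return records.subList(from, to).iterator();
    }
}

final class InMemorySourceDemo {
    // Stand-in for the framework side: it sees only the interface.
    static List<String> readAll(ExternalTextSource source) {
        List<String> out = new ArrayList<>();
        for (int p = 0; p < source.numPartitions(); p++) {
            source.readPartition(p).forEachRemaining(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        ExternalTextSource source = new InMemoryTextSource(
                Arrays.asList("{\"id\":1}", "{\"id\":2}", "{\"id\":3}"), 2);
        System.out.println(readAll(source));
    }
}
```

With a contract like this, the jar for the external system would live only in 
the application, which is the dependency-coordination concern raised above.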

I don't have any code to submit, but JIRA seemed like the most appropriate place 
to raise the issue.

Finally, if I have overlooked how this can be done with the current API, a new 
example would be appreciated.


