[
https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496548#comment-14496548
]
Shivaram Venkataraman commented on SPARK-6831:
----------------------------------------------
I think we should give an example of an external data source and how to use it
in our programming guide, as not everybody writes a `Linking` section in their
README. We could use Avro as an example and just describe how to pass it in
with `--jars` (BTW, where do you get the JAR to pass in like this?) and say
how to use `load`. Just those two things should be enough.
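
A rough sketch of what those two steps might look like from PySpark. Everything
specific here is an assumption for illustration: the JAR path and script name
are placeholders, the helper names `build_submit_command` and `load_avro` are
hypothetical, and the `read.format(...).load(...)` form assumes a Spark version
that has the DataFrameReader API with the spark-avro JAR on the classpath.

```python
import shlex


def build_submit_command(app_script, jars, spark_submit="bin/spark-submit"):
    """Return the argument list for launching app_script with extra JARs."""
    cmd = [spark_submit]
    if jars:
        # --jars takes a single comma-separated list of JAR paths.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app_script)
    return cmd


def load_avro(sql_context, path):
    """Load an Avro file as a DataFrame through the external data source.

    Assumes sql_context exposes the DataFrameReader API and that the
    spark-avro JAR was passed in with --jars. Not called in this sketch.
    """
    return sql_context.read.format("com.databricks.spark.avro").load(path)


# Placeholder paths: substitute the real JAR location and application script.
cmd = build_submit_command("my_app.py", ["/path/to/spark-avro.jar"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command is what a user would run from the Spark directory; inside
`my_app.py`, a call like `load_avro(sqlContext, "/path/to/file.avro")` would do
the actual load.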
While I know that the spark-packages page lists many connectors, it is often
hard to figure out exactly which packages are SQL data sources (most
often people ask me about Cassandra, for example). So we could also add a table
somewhere (like, say, the LIBSVM table of language APIs at
http://www.csie.ntu.edu.tw/~cjlin/libsvm/) with entries like `avro`,
`https://github.com/databricks/spark-avro`, etc.
> Document how to use external data sources
> -----------------------------------------
>
> Key: SPARK-6831
> URL: https://issues.apache.org/jira/browse/SPARK-6831
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SparkR, SQL
> Reporter: Shivaram Venkataraman
> Priority: Critical
>
> We should include some instructions on how to use an external data source for
> users who are beginners. Do they need to install it on all the machines? Or
> just the master? Are there any special flags they need to pass to
> `bin/spark-submit`, etc.?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)