[ https://issues.apache.org/jira/browse/SPARK-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212049#comment-14212049 ]
Andrew Ash commented on SPARK-748:
----------------------------------
I agree this would be valuable -- almost like a "Spark Cookbook" of how to read
and write data from various other systems. Step one is probably deciding what
software to mention.
Tentatively I propose:
Spark Core
- HDFS
- HBase
- Cassandra
- Elasticsearch
- JDBC, with examples for Postgres and MySQL
- General Hadoop InputFormat
Spark Streaming
- Kafka
- Flume
- Storm
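To make the idea concrete, a cookbook entry for the batch sources above might look roughly like the sketch below (Spark 1.x RDD API, Scala). All hostnames, paths, table names, and credentials here are placeholders, and JdbcRDD's query must contain the two `?` bound parameters it partitions on; this is an illustration of the shape of an entry, not tested recipe code.

```scala
import java.sql.DriverManager

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object CookbookSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cookbook-sketch"))

    // HDFS: textFile accepts any Hadoop-supported URI (hdfs://, s3n://, file://, ...)
    val logs = sc.textFile("hdfs://namenode:8020/data/logs/*.log")

    // JDBC (Postgres shown): JdbcRDD splits [lowerBound, upperBound] into
    // numPartitions ranges and binds each range to the two ? placeholders.
    val rows = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:postgresql://dbhost/mydb", "user", "pass"),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
      lowerBound = 1L, upperBound = 1000000L, numPartitions = 10,
      mapRow = rs => (rs.getInt(1), rs.getString(2)))

    println(logs.count() + " log lines, " + rows.count() + " JDBC rows")
    sc.stop()
  }
}
```

Each source's entry would then link out to the relevant connector docs (e.g. the Postgres JDBC driver jar that must be on the executor classpath for the example above).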
As for the destination, this could go in the documentation included in the git repo
and published to the Spark website, or on the Spark project wiki. I tend to
prefer the former. A possible location for that could be
http://spark.apache.org/docs/latest/programming-guide.html#external-datasets
> Add documentation page describing interoperability with other software (e.g.
> HBase, JDBC, Kafka, etc.)
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-748
> URL: https://issues.apache.org/jira/browse/SPARK-748
> Project: Spark
> Issue Type: New Feature
> Components: Documentation
> Reporter: Josh Rosen
>
> Spark seems to be gaining a lot of data input / output features for
> integrating with systems like HBase, Kafka, JDBC, Hadoop, etc.
> It might be a good idea to create a single documentation page that provides a
> list of all of the data sources that Spark supports and links to the relevant
> documentation / examples / {{spark-users}} threads. This would help
> prospective users to evaluate how easy it will be to integrate Spark with
> their existing systems.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)