Actually, Spark SQL provides a data source for this. Here is the relevant passage from the documentation, "JDBC To Other Databases":
Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD <https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.rdd.JdbcRDD>. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources. The JDBC data source is also easier to use from Java or Python as it does not require the user to provide a ClassTag. (Note that this is different than the Spark SQL JDBC server, which allows other applications to run queries using Spark SQL.)

On Fri, Apr 24, 2015 at 6:27 PM, ayan guha <guha.a...@gmail.com> wrote:

> What is the specific use case? I can think of a couple of ways (write to HDFS
> and then read from Spark, or stream data to Spark). Also I have seen people
> using MySQL jars to bring data in. Essentially you want to simulate
> creation of an RDD.
>
> On 24 Apr 2015 18:15, "sequoiadb" <mailing-list-r...@sequoiadb.com> wrote:
>
>> If I run Spark in stand-alone mode (not YARN mode), is there any tool
>> like Sqoop that is able to transfer data from an RDBMS to Spark storage?
>>
>> Thanks
>
> --
> Best Regards,
> Ayan Guha
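To make that concrete, here is a minimal sketch of reading a table through the JDBC data source with the Spark 1.3-era API the docs describe. It assumes a local Spark deployment and a MySQL JDBC driver on the classpath; the connection URL, database, table name, and credentials below are placeholders, not anything from the thread.

```scala
// Sketch only: "dbhost", "mydb", "orders", and the credentials are
// hypothetical placeholders for your own RDBMS connection details.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JdbcReadExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("jdbc-read").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Load a table via the JDBC data source; the result comes back
    // as a DataFrame rather than a raw JdbcRDD.
    val df = sqlContext.load("jdbc", Map(
      "url"     -> "jdbc:mysql://dbhost:3306/mydb?user=me&password=secret",
      "dbtable" -> "mydb.orders"))

    // Once loaded, it can be queried in Spark SQL or joined with
    // other data sources like any other DataFrame.
    df.registerTempTable("orders")
    sqlContext.sql("SELECT COUNT(*) FROM orders").show()

    sc.stop()
  }
}
```

Note there is no ClassTag or row-mapping function to supply, which is the ease-of-use point the documentation is making versus JdbcRDD.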