Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Pierce Lamb
Hi Jake,

There is another option among the third-party projects in the Spark
database ecosystem that combine Spark with a DBMS: some have extended the
DataFrame API to include UPDATE operations
<http://snappydatainc.github.io/snappydata/programming_guide/#create-row-tables-using-api-update-the-contents-of-row-table>.
However, in your case you would have to move away from MySQL in order to
use this API.

Best,

Pierce



Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Jake Russ
Hi Mich,

Thank you for the explanation; that makes sense and helps me understand the
bigger picture of how Spark and an RDBMS interact.

Happy to know I’m already following best practice.

Cheers,

Jake




Re: Update MySQL table via Spark/SparkR?

2017-08-21 Thread ayan guha
How about using append plus a view that simulates the update? Then you do not
need two processes...
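A minimal sketch of that idea, using Python's sqlite3 in place of MySQL (table
and column names are invented for illustration): each Spark run only appends a
new batch of rows, and a view exposes just the latest version of each key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Append-only base table: each Spark write appends rows tagged with a batch id
cur.execute("CREATE TABLE results (id INTEGER, val TEXT, batch INTEGER)")
cur.executemany("INSERT INTO results VALUES (?, ?, ?)",
                [(1, "old", 1), (2, "old", 1),   # first batch
                 (2, "new", 2)])                 # later batch "updates" id 2

# The view simulates the update by keeping only the newest batch per id
cur.execute("""
    CREATE VIEW latest AS
    SELECT id, val FROM results r
    WHERE batch = (SELECT MAX(batch) FROM results WHERE id = r.id)
""")

print(cur.execute("SELECT id, val FROM latest ORDER BY id").fetchall())
# [(1, 'old'), (2, 'new')]
```

Readers query the view instead of the base table, so no second UPDATE process
is needed; the trade-off is that the base table grows with every batch.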

--
Best Regards,
Ayan Guha


Re: Update MySQL table via Spark/SparkR?

2017-08-21 Thread Mich Talebzadeh
Hi Jake,

This is an issue across all RDBMSs, including Oracle. When you are
updating, you have to commit or roll back in the RDBMS itself, and I am not
aware of Spark doing that.

The staging table is the safer method, as it follows an ETL-type approach: you
create the new data in a staging table in the RDBMS and do the DML in the RDBMS
itself, where you can control commit and rollback. That is the way I would do
it. A simple shell script can do both.
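For what it's worth, here is a minimal, self-contained sketch of that staging
pattern, using Python's sqlite3 standing in for MySQL (table names are
hypothetical, and MySQL's UPDATE ... JOIN syntax differs from the
correlated-subquery form shown here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Target table already present in the database
cur.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "old")])

# Spark's DataFrameWriter can only append/overwrite, so the computed
# results land in a staging table instead of the target
cur.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO staging VALUES (?, ?)", [(2, "new")])

# The DML runs in the RDBMS itself, where commit/rollback is under our control
cur.execute("""
    UPDATE target
    SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM staging)
""")
conn.commit()

print(cur.execute("SELECT id, val FROM target ORDER BY id").fetchall())
# [(1, 'old'), (2, 'new')]
```

The UPDATE either commits or rolls back as a single transaction on the
database side, which is exactly the control Spark's JDBC writer does not give
you.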

HTH



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Update MySQL table via Spark/SparkR?

2017-08-21 Thread Jake Russ
Hi everyone,

I’m currently using SparkR to read data from a MySQL database, perform some
calculations, and then write the results back to MySQL. Is it still true that
Spark does not support UPDATE queries via JDBC? I’ve seen many posts on the
internet that Spark’s DataFrameWriter does not support UPDATE queries via JDBC
<https://issues.apache.org/jira/browse/SPARK-19335>. It will only “append”
or “overwrite” to existing tables. The best advice I’ve found so far, for
performing this update, is to write to a staging table in MySQL
<https://stackoverflow.com/questions/34643200/spark-dataframes-upsert-to-postgres-table>
and then perform the UPDATE query on the MySQL side.
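The staging-table write would look roughly like the fragment below (a sketch,
not runnable as-is: results_df stands for the DataFrame computed by the job,
and the connection details and table names are invented; PySpark syntax shown,
but SparkR's write.jdbc() takes similar url/table/mode arguments):

```python
# Illustrative fragment: only "append"/"overwrite" modes exist, so the
# results go to a staging table rather than being UPDATEd into the target.
(results_df.write
    .format("jdbc")
    .option("url", "jdbc:mysql://dbhost:3306/mydb")   # hypothetical host/db
    .option("dbtable", "results_staging")             # staging, not target
    .option("user", "spark_user")
    .option("password", "...")
    .mode("overwrite")
    .save())
```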

Ideally, I’d like to handle the update during the write operation. Has anyone 
else encountered this limitation and have a better solution?

Thank you,

Jake