Re: Update MySQL table via Spark/SparkR?

2017-08-21 Thread Mich Talebzadeh
Hi Jake, This is an issue across all RDBMSs, including Oracle etc. When you are updating, you have to commit or roll back in the RDBMS itself, and I am not aware of Spark doing that. The staging table is a safer method as it follows an ETL-type approach. You create new data in the staging table in RDBMS
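
The staging-table approach described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's `sqlite3`) stands in for MySQL, the table names `results` and `results_staging` and the function `apply_staged_updates` are hypothetical, and the merge step is a guess at what the truncated message goes on to describe. In a real pipeline the staging rows would presumably be appended by Spark over JDBC, and the merge would run inside the database, where commit and rollback are available.

```python
import sqlite3


def apply_staged_updates(conn):
    """Merge staged rows into the target table inside one transaction."""
    try:
        # Update target rows whose key also appears in the staging table.
        conn.execute(
            """
            UPDATE results
               SET score = (SELECT s.score
                              FROM results_staging s
                             WHERE s.id = results.id)
             WHERE id IN (SELECT id FROM results_staging)
            """
        )
        # Insert staged rows whose key is not in the target table yet.
        conn.execute(
            """
            INSERT INTO results (id, score)
            SELECT s.id, s.score
              FROM results_staging s
             WHERE s.id NOT IN (SELECT id FROM results)
            """
        )
        # Empty the staging table so the next batch starts clean.
        conn.execute("DELETE FROM results_staging")
        conn.commit()
    except Exception:
        # Any failure leaves the target table exactly as it was.
        conn.rollback()
        raise


# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, score REAL)")
conn.execute("CREATE TABLE results_staging (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)", [(1, 0.5), (2, 0.7)])

# Stand-in for the Spark job appending its output to the staging table.
conn.executemany("INSERT INTO results_staging VALUES (?, ?)", [(2, 0.9), (3, 0.1)])
conn.commit()

apply_staged_updates(conn)
rows = conn.execute("SELECT id, score FROM results ORDER BY id").fetchall()
staged_left = conn.execute("SELECT COUNT(*) FROM results_staging").fetchone()[0]
print(rows)
```

The point of the design is that Spark only ever appends, while the update itself is a single database-side transaction that either fully applies or is rolled back.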

Re: Update MySQL table via Spark/SparkR?

2017-08-21 Thread ayan guha
How about append and a view simulating the update? Then you do not need 2 processes... On Tue, 22 Aug 2017 at 8:44 am, Mich Talebzadeh wrote: > Hi Jake, > > This is an issue across all RDBMs including Oracle etc. When you are > updating you have to commit or roll back
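
The "append and a view" suggestion in this reply could look something like the sketch below. It is not taken from the thread: SQLite (via Python's `sqlite3`) stands in for MySQL, and the table `results_log`, the view `results_current`, and the `version` column are hypothetical names chosen for illustration. The idea is that the writer only appends rows, and readers query a view that exposes the newest row per key.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Append-only table: every batch adds rows tagged with a version number.
conn.execute("CREATE TABLE results_log (id INTEGER, score REAL, version INTEGER)")

# The view returns, for each id, only the row with the highest version,
# so readers see the effect of an update without any UPDATE statement.
conn.execute(
    """
    CREATE VIEW results_current AS
    SELECT r.id, r.score
      FROM results_log r
     WHERE r.version = (SELECT MAX(l.version)
                          FROM results_log l
                         WHERE l.id = r.id)
    """
)

# First batch (stand-in for a Spark job writing in append mode).
conn.executemany("INSERT INTO results_log VALUES (?, ?, ?)", [(1, 0.5, 1), (2, 0.7, 1)])
# Second batch "updates" id 2 by appending a newer version of the row.
conn.executemany("INSERT INTO results_log VALUES (?, ?, ?)", [(2, 0.9, 2)])
conn.commit()

current = conn.execute("SELECT id, score FROM results_current ORDER BY id").fetchall()
print(current)
```

The trade-off is that the log table grows with every batch, so old versions would need to be pruned at some point.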

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread swetha kasireddy
Hi Cody, I think Assign is used if we want it to start from a specified offset. What if we want it to start from the latest offset, with something like what is returned by "auto.offset.reset" -> "latest"? Thanks! On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger wrote: >

'Premature end of Content-Length' using S3A to read huge data

2017-08-21 Thread Tushar Sudake
Hello, I'm writing a Spark-based application which works on a pretty huge dataset stored on S3. It's about **15 TB** in size uncompressed. Data is laid across multiple small LZO-compressed files, varying from 10-100MB. By default the job spawns 130k tasks while reading the dataset and mapping

Re: issue in add row functionality

2017-08-21 Thread abhimadav
You could use a map operation or transform on the existing dataframe to create the target dataframe.

Re: Netty Issues

2017-08-21 Thread Pascal Stammer
Hi, I don’t know how this would help. We already use the Maven Shade plugin. This behavior currently happens in local unit tests. Pascal > On 21.08.2017 at 12:58, 周康 wrote: > > Use maven shade plugin may help > > 2017-08-21 18:43 GMT+08:00 Pascal Stammer

Netty Issues

2017-08-21 Thread Pascal Stammer
Hi all, I got the following exception: 17/08/21 12:33:56 ERROR TransportClient: Failed to send RPC 5493448667271613330 to /10.210.85.3:52482: java.lang.AbstractMethodError java.lang.AbstractMethodError at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) at

Re: Netty Issues

2017-08-21 Thread 周康
Using the Maven Shade plugin may help. 2017-08-21 18:43 GMT+08:00 Pascal Stammer : > Hi all, > > i got following exception: > > 17/08/21 12:33:56 ERROR TransportClient: Failed to send RPC > 5493448667271613330 to /10.210.85.3:52482: java.lang.AbstractMethodError >

Update MySQL table via Spark/SparkR?

2017-08-21 Thread Jake Russ
Hi everyone, I’m currently using SparkR to read data from a MySQL database, perform some calculations, and then write the results back to MySQL. Is it still true that Spark does not support UPDATE queries via JDBC? I’ve seen many posts on the internet that Spark’s DataFrameWriter does not

RE: Spark Web UI SSL Encryption

2017-08-21 Thread Mendelson, Assaf
The following is based on stuff I did a while ago so I might be missing some parts. First you need to create a certificate. The following example creates a self-signed one: openssl genrsa -aes128 -out sparkssl.key 2048 -alias "standalone" openssl rsa -in sparkssl.key -pubout -out

Re: a set of practice and LAB

2017-08-21 Thread 周康
For Spark, you can dive into the examples source folder. 2017-08-21 4:49 GMT+08:00 Mohsen Pahlevanzadeh : > Dear All, > > > I need to a set of practice and LAB with sparc and hadoop, You will make > me happy for your help. > > Yours, > Mohsen > >

Re:Re: a set of practice and LAB

2017-08-21 Thread JH.Lin
I've had no luck finding a LAB with better perf on Hadoop. Cheers. On 2017-08-21 at 21:54, "周康" wrote: For spark,you can dive into examples source folder. 2017-08-21 4:49 GMT+08:00 Mohsen Pahlevanzadeh : Dear All, I need to a set of practice and LAB

Re: Spark Web UI SSL Encryption

2017-08-21 Thread Saisai Shao
Can you please post the specific problem you encountered? Thanks, Jerry. On Sat, Aug 19, 2017 at 1:49 AM, Anshuman Kumar wrote: > Hello, > > I have recently installed Sparks 2.2.0, and trying to use it for some big > data processing. Spark is installed on a server that I

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread Cody Koeninger
Yes, you can start from specified offsets. See ConsumerStrategy, specifically Assign http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store On Tue, Aug 15, 2017 at 1:18 PM, SRK wrote: > Hi, > > How to force Spark Kafka Direct to

Chaining Spark Streaming Jobs

2017-08-21 Thread Sunita Arvind
Hello Spark Experts, I have a design question w.r.t. Spark Streaming. I have a streaming job that consumes protocol-buffer-encoded real-time logs from an on-premise Kafka cluster. My Spark application runs on EMR (AWS) and persists data to S3. Before I persist, I need to strip header and convert