Re: Spark streaming RDDs to Parquet records

2014-06-17 Thread contractor
Thanks Krishna. Seems like you have to use Avro and then convert that to Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look into this some more. Thanks, Mahesh From: Krishna Sankar <ksanka...@gmail.com> Reply-To:
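For reference, a minimal sketch of writing an RDD straight to Parquet through Spark SQL's SchemaRDD conversion, as documented for the Spark 1.0 line; the case class, app name, and output path below are hypothetical, not taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical record type; any case class can serve as the schema source.
    case class Event(id: Long, payload: String)

    object RddToParquetSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-to-parquet"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

        val events = sc.parallelize(Seq(Event(1L, "a"), Event(2L, "b")))
        // A SchemaRDD can be written as Parquet without an Avro intermediate step.
        events.saveAsParquetFile("/tmp/events.parquet")  // placeholder path
        sc.stop()
      }
    }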

Re: Spark streaming RDDs to Parquet records

2014-06-19 Thread contractor
Padmanabhan, Mahesh (contractor) wrote: Thanks Krishna. Seems like you have to use Avro and then convert that to Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look into this some more. Thanks, Mahesh From: Krishna

Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
Hello all, I am not sure what is going on – I am getting a NotSerializableException and initially I thought it was due to not registering one of my classes with Kryo, but that doesn’t seem to be the case. I am essentially eliminating duplicates in a Spark Streaming application by using a “window”
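A minimal sketch of window-based duplicate elimination in Spark Streaming; the record type, key field, socket source, and window lengths are assumptions for illustration, not taken from the thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair DStream functions (Spark 1.x)

    // Hypothetical record type; the id field is the de-duplication key.
    case class Event(id: String, raw: String)

    object WindowDedupSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("dedup"), Seconds(10))
        val events = ssc.socketTextStream("localhost", 9999)
          .map(line => Event(line.takeWhile(_ != ','), line))  // placeholder parsing

        // Keep only the first occurrence of each id seen within the window.
        val deduped = events
          .map(e => (e.id, e))
          .reduceByKeyAndWindow((first: Event, _: Event) => first, Seconds(60), Seconds(10))
          .map(_._2)

        deduped.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }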

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
()), (duplicateCount, (oc-dc).toString())) OperationalStatProducer.produce(statBody) } catch { case e: Exception => DebugLogger.report(e) } } } }) }) On Thu, Aug 7, 2014 at 9:03 AM, Padmanabhan, Mahesh (contractor

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
=true for driver in your driver startup-script? That should give an indication of the sequence of object references that lead to the StreamingContext being included in the closure. TD On Thu, Aug 7, 2014 at 10:23 AM, Padmanabhan, Mahesh (contractor) mahesh.padmanab...@twc

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
Thanks TD, Amit. I think I figured out where the problem is through the process of commenting out individual lines of code one at a time :( Can either of you help me find the right solution? I tried creating the SparkContext outside the foreachRDD but that didn’t help. I have an object (let’s

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
context as rdd.context. This will avoid referring to the ssc inside the foreachRDD. See if that helps. TD On Thu, Aug 7, 2014 at 12:47 PM, Padmanabhan, Mahesh (contractor) mahesh.padmanab...@twc-contractor.com wrote: Thanks TD, Amit. I think I
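A sketch of the pattern being suggested here: take the SparkContext from the RDD inside foreachRDD instead of closing over the StreamingContext. The dstream and the stat computation below are hypothetical stand-ins.

    // dstream is assumed to be an existing DStream of log lines.
    dstream.foreachRDD { rdd =>
      // rdd.context is the same SparkContext, but referencing it here keeps the
      // (non-serializable) StreamingContext out of the serialized closure.
      val sc = rdd.context
      val duplicateCount = rdd.count() - rdd.distinct().count()     // placeholder computation
      val statBody = sc.parallelize(Seq(("duplicateCount", duplicateCount.toString)))
      statBody.foreach { case (k, v) => println(s"$k=$v") }         // stand-in for the stat producer
    }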

Re: Sending multiple DStream outputs

2014-09-18 Thread contractor
of the app for each and pass as arguments to each instance the input source and output topic? On Thu, Sep 18, 2014 at 8:07 AM, Padmanabhan, Mahesh (contractor) mahesh.padmanab...@twc-contractor.com wrote: Hi all, I am using Spark 1.0 streaming to ingest a high-volume stream of data (approx. 1mm
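A sketch of the suggestion in this reply: one generic streaming app, launched once per stream, with its input and output passed as arguments. All names and the file-based source/sink are placeholders for illustration.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object PerStreamIngest {
      def main(args: Array[String]): Unit = {
        // Each deployed instance receives its own input source and output target.
        val Array(inputDir, outputPrefix) = args
        val ssc = new StreamingContext(
          new SparkConf().setAppName(s"ingest-$inputDir"), Seconds(10))
        ssc.textFileStream(inputDir).saveAsTextFiles(outputPrefix)
        ssc.start()
        ssc.awaitTermination()
      }
    }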

Spark Mesos integration bug?

2014-11-26 Thread contractor
Hi, We have been running Spark 1.0.2 with Mesos 0.20.1 in fine-grained mode and for the most part it has been working well. We have been using mesos://zk://server1:2181,server2:2181,server3:2181/mesos as the Spark master URL and this works great to get the Mesos leader. Unfortunately, this
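For context, a sketch of the configuration being described, with placeholder hostnames; on the Spark 1.x line, spark.mesos.coarse=false selects the fine-grained mode mentioned above.

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder ZooKeeper ensemble; the zk:// form lets Spark discover the
    // current Mesos leader instead of pointing at a single master host.
    val conf = new SparkConf()
      .setMaster("mesos://zk://server1:2181,server2:2181,server3:2181/mesos")
      .setAppName("mesos-fine-grained")
      .set("spark.mesos.coarse", "false")  // fine-grained mode (the Spark 1.x default)
    val sc = new SparkContext(conf)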

Custom receiver runtime Kryo exception

2015-01-05 Thread contractor
Hello all, I am using Spark 1.0.2 and I have a custom receiver that works well. I tried adding Kryo serialization to SparkConf: val spark = new SparkConf() ….. .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") and I am getting a strange error that I am not sure how to
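A hedged sketch of the Kryo setup with an explicit registrator, which on Spark 1.0.x is the usual way to register the classes a custom receiver emits; the message class and registrator names are hypothetical.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical type emitted by the custom receiver.
    case class SensorReading(id: String, value: Double)

    class ReceiverKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[SensorReading])
      }
    }

    val spark = new SparkConf()
      .setAppName("custom-receiver")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "ReceiverKryoRegistrator")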

Re: Kafka Spark Partition Mapping

2015-08-24 Thread Syed, Nehal (Contractor)
Dear Cody, Thanks for your response. I am trying to do decoration, which means when a message comes from Kafka (partitioned by key) into Spark I want to add more fields/data to it. How do people normally do this in Spark? If it were you, how would you decorate the message without hitting
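One common pattern, sketched with hypothetical types and lookup logic: decorate each record with a plain map (or mapPartitions, to amortize per-record lookups), which adds fields without disturbing the keyed partitioning done on the Kafka producer side.

    // kafkaStream is assumed to be a DStream[(String, String)] of (key, value) pairs.
    case class Enriched(key: String, value: String, region: String, receivedAt: Long)

    // Placeholder enrichment; in practice this might consult a broadcast lookup table.
    def lookupRegion(key: String): String = if (key.startsWith("us")) "US" else "OTHER"

    val enriched = kafkaStream.map { case (key, value) =>
      Enriched(key, value, lookupRegion(key), System.currentTimeMillis())
    }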

Spark consumes more memory

2017-05-11 Thread Anantharaman, Srinatha (Contractor)
Hi, I am reading a Hive ORC table into memory; the StorageLevel is set to StorageLevel.MEMORY_AND_DISK_SER. The total size of the Hive table is 5 GB. I started the spark-shell as below: spark-shell --master yarn --deploy-mode client --num-executors 8 --driver-memory 5G --executor-memory 7G
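A sketch of the caching step described above as it would look inside that spark-shell session; the table name is a placeholder.

    import org.apache.spark.storage.StorageLevel

    // In spark-shell, `spark` is the pre-built SparkSession.
    val orc = spark.table("default.my_orc_table")   // placeholder Hive ORC table
    orc.persist(StorageLevel.MEMORY_AND_DISK_SER)
    orc.count()                                     // materialize the cache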

RE: Spark consumes more memory

2017-05-11 Thread Anantharaman, Srinatha (Contractor)
, May 11, 2017 1:34 PM To: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>; user <user@spark.apache.org> Subject: Re: Spark consumes more memory I would try to track down the "no space left on device" - find out where that originates from, si

Re: Java SPI jar reload in Spark

2017-06-07 Thread Jonnas Li(Contractor)
Subject: Re: Java SPI jar reload in Spark. Hi, a quick search on Google: https://github.com/spark-jobserver/spark-jobserver/issues/130

Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
I have a Spark Streaming application which dynamically calls a jar (Java SPI), and the jar is called in a mapWithState() function; it was working fine for a long time. Recently, I got a requirement to reload the jar at runtime. But when the reloading is completed, the spark
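A hedged sketch of one way to load SPI implementations from a jar at runtime via a fresh classloader; the service interface and jar path are hypothetical, and this does not address any classes Spark itself may already hold onto from the old jar.

    import java.io.File
    import java.net.URLClassLoader
    import java.util.ServiceLoader
    import scala.collection.JavaConverters._

    // Hypothetical SPI interface that the external jar implements.
    trait BusinessRule {
      def apply(input: String): String
    }

    object SpiReloadSketch {
      // Build a new classloader each time the jar changes so fresh classes are picked up.
      def loadRules(jarPath: String): Seq[BusinessRule] = {
        val loader = new URLClassLoader(
          Array(new File(jarPath).toURI.toURL), getClass.getClassLoader)
        ServiceLoader.load(classOf[BusinessRule], loader).iterator().asScala.toSeq
      }
    }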

Re: Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
there is another way to achieve the same without jar reloading. In fact, it might be dangerous from a functional point of view: if the functionality in the jar changed, all your computation is wrong. On 6 Jun 2017, at 11:35, Jonnas Li(Contractor) <zhongshuang...@envisioncn.com>

Re: Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
Alonso Isidoro Roman (about.me/alonso.isidoro.roman) 2017-06-06 12:14 GMT+02:00, Jonnas Li(Contractor) <zhongshuang...@envisioncn.com>: Thanks for your quick response. These jars are used to define some customized business logic, and they

streaming and piping to R, sending all data in window to pipe()

2015-07-17 Thread PAULI, KEVIN CHRISTIAN [AG-Contractor/1000]
Spark newbie here, using Spark 1.3.1. I’m consuming a stream and trying to pipe the data from the entire window to R for analysis. The R algorithm needs the entire dataset from the stream (everything in the window) in order to function properly; it can’t be broken up. So I tried doing a
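A minimal sketch of one way to hand each window to an external R script in a single pipe() invocation; the window lengths and script path are placeholders, and coalescing to one partition deliberately trades away parallelism to keep the window's dataset whole.

    import org.apache.spark.streaming.Seconds

    // stream is assumed to be an existing DStream[String] of records.
    val windowed = stream.window(Seconds(300), Seconds(300))

    windowed.foreachRDD { rdd =>
      // One partition means the R script sees the entire window on its stdin.
      val results = rdd.coalesce(1).pipe("/path/to/analyze.R").collect()  // placeholder script
      results.foreach(println)
    }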