Thanks Krishna. Seems like you have to use Avro and then convert that to
Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look into
this some more.
Thanks,
Mahesh
Hello all,
I am not sure what is going on – I am getting a NotSerializableException, and
initially I thought it was due to not registering one of my classes with Kryo,
but that doesn't seem to be the case. I am essentially eliminating duplicates
in a Spark Streaming application by using a "window"
    ()),
    (duplicateCount, (oc - dc).toString()))
    OperationalStatProducer.produce(statBody)
    } catch { case e: Exception => DebugLogger.report(e) }
    }
    }
    })
    })
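(The code above is truncated by the archive.) A rough sketch of the window-based
de-duplication being described, with hypothetical stream and field names rather
than the original code:

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.StreamingContext._

    // Key each record, take a sliding window, and keep one record per key.
    val deduped = stream
      .map(record => (record.id, record))  // "id" is a hypothetical dedup key
      .window(Seconds(60), Seconds(10))    // 60s window, sliding every 10s
      .reduceByKey((first, _) => first)    // drop duplicates within the window

    deduped.foreachRDD { rdd =>
      rdd.foreach { case (_, record) =>
        OperationalStatProducer.produce(record.toString)
      }
    }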
On Thu, Aug 7, 2014 at 9:03 AM, Padmanabhan, Mahesh (contractor
=true for
driver in your driver startup-script? That should give an indication of the
sequence of object references that lead to the StreamingContext being included
in the closure.
TD
On Thu, Aug 7, 2014 at 10:23 AM, Padmanabhan, Mahesh (contractor)
mahesh.padmanab...@twc
Thanks TD, Amit.
I think I figured out where the problem is through the process of commenting
out individual lines of code one at a time :(
Can either of you help me find the right solution? I tried creating the
SparkContext outside the foreachRDD but that didn’t help.
I have an object (let’s
context as rdd.context. This will avoid referring to the ssc inside the
foreachRDD.
See if that helps.
TD
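A minimal sketch of that suggestion (dstream and the loop body here are
hypothetical stand-ins):

    dstream.foreachRDD { rdd =>
      // rdd.context returns the SparkContext, so this closure never has to
      // capture ssc, the non-serializable StreamingContext, from the driver.
      val sc = rdd.context
      println("Processing " + rdd.count() + " records for app " + sc.appName)
    }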
of the app for each, and pass the input source and output topic as arguments
to each instance?
On Thu, Sep 18, 2014 at 8:07 AM, Padmanabhan, Mahesh (contractor)
mahesh.padmanab...@twc-contractor.com wrote:
Hi all,
I am using Spark 1.0 Streaming to ingest a high-volume stream of data
(approx. 1mm
Hi,
We have been running Spark 1.0.2 with Mesos 0.20.1 in fine-grained mode, and for
the most part it has been working well.
We have been using mesos://zk://server1:2181,server2:2181,server3:2181/mesos as
the Spark master URL, and this works great to get the Mesos leader.
Unfortunately, this
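For reference, that ZooKeeper-based master URL is passed like any other master
setting; a sketch (app name hypothetical):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setMaster("mesos://zk://server1:2181,server2:2181,server3:2181/mesos")
      .set("spark.mesos.coarse", "false") // fine-grained mode (the 1.x default)
      .setAppName("MyStreamingApp")       // hypothetical name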
Hello all,
I am using Spark 1.0.2 and I have a custom receiver that works well.
I tried adding Kryo serialization to SparkConf:
    val spark = new SparkConf()
      …
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
and I am getting a strange error that I am not sure how to
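If the strange error does turn out to be an unregistered class after all, the
Spark 1.0-era way to register classes with Kryo is a KryoRegistrator; a sketch
with hypothetical class and package names:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Registers the classes the custom receiver emits (MyEvent is hypothetical).
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyEvent])
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.MyRegistrator") // hypothetical package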
Dear Cody,
Thanks for your response. I am trying to do decoration, meaning that when a
message comes from Kafka (partitioned by key) into Spark, I want to add more
fields/data to it.
How do people normally do this in Spark? If it were you, how would you decorate
a message without hitting
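One common pattern (a sketch, not necessarily what Cody had in mind; the loader,
stream, and field names are hypothetical) is to broadcast the small reference
dataset once and enrich each message from it, so nothing external is hit per
message:

    // Load the small reference table once on the driver and broadcast it.
    val regionByKey: Map[String, String] = loadReferenceTable() // hypothetical loader
    val regions = ssc.sparkContext.broadcast(regionByKey)

    // kafkaStream: DStream[(String, String)] of (key, value) pairs from Kafka.
    val decorated = kafkaStream.map { case (key, value) =>
      (key, value, regions.value.getOrElse(key, "unknown")) // the added field
    }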
Hi,
I am reading a Hive ORC table into memory; the StorageLevel is set to
StorageLevel.MEMORY_AND_DISK_SER.
The total size of the Hive table is 5 GB.
I started spark-shell as below:

    spark-shell --master yarn --deploy-mode client --num-executors 8 \
      --driver-memory 5G --executor-memory 7G
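For reference, the read-and-cache step presumably looks something like this,
assuming a Spark 2.x shell where spark is the SparkSession (database and table
names hypothetical):

    import org.apache.spark.storage.StorageLevel

    val df = spark.table("mydb.my_orc_table")    // hypothetical Hive ORC table
    df.persist(StorageLevel.MEMORY_AND_DISK_SER) // serialized in memory, spills to disk
    df.count()                                   // force materialization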
, May 11, 2017 1:34 PM
To: Anantharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com>;
user <user@spark.apache.org>
Subject: Re: Spark consumes more memory
I would try to track down the "no space left on device" - find out where that
originates from, si
Subject: Re: Java SPI jar reload in Spark
Hi, a quick search on Google:
https://github.com/spark-jobserver/spark-jobserver/issues/130
I have a Spark Streaming application which dynamically calls a jar (Java
SPI); the jar is called in a mapWithState() function, and it was working fine
for a long time.
Recently, I got a requirement to reload the jar during runtime. But when the
reloading is completed, the spark
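For context, a minimal sketch of a Java SPI lookup of the kind being described;
the service interface and jar path are hypothetical, since the actual types in
this thread are not shown:

    import java.io.File
    import java.net.URLClassLoader
    import java.util.ServiceLoader

    // Hypothetical SPI interface; the real service type is not shown above.
    trait RuleEngine { def evaluate(event: String): String }

    def loadEngine(jarPath: String): RuleEngine = {
      val loader = new URLClassLoader(
        Array(new File(jarPath).toURI.toURL), getClass.getClassLoader)
      // ServiceLoader discovers implementations listed in META-INF/services.
      ServiceLoader.load(classOf[RuleEngine], loader).iterator().next()
    }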
there is another way to achieve the same without jar
reloading. In fact, it might be dangerous from a functional point of view: if
the functionality in the jar changed, all your computation is wrong.
On 6. Jun 2017, at 11:35, Jonnas Li(Contractor)
<zhongshuang...@envisioncn.com>
Alonso Isidoro Roman
about.me/alonso.isidoro.roman
2017-06-06 12:14 GMT+02:00 Jonnas Li(Contractor)
<zhongshuang...@envisioncn.com>:
Thanks for your quick response.
These jars are used to define some customized business logic, and they
Spark newbie here, using Spark 1.3.1.
I’m consuming a stream and trying to pipe the data from the entire window to R
for analysis. The R algorithm needs the entire dataset from the stream
(everything in the window) in order to function properly; it can’t be broken up.
So I tried doing a
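The usual hook for this is RDD.pipe, which streams a partition's elements
through an external process over stdin/stdout. A sketch, assuming a single
partition can hold the whole window (stream name and script path hypothetical):

    import org.apache.spark.streaming.Minutes

    dstream.window(Minutes(10)).foreachRDD { rdd =>
      // One partition, so the R script sees the entire window on its stdin.
      val results = rdd.coalesce(1).pipe("Rscript /path/to/analyze.R")
      results.collect().foreach(println)
    }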