if there is, it probably
still requires manual work on the new nodes. This would be the advantage of
EMR over EC2, as we take care of all of that configuration.
~ Jonathan
From: Vadim Bichutskiy vadim.bichuts...@gmail.com
Date: Friday, June 19, 2015 at 5:21 PM
To: Jonathan Kelly jonat
Would you be able to use Spark on EMR rather than on EC2? EMR clusters
allow easy resizing of the cluster, and EMR also now supports Spark 1.3.1
as of EMR AMI 3.8.0. See http://aws.amazon.com/emr/spark
~ Jonathan
From: Vadim Bichutskiy vadim.bichuts...@gmail.com
Date: Friday, June 19, 2015
Hello Spark Experts,
I've been running a standalone Spark cluster on EC2 for a few months now,
and today I get this error:
IOError: [Errno 28] No space left on device
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
OpenJDK 64-Bit Server VM warning: Insufficient
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim
On Sat, May 23, 2015 at 2:22 PM, Johan Beisser j...@caustic.org wrote:
Yes.
We're looking at bootstrapping in EMR...
On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote:
I used Spark on EC2 a while ago
/FileInputDStream.scala#L172
Thanks
Best Regards
On Fri, May 15, 2015 at 2:25 AM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
How does textFileStream work behind the scenes? How does Spark Streaming
know what files are new and need to be processed? Is it based on time
stamp, file
, 2015 at 9:53 AM, Evo Eftimov evo.efti...@isecc.com
wrote:
I can confirm it does work in Java
*From:* Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
*Sent:* Tuesday, May 12, 2015 5:53 PM
*To:* Evo Eftimov
*Cc:* Saisai Shao; user@spark.apache.org
*Subject:* Re: DStream Union vs
How does textFileStream work behind the scenes? How does Spark Streaming
know what files are new and need to be processed? Is it based on time
stamp, file name?
Thanks,
Vadim
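The linked FileInputDStream source answers this: selection is based on modification time, not file name. A toy Python sketch of that logic (heavily simplified; the function and parameter names are mine, not Spark's):

```python
import os

def find_new_files(directory, last_processed_time, remember_window, seen):
    """Rough sketch of FileInputDStream's selection rule: a file counts
    as new when its modification time falls inside the remember window
    and it has not been returned before."""
    threshold = last_processed_time - remember_window
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.getmtime(path) >= threshold and path not in seen:
            new_files.append(path)
            seen.add(path)
    return new_files
```

Already-seen files are skipped even if they fall in the window, which is why appending to an existing file does not get it re-processed.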
...@isecc.com
wrote:
I can confirm it does work in Java
*From:* Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
*Sent:* Tuesday, May 12, 2015 5:53 PM
*To:* Evo Eftimov
*Cc:* Saisai Shao; user@spark.apache.org
*Subject:* Re: DStream Union vs. StreamingContext Union
Thanks Evo. I tried
Vadim Bichutskiy vadim.bichuts...@gmail.com:
Can someone explain to me the difference between DStream union and
StreamingContext union?
When do you use one vs the other?
Thanks,
Vadim
multiple DstreamRDDs in this way
DstreamRDD1.union(DstreamRDD2).union(DstreamRDD3) etc etc
Ps: the API is not “redundant”; it offers several ways of achieving the
same thing as a convenience, depending on the situation
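A toy analogy of the two union flavors, with plain Python lists standing in for DStreams (none of this is actual Spark API; it only illustrates pairwise chaining vs. a single n-ary union):

```python
def pairwise_union(streams):
    """DStream.union style: fold the streams together two at a time."""
    result = streams[0]
    for s in streams[1:]:
        result = result + s      # list + stands in for DStream.union
    return result

def context_union(streams):
    """StreamingContext.union style: union a whole collection at once."""
    return [x for s in streams for x in s]
```

Both produce the same stream of elements; the context form is simply more convenient when you hold a list of many streams.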
*From:* Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
*Sent
Can someone explain to me the difference between DStream union and
StreamingContext union?
When do you use one vs the other?
Thanks,
Vadim
I was wondering about the same thing.
Vadim
On Tue, Apr 28, 2015 at 10:19 PM, bit1...@163.com bit1...@163.com wrote:
Looks to me that the same thing also applies to the SparkContext.textFile
or SparkContext.wholeTextFile, there is no way in RDD to figure out the
file information where the
I was having this issue when my batch interval was very big -- like 5
minutes. When my batch interval is
smaller, I don't get this exception. Can someone explain to me why this
might be happening?
Vadim
On Tue, Apr 28, 2015 at 4:26 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
I am
I am using Spark Streaming to monitor an S3 bucket. Everything appears to
be fine. But every batch interval I get the following:
*15/04/28 16:12:36 WARN HttpMethodReleaseInputStream: Attempting to release
HttpMethod in finalize() as its response data stream has gone out of scope.
This attempt
def get_metadata():
...
return mylist
On Wed, Apr 22, 2015 at 6:47 PM, Tathagata Das t...@databricks.com wrote:
Can you give the full code? Especially myfunc?
On Wed, Apr 22, 2015 at 2:20 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Here's what I did:
print 'BROADCASTING
as you
share a spark context all will work as expected.
http://stackoverflow.com/questions/142545/python-how-to-make-a-cross-module-variable
Sent with Good (www.good.com)
-Original Message-
*From: *Vadim Bichutskiy [vadim.bichuts...@gmail.com]
*Sent: *Thursday, April 23, 2015
it will be immutable at the executors, and if you update the list
at the driver, you will have to broadcast it again.
TD
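A plain-Python illustration of TD's point, where FakeBroadcast is a made-up stand-in for pyspark's Broadcast, used only to show the snapshot semantics:

```python
class FakeBroadcast:
    """Made-up stand-in for pyspark's Broadcast: the value is a frozen
    snapshot taken at broadcast time, just as executors see it."""
    def __init__(self, value):
        self._value = tuple(value)
    @property
    def value(self):
        return self._value

metadata = ["a", "b"]            # driver-side list
bcast = FakeBroadcast(metadata)
metadata.append("c")             # updating the driver list afterwards...
```

...leaves `bcast.value` at the old snapshot; you have to broadcast again to publish the update.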
On Wed, Apr 22, 2015 at 9:28 AM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
I am using Spark Streaming with Python. For each RDD, I call a map,
i.e., myrdd.map(myfunc
is in a different module. How do I make it aware of
broadcastVar?
On Wed, Apr 22, 2015 at 2:13 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Great. Will try to modify the code. Always room to optimize!
On Wed, Apr 22, 2015 at 2:11 PM, Tathagata Das t...@databricks.com
wrote
I am using Spark Streaming where during each micro-batch I output data to
S3 using
saveAsTextFile. Right now each batch of data is put into its own directory
containing
2 objects, _SUCCESS and part-0.
How do I output each batch into a common directory?
Thanks,
Vadim
containing
partitions, as is common in Hadoop. You can move them later, or just
read them where they are.
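A sketch of the "read them where they are" option, using local directories to mimic the per-batch layout saveAsTextFile produces (plain Python, no Spark; in a real job the glob pattern is what you would pass to sc.textFile):

```python
import glob
import os
import tempfile

# Each micro-batch lands in its own directory (batch-<time>/part-00000),
# mimicking what saveAsTextFile("%s/%d" % (base, time)) writes out.
root = tempfile.mkdtemp()
for t in (1000, 2000):
    batch_dir = os.path.join(root, "batch-%d" % t)
    os.makedirs(batch_dir)
    with open(os.path.join(batch_dir, "part-00000"), "w") as f:
        f.write("record-from-%d\n" % t)

# One glob covers every batch directory at once, which is what reading
# root + "/*/part-*" with sc.textFile would do.
parts = sorted(glob.glob(os.path.join(root, "*", "part-*")))
records = [line.strip() for p in parts for line in open(p)]
```

So a "common directory" falls out for free: keep a common prefix and read across batches with one wildcard path.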
On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
I am using Spark Streaming where during each micro-batch I output data to S3
using
saveAsTextFile
find them easily. Or consider somehow sending the batches of
data straight into Redshift? no idea how that is done but I imagine
it's doable.
On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Thanks Sean. I want to load each batch into Redshift. What's
the time as part of a DStream
If you want fine / detailed management of the writing to HDFS you can
implement your own HDFS adapter and invoke it in forEachRDD and foreach
Regards
Evo Eftimov
From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
Sent: Thursday, April 16, 2015 6
(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
Has anyone else run into this issue?
On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
I don't believe the Kinesis asl should be provided. I used
mergeStrategy successfully to produce an uber jar
I don't believe the Kinesis asl should be provided. I used mergeStrategy
successfully to produce an uber jar.
Fyi, I've been having trouble consuming data out of Kinesis with Spark with no
success :(
Would be curious to know if you got it working.
Vadim
On Apr 13, 2015, at 9:36 PM, Mike
a spark-submit job via uber jar). Feel free to add me to
gmail chat and maybe we can help each other.
On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
I don't believe the Kinesis asl should be provided. I used mergeStrategy
successfully to produce an uber
Hi all,
I am using Spark Streaming to monitor an S3 bucket for objects that contain
JSON. I want
to import that JSON into Spark SQL DataFrame.
Here's my current code:
*from pyspark import SparkContext, SparkConf*
*from pyspark.streaming import StreamingContext*
*import json*
*from pyspark.sql
When I call *transform* or *foreachRDD *on* DStream*, I keep getting an
error that I have an empty RDD, which makes sense since my batch interval
may be smaller than the rate at which new data are coming in. How do I
guard against it?
Thanks,
Vadim
Hi all,
I figured it out! The DataFrames and SQL example in Spark Streaming docs
were useful.
Best,
Vadim
On Wed, Apr 8, 2015 at 2:38 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com
wrote:
Hi all,
I am using Spark Streaming to monitor an S3 bucket for objects that
contain JSON. I want
Thanks TD!
On Apr 8, 2015, at 9:36 PM, Tathagata Das t...@databricks.com wrote:
Aah yes. The jsonRDD method needs to walk through the whole RDD to understand
the schema, and does not work if there is no data in it. Making sure there
is data in it using take(1) should work.
TD
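A minimal sketch of that guard, with a made-up FakeRDD standing in for a real RDD (only the take() method matters here):

```python
def process_if_nonempty(rdd, handler):
    """Guard for use inside foreachRDD: skip batches whose RDD is empty.
    rdd only needs a take(n) method, like a real RDD."""
    if rdd.take(1):              # cheap: pulls at most one element
        handler(rdd)
        return True
    return False

class FakeRDD:
    """Made-up stand-in for a real RDD, for illustration only."""
    def __init__(self, data):
        self.data = list(data)
    def take(self, n):
        return self.data[:n]
```

With this guard in place, jsonRDD (or any schema inference) only ever runs on batches that actually contain data.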
Hey y'all,
While I haven't been able to get Spark + Kinesis integration working, I
pivoted to plan B: I now push data to S3 where I set up a DStream to
monitor an S3 bucket with textFileStream, and that works great.
I &lt;3 Spark!
Best,
Vadim
On Mon, Apr 6, 2015 at 12:23 PM, Vadim Bichutskiy
Hi all,
I am wondering, has anyone on this list been able to successfully implement
Spark on top of Kinesis?
Best,
Vadim
On Sun, Apr 5, 2015 at 1:50 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com
wrote:
Hi all,
Below is the output that I am getting. My Kinesis stream has 1 shard
***
15/04/05 17:14:50 INFO scheduler.ReceivedBlockTracker: Deleting batches
ArrayBuffer(142825407 ms)
On Sat, Apr 4, 2015 at 3:13 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com
wrote:
Hi all,
More good news! I was able to utilize mergeStrategy to assembly my Kinesis
consumer into an uber
-kinesis-asl
libraryDependencies += org.apache.spark %% spark-streaming-kinesis-asl
% 1.3.0
On Fri, Apr 3, 2015 at 12:45 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Thanks. So how do I fix it?
On Fri, Apr 3, 2015 at 3:43 PM, Kelly, Jonathan jonat...@amazon.com
wrote:
spark
% 1.3.0
On Fri, Apr 3, 2015 at 12:45 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Thanks. So how do I fix it?
On Fri, Apr 3, 2015 at 3:43 PM, Kelly, Jonathan jonat...@amazon.com
wrote:
spark-streaming-kinesis-asl is not part of the Spark distribution on
your cluster, so you
were not included in the assembly
(but yes, they should be).
~ Jonathan Kelly
From: Vadim Bichutskiy vadim.bichuts...@gmail.com
Date: Friday, April 3, 2015 at 12:26 PM
To: Jonathan Kelly jonat...@amazon.com
Cc: user@spark.apache.org user@spark.apache.org
Subject: Re: Spark + Kinesis
Hi
You can start with http://spark.apache.org/docs/1.3.0/index.html
Also get the Learning Spark book http://amzn.to/1NDFI5x. It's great.
Enjoy!
Vadim
On Thu, Apr 2, 2015 at 4:19 AM, Star Guo st...@ceph.me wrote:
Hi, all
I am new to here. Could you give me some suggestion to learn Spark ?
Hi all,
I am trying to write an Amazon Kinesis consumer Scala app that processes
data in the
Kinesis stream. Is this the correct way to specify *build.sbt*:
---
*import AssemblyKeys._*
*name := Kinesis Consumer*
*version := 1.0*
*organization := com.myconsumer*
*scalaVersion :=
http://polyglotprogramming.com
On Thu, Apr 2, 2015 at 8:33 AM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
You can start with http://spark.apache.org/docs/1.3.0/index.html
Also get the Learning Spark book http://amzn.to/1NDFI5x. It's
for that, and I temporarily moved on to other things for
now.
~ Jonathan Kelly
From: 'Vadim Bichutskiy' vadim.bichuts...@gmail.com
Date: Thursday, April 2, 2015 at 9:53 AM
To: user@spark.apache.org user@spark.apache.org
Subject: Spark + Kinesis
Hi all,
I am trying to write an Amazon
Hi all,
I just tried launching a Spark cluster on EC2 as described in
http://spark.apache.org/docs/1.3.0/ec2-scripts.html
I got the following response:
<Response><Errors><Error><Code>PendingVerification</Code><Message>Your
account is currently being verified. Verification normally takes less than
2