Hi Bharath,
1) Did you sync the Spark jar and conf to the worker nodes after the build?
2) Since the dataset is not large, could you try local mode first
using `spark-submit --driver-memory 12g --master local[*]`?
3) Try using fewer partitions, say 5 (see the sketch below).
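For reference, a minimal sketch of what 2) and 3) could look like together; the class name, jar, and input path are placeholders, not something from this thread:

```scala
// Launch with: spark-submit --driver-memory 12g --master local[*] --class LocalRepro repro.jar
import org.apache.spark.{SparkConf, SparkContext}

object LocalRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("local-repro"))
    val data = sc.textFile("hdfs:///path/to/input")  // placeholder input path
      .coalesce(5)                                   // suggestion 3: shrink to ~5 partitions
    println(data.count())                            // any small action to exercise the job
    sc.stop()
  }
}
```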
If the problem is still there, please
On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
gurvinder.si...@uninett.no wrote:
csv =
On 5 July 2014 23:08, Mayur Rustagi [via Apache Spark User List]
ml-node+s1001560n8860...@n3.nabble.com wrote:
The key idea is to simulate your app time as you enter data. So you can connect
Spark Streaming to a queue and insert data into it spaced by time. Easier said
than done :).
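For illustration, a hedged spark-shell-style sketch of that pattern using Spark Streaming's queueStream; the batch interval, data slices, and sleep are placeholders standing in for your recorded data and its original timing:

```scala
// Replay recorded data "spaced by time": RDDs pushed into the queue are
// consumed one per batch interval, simulating the original arrival times.
import scala.collection.mutable
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = new SparkContext("local[*]", "replay")         // placeholder master/app name
val ssc = new StreamingContext(sc, Seconds(5))          // batch interval = simulated "app time" step
val queue = new mutable.Queue[RDD[String]]()

val stream = ssc.queueStream(queue)
stream.foreachRDD(rdd => println(s"batch size: ${rdd.count()}"))

ssc.start()

// Stand-in for your historical data, already cut into time slices.
val historicalSlices: Seq[Seq[String]] =
  Seq(Seq("a", "b"), Seq("c"), Seq("d", "e", "f"))

historicalSlices.foreach { slice =>
  queue += sc.parallelize(slice)                        // enqueue one slice ...
  Thread.sleep(5000)                                    // ... then wait one interval
}
ssc.awaitTermination()
```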
I see.
I'll try
If I've created a Spark EC2 cluster, how can I add or take away workers?
Also: If I use EC2 spot instances, what happens when Amazon removes
them? Will my computation be saved in any way, or will I need to
restart from scratch?
Finally: The spark-ec2 scripts seem to use Hadoop 1. How can I
Ah, indeed it looks like I need to install this separately
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1
as it is not part of the core.
Nick
On Sun, Jul 6, 2014 at 2:22 AM, Gurvinder Singh gurvinder.si...@uninett.no
wrote:
On 07/06/2014 05:19 AM,
On Sun, Jul 6, 2014 at 10:10 AM, Robert James srobertja...@gmail.com
wrote:
If I've created a Spark EC2 cluster, how can I add or take away workers?
There is a manual process by which this is possible, but I’m not sure of
the procedure. There is also SPARK-2008
Konstantin,
Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you can try
from
http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
Let me know if you see issues with the tech preview.
Spark Pi example on HDP 2.0
I downloaded Spark 1.0 pre-built from
Pardon, I was wrong about this. There is actually code distributed
under com.hadoop, and that's where this class is. Oops.
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/source/browse/trunk/src/java/com/hadoop/mapreduce/LzoTextInputFormat.java
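If it helps, a hedged spark-shell sketch of reading LZO data through that class; it assumes the hadoop-lzo/hadoop-gpl-compression jar and native libraries are already installed and on the classpath, and the path is a placeholder:

```scala
// Sketch only: the LZO codec jar and native libs must be installed separately.
import org.apache.hadoop.io.{LongWritable, Text}
import com.hadoop.mapreduce.LzoTextInputFormat

val lines = sc.newAPIHadoopFile(
  "hdfs:///path/to/data.lzo",          // placeholder path
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text]
).map(_._2.toString)                   // keep just the line text

println(lines.count())
```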
On Sun, Jul 6, 2014 at 6:37
probably a dumb question, but why is reference equality used for the
indexes?
On Sun, Jul 6, 2014 at 12:43 AM, Ankur Dave ankurd...@gmail.com wrote:
When joining two VertexRDDs with identical indexes, GraphX can use a fast
code path (a zip join without any hash lookups). However, the check
Hello, thanks for your message... I'm confused: Hortonworks suggests installing the
Spark RPM on each node, but the Spark main page says that YARN is enough and I
don't need to install it... What's the difference?
sent from my HTC
On Jul 6, 2014 8:34 PM, vs vinayshu...@gmail.com wrote:
Konstantin,
HWRK
Can you provide links to the sections that are confusing?
My understanding is that the HDP1 binaries do not need YARN, while the HDP2 binaries
do.
Now, you can also install Hortonworks Spark RPM...
For production, in my opinion, RPMs are better for manageability.
On Jul 6, 2014, at 5:39 PM,
Hi Nick,
The cluster I was working on in those linked messages was a private data
center cluster, not on EC2. I'd imagine that the setup would be pretty
similar, but I'm not familiar with the EC2 init scripts that Spark uses.
Also I upgraded that cluster to 1.0 recently and am continuing to use
Marco,
Hortonworks provides a Tech Preview of Spark 0.9.1 with HDP 2.1 that you
can try
from
http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf
HDP 2.1 means YARN, but at the same time they propose to install the RPM.
On the other hand, http://spark.apache.org/ says
Integrated with
That is confusing based on the context you provided.
This might take more time than I can spare to try to understand.
For sure, you need to add Spark to run it in/on the HDP 2.1 express VM.
Cloudera's CDH 5 express VM includes Spark, but the service isn't running by
default.
I can't
I'm in the process of evaluating Spark to see if it's a fit for my
CPU-intensive application. Many operations in my chain are highly
parallelizable, but some require a minimum number of rows of an input image
in order to operate. Is there a way to give Spark a minimum and/or maximum
size to send
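For what it's worth, one hedged sketch of a workaround that is sometimes used: fix the partition count up front and process each partition as a block with mapPartitions, so every task sees a contiguous group of rows at once. The path, row format, and partition count below are placeholders:

```scala
// Assumption: with N partitions over M rows, each task sees roughly M/N rows,
// which you can tune so it stays above the minimum your operation needs.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("image-rows"))
val numPartitions = 16                                  // tune for your minimum rows per task

val rows = sc.textFile("hdfs:///path/to/image-rows.csv", numPartitions)  // placeholder input

val results = rows.mapPartitions { iter =>
  val block = iter.toArray                              // all rows of this partition, together
  // ... run the operation that needs a minimum number of rows on `block` ...
  Iterator(block.length)                                // placeholder result
}
println(results.collect().mkString(", "))
```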
Hi,
We are planning to use Spark to load data into Parquet, and this data will be
queried by Impala for visualization through Tableau.
Can we achieve this flow? How do we load data into Parquet from Spark? Will
Impala be able to access the data loaded by Spark?
I will greatly appreciate if
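For what it's worth, a hedged sketch of the Spark side against the Spark 1.0 SQL API; the schema, paths, and column parsing are placeholders, and the Impala step (creating an external table over the same directory) is not shown:

```scala
// Write an RDD of case classes out as a Parquet directory that Impala could then read.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Event(id: Int, name: String, value: Double)  // stand-in schema

val sc = new SparkContext(new SparkConf().setAppName("to-parquet"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD                        // implicit RDD -> SchemaRDD conversion

val events = sc.textFile("hdfs:///path/to/input.csv").map { line =>
  val f = line.split(",")
  Event(f(0).toInt, f(1), f(2).toDouble)
}

events.saveAsParquetFile("hdfs:///warehouse/events_parquet")  // directory Impala would point at
```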
I can say from my experience that getting Spark to work with Hadoop 2
is not for the beginner; after solving one problem after another
(dependencies, scripts, etc.), I went back to Hadoop 1.
Spark's Maven build, EC2 scripts, and others all default to Hadoop 1 - not sure
why, but, given that, Hadoop 2 has too
Well, the alternative is to do a deep equality check on the index arrays,
which would be somewhat expensive since these are pretty large arrays (one
element per vertex in the graph). But, in case the reference equality check
fails, it actually might be a good idea to do the deep check before
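As a toy illustration of that trade-off (not GraphX's actual code): check reference equality first because it is O(1), and only fall back to the O(n) structural comparison when it fails:

```scala
// Cheap reference check first, deep structural check only as a fallback.
def sameIndex(a: Array[Long], b: Array[Long]): Boolean = {
  if (a eq b) {
    true                     // O(1): common when one VertexRDD was derived from the other
  } else {
    // O(n) fallback: only worth it if missing the fast join path costs even more.
    a.length == b.length && a.sameElements(b)
  }
}

val idx = Array(1L, 2L, 3L)
val derived = idx                    // same underlying array: reference-equal
val rebuilt = Array(1L, 2L, 3L)      // same contents, different array
println(sameIndex(idx, derived))     // true via `eq`
println(sameIndex(idx, rebuilt))     // true via sameElements
```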