Same here, got stuck at this point. Any hints on what might be going on?
Hi all,
I am facing issues while using spark with HBase. I am getting
NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
(TableName.java:288)
Can someone please help resolve this issue? What am I missing?
I am using the following snippet of code:
Configuration config =
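For reference, a minimal sketch of the usual TableInputFormat setup (the table
name and ZooKeeper quorum below are placeholders); a missing table name in the
job configuration is one common cause of that NPE:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Placeholder values; TableInputFormat.INPUT_TABLE must be set before the
// RDD is created, otherwise the input format cannot resolve the table.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk-host")
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

val hbaseRdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
println(hbaseRdd.count())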
Hi,
I wanted to calculate the InterClusterDensity and IntraClusterDensity for the
clusters generated by KMeans.
How can I achieve that? Is there any existing code/API to use for this
purpose?
Thanks
Stuti Awasthi
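One rough way to get both numbers from a KMeansModel (a sketch only: Euclidean
distances, and the k, iteration count, and the data RDD[Vector] below are all
assumptions):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vector

// data: RDD[Vector] of feature vectors, assumed already parsed.
val model = KMeans.train(data, 3, 20)

def dist(a: Vector, b: Vector): Double =
  math.sqrt(a.toArray.zip(b.toArray).map { case (x, y) => (x - y) * (x - y) }.sum)

// Intra-cluster density: mean distance from each point to its assigned center.
val intra = data.map(p => dist(p, model.clusterCenters(model.predict(p)))).mean()

// Inter-cluster density: mean pairwise distance between cluster centers.
val centers = model.clusterCenters
val pairwise = for (i <- centers.indices; j <- centers.indices if i < j)
  yield dist(centers(i), centers(j))
val inter = pairwise.sum / pairwise.size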
I've been playing with the amplab docker scripts and I needed to set
spark.driver.host to the driver host IP, one that all Spark processes can
reach.
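Something like this (the master URL and address below are placeholders; the
point is just that every worker must be able to route back to that address):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .setAppName("driver-host-example")
  .set("spark.driver.host", "10.0.0.5")    // an IP every worker can reach
val sc = new SparkContext(conf)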
On May 28, 2014, at 4:35 AM, jaranda jordi.ara...@bsc.es wrote:
Same here, got stuck at this point. Any hints on what might be going on?
It's not currently possible to write anything other than text (or pickle
files, I think in 1.0.0, or if not then in 1.0.1) from PySpark.
I have an outstanding pull request to add READING any InputFormat from
PySpark, and after that is in I will look into OutputFormat too.
What does your data look
Hi,
I have a bunch of files that are bz2 compressed but do not have the
extension .bz2
Is there any way to force Spark to read them as bz2 files using sc.textFile?
FYI, if I add the .bz2 extension to the file it works fine, but the process
that creates those files can't do that and I'd like to
You can use the Hadoop API directly: provide an input/output reader and the
Hadoop configuration to read the data.
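For example, something along these lines decompresses explicitly instead of
relying on the file extension (a sketch only: it assumes commons-compress on
the classpath and a Spark release that ships sc.binaryFiles; the path is a
placeholder):

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
import scala.io.Source

// Open each file as a raw stream and decompress it ourselves, so the
// missing ".bz2" extension never matters.
val lines = sc.binaryFiles("hdfs:///data/unmarked-bz2/*").flatMap {
  case (path, stream) =>
    val in = new BZip2CompressorInputStream(stream.open())
    Source.fromInputStream(in, "UTF-8").getLines()
}
lines.take(5).foreach(println)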
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Wed, May 28, 2014 at 7:22 PM, Laurent T
Anyone who has used Spark this way or has faced a similar issue, please help.
Thanks,
-Vibhor
On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga vibhorba...@gmail.com wrote:
Hi all,
I am facing issues while using spark with HBase. I am getting
NullPointerException at
Hi Ankur,
We’ve built it from the git link you’ve sent, and we don’t get the exception
anymore.
However, we’ve been facing strange indeterministic behavior from Graphx.
We compute connected components on a graph of ~900K edges. We ran the spark job
several times on the same input graph and got
Howdy Andrew,
Here is what I ran before an application context was created (other
services have been deleted):
# netstat -l -t tcp -p --numeric-ports
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        State      PID/Program name
Wisely, is mapToPair in Spark 0.9.1 or 1.0? I'm running the former and
didn't see that method available.
I think the issue is that predict() is expecting an RDD containing a tuple
of ints and not Integers. So if I use JavaPairRDD<Object,Object> with my
original code snippet, things seem to at least
Hi all,
I have installed Apache Shark 0.9.1 on my machine which comes bundled with
hive-0.11 version of the Hive jars. I am trying to integrate this with my
pre-existing CDH-4.6 version of the Hive server, which is of version 0.10. On
pointing HIVE_HOME in spark-env.sh to the Cloudera version of the
Mohit Jaggi:
A workaround is to use zipWithIndex (to appear in Spark 1.0, but if you're
still on 0.9x you can swipe the code from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
), map it to (x => (x._2, x._1)) and then sortByKey.
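i.e., roughly (assuming an existing RDD called rdd, and Spark 1.0+ so that
zipWithIndex is available):

val byOriginalOrder = rdd.zipWithIndex()   // (element, index)
  .map(x => (x._2, x._1))                  // (index, element)
  .sortByKey()                             // sort by original position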
Hi,
I'm new to Spark and Hadoop, and I'd like to know if the following
problem is solvable in terms of Spark's primitives.
To compute the K-nearest neighbours of a N-dimensional dataset, I can
multiply my very large normalized sparse matrix by its transpose. As
this yields all pairwise distance
OK... I needed to set the JVM classpath for the worker to find the fb303 class:
env.put("SPARK_JAVA_OPTS",
    "-Djava.class.path=/home/myInc/hive-0.9.0-bin/lib/libfb303.jar");
Now I am seeing the following spark.httpBroadcast.uri error. What am I
missing?
java.util.NoSuchElementException:
On Tue, May 27, 2014 at 6:08 PM, JaeBoo Jung itsjb.j...@samsung.com wrote:
I already tried HiveContext as well as SqlContext.
But it seems that Spark's HiveContext is not completely the same as Apache
Hive.
For example, SQL like 'SELECT RANK() OVER(ORDER BY VAL1 ASC) FROM TEST
LIMIT 10' works
During the last few days I've been trying to deploy a Scala job to a
standalone cluster (master + 4 workers) without much success, although it
worked perfectly when launching it from the spark shell, that is, using the
Scala REPL (pretty strange, this would mean my cluster config was actually
Thank you for your answer. Would you have by any chance some example
code (even fragmentary) that I could study?
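A fragmentary sketch of the embarrassingly parallel case described below (the
whole matrix is broadcast to every worker; the row layout rows:
RDD[(Long, Array[Double])] and the value of k are assumptions):

// rows: RDD[(Long, Array[Double])] of normalized row vectors (assumed layout).
val localRows = rows.collect()
val bcast = sc.broadcast(localRows)
val k = 10

val neighbours = rows.map { case (i, v) =>
  val dists = bcast.value
    .filter { case (j, _) => j != i }
    .map { case (j, w) =>
      val d = math.sqrt(v.zip(w).map { case (a, b) => (a - b) * (a - b) }.sum)
      (j, d)
    }
  (i, dists.sortBy(_._2).take(k))   // k nearest neighbours of row i
}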
On 28 May 2014 14:04, Tom Vacek minnesota...@gmail.com wrote:
Maybe I should add: if you can hold the entire matrix in memory, then this
is embarrassingly parallel. If not, then the
Remark: just including the jar built by sbt will produce the same
error, i.e. this Pig script will fail:
REGISTER
/usr/share/osi1/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop0.20.2-cdh3u4.jar;
edgeList0 = LOAD
On 5/27/2014 1:28 PM, Marcelo Vanzin wrote:
On Tue, May 27, 2014 at 1:05 PM, Suman Somasundar
suman.somasun...@oracle.com wrote:
I am running this on a Solaris machine with logical partitions. All the
partitions (workers) access the same Spark folder.
Can you check whether you have multiple
posted a JIRA https://issues.apache.org/jira/browse/SPARK-1952
On Wed, May 28, 2014 at 1:14 PM, Ryan Compton compton.r...@gmail.com wrote:
Remark: just including the jar built by sbt will produce the same
error, i.e. this Pig script will fail:
REGISTER
Thanks! Sounds like my rough understanding was roughly right :)
Definitely understand cached RDDs can add to the memory requirements.
Luckily, like you mentioned, you can configure spark to flush that to disk
and bound its total size in memory via spark.storage.memoryFraction, so I
have a
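For concreteness, the knobs being referred to look roughly like this (the
values and the input path are illustrative only):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Cap the cache at ~30% of executor heap and let cached partitions spill
// to disk rather than being dropped and recomputed.
val conf = new SparkConf()
  .setAppName("cache-bounds-example")
  .set("spark.storage.memoryFraction", "0.3")
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs:///some/input")   // placeholder path
data.persist(StorageLevel.MEMORY_AND_DISK)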
I've been trying to reproduce this but I haven't succeeded so far. For
example, on the web-Google graph
(https://snap.stanford.edu/data/web-Google.html), I get the
expected results both on v0.9.1-handle-empty-partitions
and on master:
// Load web-Google and run connected components
import
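A minimal version of that check, assuming GraphLoader and a local copy of the
SNAP edge list (the path is a placeholder; the SparkContext._ import is needed
for reduceByKey on Spark releases of that era):

import org.apache.spark.SparkContext._
import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "/data/web-Google.txt")  // placeholder path
val cc = graph.connectedComponents().vertices
// Component sizes, to compare run-to-run for nondeterminism.
cc.map { case (_, comp) => (comp, 1L) }
  .reduceByKey(_ + _)
  .take(10)
  .foreach(println)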
Hi Nick,
I finally got around to downloading and building the patch.
I pulled the code from
https://github.com/MLnick/spark-1/tree/pyspark-inputformats
I am running on a CDH5 node. While the code in the CDH branch is different
from spark master, I do believe that I have resolved any
It sounds like you made a typo in the code — perhaps you’re trying to call
self._jvm.PythonRDDnewAPIHadoopFile instead of
self._jvm.PythonRDD.newAPIHadoopFile? There should be a dot before newAPIHadoopFile.
Matei
On May 28, 2014, at 5:25 PM, twizansk twiza...@gmail.com wrote:
Hi Nick,
I finally
You can remove cached RDDs by calling unpersist() on them.
You can also use SparkContext.getRDDStorageInfo to get info on cache usage,
though this is a developer API so it may change in future versions. We will add
a standard API eventually but this is just very closely tied to framework
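i.e., roughly (a sketch; the input path is a placeholder and RDDInfo's fields
may differ slightly across versions):

val data = sc.textFile("hdfs:///some/input").cache()   // placeholder path
data.count()        // materialize the cached blocks

data.unpersist()    // release them again

// Developer API: report what is still cached.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.memSize} bytes in memory")
}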
In my code I am not referencing PythonRDD or PythonRDDnewAPIHadoopFile at
all. I am calling SparkContext.newAPIHadoopFile with:
inputformat_class='org.apache.hadoop.hbase.mapreduce.TableInputFormat',
key_class='org.apache.hadoop.hbase.io.ImmutableBytesWritable',
The code which causes the error is:
sc = SparkContext("local", "My App")
rdd = sc.newAPIHadoopFile(
name,
'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
I've been trying for several days now to get a Spark application running in
stand-alone mode, as described here:
http://spark.apache.org/docs/latest/spark-standalone.html
I'm using pyspark, so I've been following the example here:
Hi Sid,
We are successfully running Spark on an HPC, it works great. Here's info on our
setup / approach.
We have a cluster with 256 nodes running Scientific Linux 6.3 and scheduled by
Univa Grid Engine. The environment also has a DDN GridScalar running GPFS and
several EMC Isilon clusters
Hi,
My shark-env.sh is already pointing to the hadoop2 cluster:
export
HADOOP_HOME=/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop
Both the Hadoop cluster and the embedded Hadoop jars within Shark
are of version 2.0.0.
Any more suggestions please?
Thanks
On Wed, May 28,