broadcast: OutOfMemoryError

2014-12-11 Thread ll
hi.  i'm running into this OutOfMemory issue when i'm broadcasting a large
array.  what is the best way to handle this?

should i split the array into smaller arrays before broadcasting, and then
combine them locally on each node?

thanks!
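Whether splitting helps probably depends on where the error is thrown. If it happens while one huge object is being serialized, several smaller broadcasts may lower the peak, but the reassembled array still needs its full size in memory on every executor. Below is a minimal plain-Scala sketch of the split/reassemble step. It assumes the data is an `Array[Double]` and uses a hypothetical `chunkSize`. In Spark, each chunk would be passed to its own `sc.broadcast(...)`, and the executors would read the chunks' `.value`s back in order.

```scala
object BroadcastChunks {
  // Split a large array into fixed-size chunks (chunkSize is a hypothetical tuning knob).
  // In Spark, each chunk could then be broadcast separately: chunks.map(c => sc.broadcast(c))
  def split(data: Array[Double], chunkSize: Int): Seq[Array[Double]] = {
    require(chunkSize > 0, "chunkSize must be positive")
    data.grouped(chunkSize).toVector
  }

  // Reassemble the chunks, in their original order, into a single array
  // (e.g. on each executor after reading every broadcast's value).
  def combine(chunks: Seq[Array[Double]]): Array[Double] =
    Array.concat(chunks: _*)
}
```

The chunk size is a guess to tune. There is no single right value.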



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-OutOfMemoryError-tp20633.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: RDD.aggregate?

2014-12-11 Thread ll
any explanation of how aggregate works would be much appreciated.  i already
looked at the spark example and am still confused about seqOp and
combOp... thanks.
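Roughly speaking, `aggregate(zeroValue)(seqOp, combOp)` works in two phases. Inside each partition, it folds the elements into an accumulator with `seqOp`, starting from `zeroValue`. It then merges the per-partition accumulators with `combOp`. Two functions are needed because the accumulator type (here a `(sum, count)` pair) can differ from the element type. The sketch below is a local plain-Scala model of that behaviour, not Spark's code. An RDD is stood in for by a `Seq` of partitions, and `AggregateSketch` is a hypothetical name. On a real `RDD[Int]`, the average would be computed as `rdd.aggregate((0, 0))(seqOp, combOp)`.

```scala
object AggregateSketch {
  // Local model of RDD.aggregate: the "RDD" is a sequence of partitions.
  def aggregate[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    // Phase 1: inside each partition, fold the elements into an accumulator.
    val perPartition = partitions.map(_.foldLeft(zeroValue)(seqOp))
    // Phase 2: merge the per-partition accumulators into one result.
    perPartition.foldLeft(zeroValue)(combOp)
  }

  // Average of Ints, using a (sum, count) accumulator.
  def average(partitions: Seq[Seq[Int]]): Double = {
    val (sum, count) = aggregate(partitions, (0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1), // seqOp: absorb one element
      (a, b) => (a._1 + b._1, a._2 + b._2)  // combOp: merge two partial results
    )
    sum.toDouble / count
  }
}
```

Because `zeroValue` is used once per partition and again when merging, it should be neutral for both functions (like `(0, 0)` here). Otherwise the result depends on the number of partitions.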



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434p20634.html



Re: what is the best way to implement mini batches?

2014-12-11 Thread ll
any advice/comment on this would be much appreciated.  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264p20635.html



why is spark + scala code so slow, compared to python?

2014-12-11 Thread ll
hi.. i'm converting some of my machine learning python code into scala +
spark.  i haven't been able to run it on a large dataset yet, but on small
datasets (like http://yann.lecun.com/exdb/mnist/), my spark + scala code is
5 to 10 times slower than my python code.

i've already tried everything i could think of to improve my spark + scala
code, like broadcasting variables, caching the RDD, and replacing all my
matrix/vector operations with breeze/blas.  i saw some improvements, but it's
still a lot slower than my python code.

why is that?  

how do you improve your spark + scala performance today?  

or is spark + scala just not the right tool for small to medium datasets?

when would you use spark + scala vs. python?

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/why-is-spark-scala-code-so-slow-compared-to-python-tp20636.html



RDD.aggregate?

2014-12-04 Thread ll
can someone please explain how RDD.aggregate works?  i looked at the average
example done with aggregate() but i'm still confused about this function...
much appreciated.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434.html



where is the org.apache.spark.util package?

2014-11-07 Thread ll
i'm trying to compile some of the spark code directly from the source
(https://github.com/apache/spark).  it complains about the missing package
org.apache.spark.util.  it doesn't look like this package is part of the
source code on github. 

where can i find this package?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/where-is-the-org-apache-spark-util-package-tp18360.html



Re: where is the org.apache.spark.util package?

2014-11-07 Thread ll
i found the util package under the spark core package, but now i get this
error: "Symbol Utils is inaccessible from this place."

what does this error mean?

org.apache.spark.util and org.apache.spark.util.Utils are both there now.

thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/where-is-the-org-apache-spark-util-package-tp18360p18361.html



Re: Fwd: Why is Spark not using all cores on a single machine?

2014-11-07 Thread ll
hi.  i did use local[8] as below, but it still ran on only 1 core.

import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf().setMaster("local[8]").setAppName("abc"))

any advice is much appreciated.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Why-is-Spark-not-using-all-cores-on-a-single-machine-tp1638p18397.html



word2vec: how to save an mllib model and reload it?

2014-11-06 Thread ll
what is the best way to save an mllib model that you just trained and reload
it in the future?  specifically, i'm using the mllib word2vec model...
thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/word2vec-how-to-save-an-mllib-model-and-reload-it-tp18329.html



Re: Matrix multiplication in spark

2014-11-05 Thread ll
@sowen.. i am looking for distributed operations, especially very large
sparse matrix x sparse matrix multiplication.  what is the best way to
implement this in spark?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Matrix-multiplication-in-spark-tp12562p18164.html



sparse x sparse matrix multiplication

2014-11-04 Thread ll
what is the best way to implement a sparse x sparse matrix multiplication
with spark?
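One common approach for sparse x sparse is a join on the shared index:

1. Key A's entries by their column k and B's entries by their row k.
2. Join on k and multiply the matched values.
3. Sum the products per (i, j).

The sketch below models those steps on local collections so it runs without a cluster. `Entry` is a hypothetical stand-in with the same fields as mllib's `MatrixEntry`. On RDDs, the grouping and summing would become `join` and `reduceByKey`. This is a sketch of the idea, not mllib's implementation.

```scala
object SparseMultiply {
  // Hypothetical local stand-in for mllib's MatrixEntry(i, j, value).
  final case class Entry(i: Long, j: Long, value: Double)

  // C = A x B, with both matrices stored as lists of non-zero entries (COO format).
  def multiply(a: Seq[Entry], b: Seq[Entry]): Map[(Long, Long), Double] = {
    // Index B by row k, so each A entry (i, k) can find its partners (k, j).
    val bByRow: Map[Long, Seq[Entry]] = b.groupBy(_.i)
    // "Join" on the shared index k and emit one partial product per match.
    val products = for {
      ea <- a
      eb <- bByRow.getOrElse(ea.j, Seq.empty)
    } yield ((ea.i, eb.j), ea.value * eb.value)
    // Sum the partial products that land on the same (i, j) cell.
    products.groupBy(_._1).map { case (cell, ps) => cell -> ps.map(_._2).sum }
  }
}
```

The join emits one product per matching pair of entries, so A-rows and B-columns with many non-zeros can blow up the intermediate data.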



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sparse-x-sparse-matrix-multiplication-tp18163.html



is spark a good fit for sequential machine learning algorithms?

2014-11-03 Thread ll
i'm struggling with implementing a few algorithms with spark.  hope to get
help from the community.

most of the machine learning algorithms today are sequential, while spark
is all about parallelism.  it seems to me that using spark doesn't
actually help much, because in most cases you can't really parallelize a
sequential algorithm.

there must be some strong reasons why mllib was created and so many people
claim spark is ideal for machine learning.

what are those reasons?  

what are some specific examples of when and how to use spark to implement
sequential machine learning algorithms?

any comment/feedback/answer is much appreciated.

thanks!
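One reason spark can still help is that many ML algorithms are sequential only in their outer loop. Each iteration of batch gradient descent, for example, is dominated by a sum over the whole dataset, and that sum can be split across partitions and then combined. Below is a minimal plain-Scala sketch of that structure. It assumes a one-parameter least-squares model `y ≈ w * x`, and the partitions are a local `Seq` of `Seq`s. The learning rate and iteration count are illustrative choices.

```scala
object DataParallelGD {
  // Gradient contribution of one partition for the loss sum((w*x - y)^2).
  // Each partition can compute this independently of the others.
  def partialGradient(w: Double, part: Seq[(Double, Double)]): Double =
    part.map { case (x, y) => 2 * (w * x - y) * x }.sum

  def fit(partitions: Seq[Seq[(Double, Double)]], lr: Double, iters: Int): Double = {
    val n = partitions.map(_.size).sum
    var w = 0.0
    // Outer loop is sequential: each step needs the previous w.
    for (_ <- 1 to iters) {
      // Inner work is data-parallel: per-partition sums, then one combine
      // (on an RDD this would be a map over the data followed by a reduce).
      val grad = partitions.map(p => partialGradient(w, p)).sum / n
      w -= lr * grad
    }
    w
  }
}
```

Each iteration still waits for the previous one, so this pays off only when the pass over the data is the expensive part of an iteration.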



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/is-spark-a-good-fit-for-sequential-machine-learning-algorithms-tp18000.html



SparkContext.stop() ?

2014-10-31 Thread ll
what is it for?  when do we call it?

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-stop-tp17826.html



real-time streaming

2014-10-28 Thread ll
the spark tutorial shows that we can create a stream that reads new files
from a directory.  

that seems to add some lag, since we have to write the data to a file first
and then wait until the spark stream picks it up.

what is the best way to implement truly real-time streaming analysis?  for
example, analyzing video, sound, images, etc. as they are streamed in
continuously?

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/real-time-streaming-tp17526.html



Re: real-time streaming

2014-10-28 Thread ll
thanks jay.  do you think spark is a good fit for streaming and analyzing
video in real time?  in this case, we're streaming 30 frames per second, and
each frame is an image (roughly 500 KB to 1 MB).  we need to analyze every
frame and return the analysis result immediately, in real time.  thanks
again.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/real-time-streaming-tp17526p17528.html



complexity of each action / transformation

2014-10-17 Thread ll
hello... is there a list that shows the complexity of each
action/transformation?  for example, what is the complexity of
RDD.map()/filter() or RowMatrix.multiply() etc?  that would be really
helpful.

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/complexity-of-each-action-transformation-tp16705.html



mllib.linalg.Vectors vs Breeze?

2014-10-17 Thread ll
hello... i'm looking at the source code for mllib.linalg.Vectors and it looks
like it's a wrapper around Breeze with very small changes (mostly changing
the names).

i don't have any problem with using either the spark wrapper around Breeze
or Breeze directly.  i'm just curious why this wrapper was created instead
of pointing everyone to Breeze directly.

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mllib-linalg-Vectors-vs-Breeze-tp16722.html



reverse an rdd

2014-10-16 Thread ll
hello... what is the best way to iterate through an rdd backward (last
element first, first element last)?  thanks!
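As far as I know, RDDs have no built-in reverse. One approach is to attach each element's position, sort by it in descending order, and then drop it again. On an RDD, that would be roughly `rdd.zipWithIndex().sortBy(_._2, ascending = false).map(_._1)`, which involves a full sort. The sketch below applies the same steps to a local `Seq` so it can run without Spark. `ReverseSketch` is a hypothetical name.

```scala
object ReverseSketch {
  // Pair each element with its position, sort by position descending,
  // then keep only the elements (same steps as the RDD version above).
  def reverse[T](xs: Seq[T]): Seq[T] =
    xs.zipWithIndex.sortBy(p => -p._2).map(_._1)
}
```

If the data fits on the driver, `collect().reverse` is simpler.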



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/reverse-an-rdd-tp16602.html



scala: java.net.BindException?

2014-10-16 Thread ll
hello... does anyone know how to resolve this issue?  i'm running this
locally on my computer and i keep getting this BindException.  much
appreciated.

14/10/16 17:48:13 WARN component.AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$3.apply(JettyUtils.scala:202)
at org.apache.spark.ui.JettyUtils$$anonfun$3.apply(JettyUtils.scala:202)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1442)
at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:202)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:102)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:224)
at nn.SimpleNeuralNetwork$delayedInit$body.apply(SimpleNeuralNetwork.scala:15)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
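The trace passes through `Utils.startServiceOnPort` inside a `Range.foreach`, which suggests Spark tries a range of ports for the web UI. Port 4040 is already taken (often by another running SparkContext or spark-shell), so Spark logs this WARN and moves on to the next port. If the app starts anyway, the warning is probably harmless. Otherwise, the usual fixes are stopping the other context or setting `spark.ui.port` to a free port. Below is a plain-Scala sketch of the "try the next port" idea. It uses a hypothetical helper, `bindWithRetry`, which is not Spark's implementation.

```scala
import java.net.{BindException, ServerSocket}

object PortRetry {
  // Hypothetical helper: try startPort, startPort + 1, ... (at most maxRetries
  // extra attempts) and return the first socket that binds successfully.
  def bindWithRetry(startPort: Int, maxRetries: Int): ServerSocket = {
    def attempt(offset: Int): ServerSocket =
      try new ServerSocket(startPort + offset)
      catch {
        // "Address already in use": move on to the next port if any remain;
        // on the last attempt the BindException propagates to the caller.
        case _: BindException if offset < maxRetries => attempt(offset + 1)
      }
    attempt(0)
  }
}
```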



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/scala-java-net-BindException-tp16624.html



object in an rdd: serializable?

2014-10-16 Thread ll
i got an exception complaining that my class isn't serializable.  the sample
code is below...

class HelloWorld(val count: Int) {
  ...
  ...
}

object Test extends App {
  ...
  val data = sc.parallelize(List(new HelloWorld(1), new HelloWorld(2))) 
  ... 
}

what is the best way to serialize HelloWorld so that it can be contained in
an RDD?

thanks!
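Spark's default serializer is Java serialization, so the usual fix is to make the class extend `Serializable` (case classes already do). Kryo, enabled through the `spark.serializer` setting, is an alternative. Below is a minimal sketch that checks this locally with plain Java serialization. It keeps only the `count` field from the question. `SerializationCheck.roundTrip` is a hypothetical helper that approximates what happens when objects are shipped between the driver and executors; it is not Spark's own code path.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Marking the class Serializable lets Java serialization write its instances.
class HelloWorld(val count: Int) extends Serializable

object SerializationCheck {
  // Serialize a value to bytes and read it back, as a local stand-in for
  // moving data between JVMs.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value) // throws NotSerializableException for unmarked classes
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```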




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/object-in-an-rdd-serializable-tp16638.html



matrix operations?

2014-10-15 Thread ll
hi there... are there any other matrix operations besides multiply(),
like addition or dot product?
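I'm not sure which of these operations the distributed matrix classes expose in your Spark version. If they're missing, both can be written directly over the entries. Addition is a union followed by summing values per (i, j), which on RDDs means `union` plus `reduceByKey`. A sparse dot product only needs the indices that both vectors share. Below is a local plain-Scala sketch. `Entry` is a hypothetical stand-in for mllib's `MatrixEntry`, and sparse vectors are `index -> value` maps.

```scala
object SparseOps {
  // Hypothetical local stand-in for mllib's MatrixEntry(i, j, value).
  final case class Entry(i: Long, j: Long, value: Double)

  // A + B: put both entry lists together and sum the values sharing (i, j).
  def add(a: Seq[Entry], b: Seq[Entry]): Map[(Long, Long), Double] =
    (a ++ b).groupBy(e => (e.i, e.j)).map { case (cell, es) => cell -> es.map(_.value).sum }

  // Dot product of two sparse vectors: indices missing from y count as 0.0,
  // so only indices present in both vectors contribute.
  def dot(x: Map[Int, Double], y: Map[Int, Double]): Double =
    x.iterator.map { case (k, v) => v * y.getOrElse(k, 0.0) }.sum
}
```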





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/matrix-operations-tp16508.html



RowMatrix.multiply() ?

2014-10-15 Thread ll
hi.. it looks like RowMatrix.multiply() takes a local Matrix as a parameter
and returns the result as a distributed RowMatrix.  

how do you perform this series of multiplications if A, B, C, and D are all
RowMatrix?

((A x B) x C) x D

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RowMatrix-multiply-tp16509.html



Re: graphx - mutable?

2014-10-14 Thread ll
hi again.  just checking in to see if anyone could advise on how to
implement a mutable, growing graph with graphx.

we're building a graph that grows over time.  it adds more vertices and
edges in every iteration of our algorithm.

it doesn't look like there is an obvious way to add a new vertex and a set
of edges to an existing graph.

what would be the best way to implement this with graphx?

thanks!
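GraphX graphs are immutable, so as far as I know the usual way to grow one is to build a new graph in each iteration from the old vertices and edges plus the new ones, roughly `Graph(g.vertices.union(newVertices), g.edges.union(newEdges))`. Caching each new graph and releasing the old one helps with memory. The lineage also grows every iteration, so it may need to be truncated periodically. The sketch below shows the same "new value per step" pattern with a hypothetical local `Graph` case class (not GraphX's), so it runs without Spark.

```scala
object GrowingGraph {
  // Hypothetical immutable graph: vertex id -> label, plus directed edges.
  final case class Graph(vertices: Map[Long, String], edges: Set[(Long, Long)]) {
    // "Growing" returns a new graph; the receiver is left unchanged.
    def grow(newVertices: Map[Long, String], newEdges: Set[(Long, Long)]): Graph =
      Graph(vertices ++ newVertices, edges ++ newEdges)
  }
}
```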



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/graphx-mutable-tp15777p16409.html



mllib CoordinateMatrix

2014-10-14 Thread ll
after creating a CoordinateMatrix from my RDD[MatrixEntry]...

1.  how can i get/query the value at coordinate (i, j)?

2.  how can i set/update the value at coordinate (i, j)?

3.  how can i get all the values in a specific row i, ideally as a vector?

thanks!
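As far as I know, CoordinateMatrix has no random-access get/set, and everything goes through its `entries` RDD:

1. A lookup is a filter such as `mat.entries.filter(e => e.i == i && e.j == j)`, which is a full pass over the data.
2. An "update" means building a new matrix from transformed entries.
3. For whole rows, `toIndexedRowMatrix()` may be more convenient.

The sketch below shows the three operations on a local `Seq`. `MatrixEntry` here is a local case class with the same fields as mllib's, and the helper names are hypothetical.

```scala
object CoordinateQueries {
  // Local case class mirroring the fields of mllib's MatrixEntry(i, j, value).
  final case class MatrixEntry(i: Long, j: Long, value: Double)

  // 1. Value at (i, j); a missing entry in a sparse matrix means 0.0.
  def valueAt(entries: Seq[MatrixEntry], i: Long, j: Long): Double =
    entries.find(e => e.i == i && e.j == j).map(_.value).getOrElse(0.0)

  // 2. "Update" (i, j): build a new collection with any old entry replaced.
  def updatedAt(entries: Seq[MatrixEntry], i: Long, j: Long, v: Double): Seq[MatrixEntry] =
    entries.filterNot(e => e.i == i && e.j == j) :+ MatrixEntry(i, j, v)

  // 3. Row i as a dense array of length numCols (absent entries stay 0.0).
  def rowAsArray(entries: Seq[MatrixEntry], i: Long, numCols: Int): Array[Double] = {
    val out = new Array[Double](numCols)
    entries.filter(_.i == i).foreach(e => out(e.j.toInt) = e.value)
    out
  }
}
```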




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mllib-CoordinateMatrix-tp16412.html



graphx - mutable?

2014-10-05 Thread ll
i understand that a graphx graph, like an rdd, is immutable.

i'm working on an algorithm that requires a mutable graph.  initially, the
graph starts with just a few nodes and edges.  then over time, it adds more
and more nodes and edges.   

what would be the best way to implement this growing graph with graphx?

thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/graphx-mutable-tp15777.html



Re: android + spark streaming?

2014-10-04 Thread ll
any comment/feedback/advice on this is much appreciated!  thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/android-spark-streaming-tp15661p15735.html



scala Vector vs mllib Vector

2014-10-04 Thread ll
what are the pros/cons of each?  when should we use the mllib Vector, and
when the standard scala Vector?  thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/scala-Vector-vs-mllib-Vector-tp15736.html



Re: scala Vector vs mllib Vector

2014-10-04 Thread ll
thanks dean, that answer was very clear!

i'm working on an algorithm that has a weight vector W(w0, w1, .., wN).  the
elements of this weight vector are adjusted/updated frequently, in every
iteration of the algorithm.  how would you recommend implementing this
vector?  what is the best practice for this in Scala and Spark?
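One common pattern is to keep W on the driver as a mutable `Array[Double]` (or a Breeze `DenseVector`) and update it in place after each iteration. The current values then need to be sent to the workers in every iteration, e.g. via `sc.broadcast(w.clone())`, because closures and broadcasts that were already shipped won't see later driver-side changes. Below is a minimal sketch of the in-place update. It assumes a plain gradient-descent rule, `w := w - learningRate * gradient`; both the rule and `WeightUpdate.step` are illustrative.

```scala
object WeightUpdate {
  // In-place update w := w - learningRate * gradient; no new vector is
  // allocated per iteration.
  def step(w: Array[Double], gradient: Array[Double], learningRate: Double): Unit = {
    require(w.length == gradient.length, "dimension mismatch")
    var k = 0
    while (k < w.length) {
      w(k) -= learningRate * gradient(k)
      k += 1
    }
  }
}
```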



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/scala-Vector-vs-mllib-Vector-tp15736p15741.html