Re: java server error - spark

2016-06-15 Thread spR
Hey, thanks. Now it worked. :) On Wed, Jun 15, 2016 at 6:59 PM, Jeff Zhang <zjf...@gmail.com> wrote: > Then the only solution is to increase your driver memory but still > restricted by your machine's memory. "--driver-memory" > > On Thu, Jun 16, 2016 at 9:53 AM,
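For reference, the flag referred to above goes on the spark-submit (or pyspark) command line; in local mode the executors live inside the driver JVM, so raising driver memory is what actually relieves the pressure. A minimal sketch, with the memory value and application file assumed:

    $S/bin/spark-submit --master "local[4]" --driver-memory 8g my_app.py

Setting spark.driver.memory in code after the JVM has started has no effect in this mode, which is why the flag (or spark-defaults.conf) is the place to put it.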

Re: java server error - spark

2016-06-15 Thread spR
for local > mode, please use other cluster mode. > > On Thu, Jun 16, 2016 at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote: > >> Specify --executor-memory in your spark-submit command. >> >> >> >> On Thu, Jun 16, 2016 at 9:01 AM, spR <data.smar

Re: java server error - spark

2016-06-15 Thread spR
;)) sc.conf = conf On Wed, Jun 15, 2016 at 5:59 PM, Jeff Zhang <zjf...@gmail.com> wrote: > >>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded > > > It is OOM on the executor. Please try to increase executor memory. > "--executor-memory"
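The executor setting being pointed to here is likewise a spark-submit option (the value is an assumption):

    $S/bin/spark-submit --executor-memory 4g ...

Note that in local mode there is only the one driver JVM, so --driver-memory ends up being the setting that matters, which is where the replies above land.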

Re: java server error - spark

2016-06-15 Thread spR
cutor memory. > "--executor-memory" > > > > > > On Thu, Jun 16, 2016 at 8:54 AM, spR <data.smar...@gmail.com> wrote: > >> Hey, >> >> error trace - >> >> hey, >> >> >> error trace - >> >> >> ---

Re: java server error - spark

2016-06-15 Thread spR
;zjf...@gmail.com> wrote: > Could you paste the full stacktrace ? > > On Thu, Jun 16, 2016 at 7:24 AM, spR <data.smar...@gmail.com> wrote: > >> Hi, >> I am getting this error while executing a query using sqlcontext.sql >> >> The table has around 2.5 gb

java server error - spark

2016-06-15 Thread spR
Hi, I am getting this error while executing a query using sqlcontext.sql. The table has around 2.5 GB of data to be scanned. First I get an out-of-memory exception, even though I have 16 GB of RAM. Then my notebook dies and I get the error below: Py4JNetworkError: An error occurred while trying to connect to

Re: concat spark dataframes

2016-06-15 Thread spR
/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> > > > > *From:* Natu Lauchande [mailto:nlaucha...@gmail.com] > *Sent:* Wednesday, June 15, 2016 2:07 PM > *To:* spR > *Cc:* user > *Subject:* Re: concat spark dataframes > > > > Hi, > > You can select the com

data too long

2016-06-15 Thread spR
I am trying to save a Spark dataframe to a MySQL database using: df.write(sql_url, table='db.table') The first column in the dataframe seems too long, and I get this error: Data too long for column 'custid' at row 1. What should I do? Thanks
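The "Data too long" message is raised by MySQL itself: the existing custid column is narrower than the values being written. One hedged sketch (Scala API shown; the column width, credentials, and JDBC URL are all assumptions) is to widen the column on the MySQL side and then append with the DataFrameWriter JDBC API:

    // Run in MySQL first (the width is an assumption):
    //   ALTER TABLE db.table MODIFY custid VARCHAR(255);
    import java.util.Properties
    val props = new Properties()
    props.setProperty("user", "myuser")        // assumed credentials
    props.setProperty("password", "mypass")
    df.write.mode("append").jdbc("jdbc:mysql://host:3306/db", "db.table", props)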

concat spark dataframes

2016-06-15 Thread spR
Hi, how do I concatenate Spark dataframes? I have two frames, each with certain columns, and I want to get a dataframe with the columns from both. Regards, Misha
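Column-wise "concatenation" in Spark is a join: if the two frames share a key column you can join on it, otherwise a key has to be manufactured first. A sketch with the frame and column names assumed (Scala API shown):

    // columns from both frames, matched on a shared key column (name assumed)
    val combined = df1.join(df2, Seq("id"))

    // by contrast, stacking the rows of two identically-structured frames:
    val stacked = df1.unionAll(df2)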

Re: processing 50 gb data using just one machine

2016-06-15 Thread spR
e more time. > On Jun 15, 2016 6:43 PM, "spR" <data.smar...@gmail.com> wrote: > >> I have 16 gb ram, i7 >> >> Will this config be able to handle the processing without my ipython >> notebook dying? >> >> The local mode is for testing pur

Re: processing 50 gb data using just one machine

2016-06-15 Thread spR
15, 2016 at 10:40 PM, Sergio Fernández <wik...@apache.org> > wrote: > >> In theory yes... the common sense say that: >> >> volume / resources = time >> >> So more volume on the same processing resources would just take more time. >> On Jun 15, 2016 6:4

Re: processing 50 gb data using just one machine

2016-06-15 Thread spR
inkedIn: www.linkedin.com/in/deicool > Skype: thumsupdeicool > Google talk: deicool > Blog: http://loveandfearless.wordpress.com > Facebook: http://www.facebook.com/deicool > > "Contribute to the world, environment and more : > http://www.gridrepublic.org > " >

update mysql in spark

2016-06-15 Thread spR
Hi, can we write an update query using sqlContext? sqlContext.sql("update act1 set loc = round(loc,4)") What is wrong with this? I get the following error: Py4JJavaError: An error occurred while calling o20.sql. : java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier update
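The parser error is Spark SQL (1.x) saying it has no UPDATE statement; the usual workaround is to express the change as a transformation and register or write out the result rather than updating in place. A sketch using the table and column names from the message, everything else assumed (Scala API shown):

    import org.apache.spark.sql.functions.{col, round}

    // re-derive the column instead of issuing an UPDATE
    val updated = sqlContext.table("act1").withColumn("loc", round(col("loc"), 4))
    updated.registerTempTable("act1_rounded")   // or write it back to the original source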

processing 50 gb data using just one machine

2016-06-15 Thread spR
Hi, can I use Spark in local mode using 4 cores to process 50 GB of data efficiently? Thank you, Misha

Re: representing RDF literals as vertex properties

2014-12-08 Thread spr
OK, I have waded into implementing this and have gotten pretty far, but am now hitting something I don't understand: a NoSuchMethodError. The code looks like [...] val conf = new SparkConf().setAppName(appName) //conf.set("fs.default.name", "file://") val sc = new
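A NoSuchMethodError at run time is very often a mismatch between the Spark version the job was compiled against and the one actually running it (or two copies of a dependency on the classpath). A hedged build.sbt sketch, with versions assumed, that compiles against the installed Spark without bundling it:

    // build.sbt -- versions are assumptions; match them to the installed Spark
    scalaVersion := "2.10.4"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % "1.1.1" % "provided",
      "org.apache.spark" %% "spark-graphx" % "1.1.1" % "provided"
    )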

representing RDF literals as vertex properties

2014-12-04 Thread spr
@ankurdave's concise code at https://gist.github.com/ankurdave/587eac4d08655d0eebf9, responding to an earlier thread (http://apache-spark-user-list.1001560.n3.nabble.com/How-to-construct-graph-in-graphx-tt16335.html#a16355) shows how to build a graph with multiple edge-types (predicates in
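The pattern in the gist, roughly: vertex attributes hold the resources/literals and the edge attribute holds the predicate, so several edge types coexist in one Graph. A sketch with IDs and values assumed (sc is an existing SparkContext):

    import org.apache.spark.graphx.{Edge, Graph}

    // (id, attribute) pairs; the attribute can carry an RDF literal (values assumed)
    val vertices = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "1985-01-01")))

    // the String edge attribute plays the role of the predicate
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "knows"),
      Edge(1L, 3L, "birthDate")))

    val graph = Graph(vertices, edges)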

Re: overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2

2014-11-12 Thread spr
After comparing with previous code, I got it to work by making the return a Some instead of a Tuple2. Perhaps some day I will understand this. spr wrote --code val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) => { val currentCount
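For context, updateStateByKey expects the update function to return Option[S], so the new state has to come back wrapped in Some (or None to drop the key), which is the change described above. A sketch with the types from this thread; the counting and timestamp logic is an assumption:

    import org.apache.spark.streaming.Time

    val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int, Time)]) => {
      val previous = state.getOrElse((0, Time(0L)))
      val count    = previous._1 + values.map(_._1).sum
      val latest   = (previous._2 +: values.map(_._2)).maxBy(_.milliseconds)
      Some((count, latest))                    // Option[(Int, Time)], not a bare Tuple2
    }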

in function prototypes?

2014-11-11 Thread spr
= DnsSvr.updateStateByKey[(Int, Time)](updateDnsCount) // === error here --compilation output-- [error] /Users/spr/Documents/.../cyberSCinet.scala:513: overloaded method value updateStateByKey with alternatives: [error] (updateFunc: Iterator[((String, String), Seq[(Int, org.apache.spark.streaming.Time

overloaded method value updateStateByKey ... cannot be applied to ... when Key is a Tuple2

2014-11-11 Thread spr
= DnsSvr.updateStateByKey[(Int, Time)](updateDnsCount) // === error here --compilation output-- [error] /Users/spr/Documents/.../cyberSCinet.scala:513: overloaded method value updateStateByKey with alternatives: [error] (updateFunc: Iterator[((String, String), Seq[(Int

Re: with SparkStreaming spark-submit, don't see output after ssc.start()

2014-11-05 Thread spr
This problem turned out to be a cockpit error. I had the same class name defined in a couple different files, and didn't realize SBT was compiling them all together, and then executing the wrong one. Mea culpa. -- View this message in context:

how to blend a DStream and a broadcast variable?

2014-11-05 Thread spr
My use case has one large data stream (DS1) that obviously maps to a DStream. The processing of DS1 involves filtering it for any of a set of known values, which will change over time, though slowly by streaming standards. If the filter data were static, it seems to obviously map to a broadcast
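One pattern that seems to fit this: keep the filter set in a broadcast variable that is read inside transform(), and rebuild the broadcast from the driver whenever the set changes; new batches then pick up the new value. A sketch with names and contents assumed (sc is the SparkContext, ds1 the DStream of records):

    // the slowly-changing set of known values (contents assumed);
    // reassign this var with a fresh broadcast when the set is updated
    var knownValues = sc.broadcast(Set("a.example.com", "b.example.com"))

    val matches = ds1.transform { rdd =>
      val allowed = knownValues.value          // read once per batch on the driver
      rdd.filter(record => allowed.contains(record))
    }

transform's closure is evaluated on the driver for every batch, which is what lets a re-assigned broadcast take effect without restarting the streaming context.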

with SparkStreaming spark-submit, don't see output after ssc.start()

2014-11-03 Thread spr
I have a Spark Streaming program that works fine if I execute it via sbt runMain com.cray.examples.spark.streaming.cyber.StatefulDhcpServerHisto -f /Users/spr/Documents/.../tmp/ -t 10 but if I start it via $S/bin/spark-submit --master local[12] --class StatefulNewDhcpServers target/scala-2.10

Re: does updateStateByKey accept a state that is a tuple?

2014-10-31 Thread spr
Based on execution on small test cases, it appears that the construction below does what I intend. (Yes, all those Tuple1()s were superfluous.) var lines = ssc.textFileStream(dirArg) var linesArray = lines.map( line => (line.split("\t"))) var newState = linesArray.map( lineArray =>

does updateStateByKey accept a state that is a tuple?

2014-10-30 Thread spr
= newState.updateStateByKey[(Int, Time, Time)](updateDhcpState) The error I get is [info] Compiling 3 Scala sources to /Users/spr/Documents/.../target/scala-2.10/classes... [error] /Users/spr/Documents/...StatefulDhcpServersHisto.scala:95: value updateStateByKey is not a member

Re: does updateStateByKey accept a state that is a tuple?

2014-10-30 Thread spr
I think I understand how to deal with this, though I don't have all the code working yet. The point is that the V of (K, V) can itself be a tuple. So the updateFunc prototype looks something like val updateDhcpState = (newValues: Seq[Tuple1[(Int, Time, Time)]], state: Option[Tuple1[(Int,

what does DStream.union() do?

2014-10-29 Thread spr
The documentation at https://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.streaming.dstream.DStream describes the union() method as Return a new DStream by unifying data of another DStream with this DStream. Can somebody provide a clear definition of what unifying means in
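In practice it is element-wise: each batch of the result contains the elements of both streams' batches, and the two DStreams have to carry the same element type (the signature is DStream[T].union(other: DStream[T])). A small sketch with the sources assumed (ssc is an existing StreamingContext):

    import org.apache.spark.streaming.dstream.DStream

    val streamA: DStream[String] = ssc.socketTextStream("hostA", 9999)   // sources assumed
    val streamB: DStream[String] = ssc.textFileStream("/some/dir")
    val merged:  DStream[String] = streamA.union(streamB)

Merging streams of different element types means mapping them to a common type first (for example a shared tuple shape or an Either).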

how to extract/combine elements of an Array in DStream element?

2014-10-29 Thread spr
I am processing a log file, from each line of which I want to extract the zeroth and 4th elements (and an integer 1 for counting) into a tuple. I had hoped to be able to index the Array for elements 0 and 4, but Arrays appear not to support vector indexing. I'm not finding a way to extract and
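Indexing the split Array does work; it is just written with parentheses in Scala rather than square brackets. One way to pull fields 0 and 4 plus a count of 1, with the field separator assumed (lines stands for the DStream of log lines):

    val tuples = lines.map { line =>
      val f = line.split("\t")        // separator is an assumption
      (f(0), f(4), 1)                 // zeroth field, fourth field, count of 1
    }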

Re: what does DStream.union() do?

2014-10-29 Thread spr
I need more precision to understand. If the elements of one DStream/RDD are (String) and the elements of the other are (Time, Int), what does union mean? I'm hoping for (String, Time, Int) but that appears optimistic. :) Do the elements have to be of homogeneous type? Holden Karau wrote

Re: SparkStreaming program does not start

2014-10-14 Thread spr
Thanks Abraham Jacob, Tobias Pfeiffer, Akhil Das-2, and Sean Owen for your helpful comments. Cockpit error on my part in just putting the .scala file as an argument rather than redirecting stdin from it. -- View this message in context:

SparkStreaming program does not start

2014-10-07 Thread spr
(Point 0) val appName = "try1.scala" val master = "local[5]" val conf = new SparkConf().setAppName(appName).setMaster(master) val ssc = new StreamingContext(conf, Seconds(10)) println("Point 1") val lines = ssc.textFileStream("/Users/spr/Documents/big_data/RSA2014/") println("Point 2") println("lines="+lines

Re: SparkStreaming program does not start

2014-10-07 Thread spr
> Try using spark-submit instead of spark-shell. Two questions: - What does spark-submit do differently from spark-shell that makes you think that may be the cause of my difficulty? - When I try spark-submit it complains about Error: Cannot load main class from JAR: file:/Users/spr/.../try1
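On the second point: that message usually means the last argument was not an application jar; spark-submit expects the jar built by sbt package, after the options, with --class giving the fully qualified main class. A sketch with the jar name and package assumed:

    $S/bin/spark-submit --master local[12] \
      --class com.cray.examples.spark.streaming.cyber.StatefulNewDhcpServers \
      target/scala-2.10/try1_2.10-0.1.jar \
      -f /Users/spr/Documents/.../tmp/ -t 10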

how to group within the messages at a vertex?

2014-09-17 Thread spr
Sorry if this is in the docs someplace and I'm missing it. I'm trying to implement label propagation in GraphX. The core step of that algorithm is - for each vertex, find the most frequent label among its neighbors and set its label to that. (I think) I see how to get the input from all the
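The "most frequent label among neighbors" step maps onto GraphX's neighbor aggregation: each edge sends its endpoint's label as a one-entry count map, the maps are merged by summing, and each vertex takes the argmax. A sketch using aggregateMessages (Spark 1.2+; on earlier releases the same idea is written with mapReduceTriplets), with the label type and edge attribute type assumed:

    import org.apache.spark.graphx._

    // one label-propagation step over a graph whose vertex attribute is the current label
    def propagateOnce(g: Graph[Long, Int]): Graph[Long, Int] = {
      val neighborCounts: VertexRDD[Map[Long, Int]] = g.aggregateMessages[Map[Long, Int]](
        ctx => { ctx.sendToDst(Map(ctx.srcAttr -> 1)); ctx.sendToSrc(Map(ctx.dstAttr -> 1)) },
        (a, b) => (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap
      )
      // adopt the most frequent neighboring label; isolated vertices keep their own
      g.outerJoinVertices(neighborCounts) { (_, oldLabel, counts) =>
        counts.map(_.maxBy(_._2)._1).getOrElse(oldLabel)
      }
    }

GraphX also ships a LabelPropagation implementation under org.apache.spark.graphx.lib that does essentially this, iterated.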

noob: how to extract different members of a VertexRDD

2014-08-19 Thread spr
I'm a Scala / Spark / GraphX newbie, so may be missing something obvious. I have a set of edges that I read into a graph. For an iterative community-detection algorithm, I want to assign each vertex to a community with the name of the vertex. Intuitively it seems like I should be able to pull

Re: noob: how to extract different members of a VertexRDD

2014-08-19 Thread spr
ankurdave wrote val g = ... val newG = g.mapVertices((id, attr) => id) // newG.vertices has type VertexRDD[VertexId], or RDD[(VertexId, VertexId)] Yes, that worked perfectly. Thanks much. One follow-up question. If I just wanted to get those values into a vanilla variable (not a
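On the follow-up: to get from the distributed VertexRDD to an ordinary local value, collect() brings it back to the driver as an Array, which is only sensible while the vertex set is small (that caveat is the assumption here):

    import org.apache.spark.graphx.VertexId

    // a plain local Array on the driver; fine for small graphs only
    val localVerts: Array[(VertexId, VertexId)] = newG.vertices.collect()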