Hey,
Thanks. Now it worked. :)
On Wed, Jun 15, 2016 at 6:59 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> Then the only solution is to increase your driver memory ("--driver-memory"),
> but it is still restricted by your machine's physical memory.
>
> On Thu, Jun 16, 2016 at 9:53 AM, [...]
> [...] for local mode, please use another cluster mode.
>
> On Thu, Jun 16, 2016 at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Specify --executor-memory in your spark-submit command.
>>
>>
>>
>> On Thu, Jun 16, 2016 at 9:01 AM, spR <data.smar...@gmail.com> wrote:
sc.conf = conf
On Wed, Jun 15, 2016 at 5:59 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> >>> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>
> It is OOM on the executor. Please try to increase executor memory.
> "--executor-memory"
>
> On Thu, Jun 16, 2016 at 8:54 AM, spR <data.smar...@gmail.com> wrote:
>
>> Hey,
>>
>> error trace -
>>
>> ---
Jeff Zhang <zjf...@gmail.com> wrote:
> Could you paste the full stacktrace ?
>
> On Thu, Jun 16, 2016 at 7:24 AM, spR <data.smar...@gmail.com> wrote:
>
>> Hi,
>> I am getting this error while executing a query using sqlContext.sql.
>>
>> The table has around 2.5 GB of data to be scanned.
Hi,
I am getting this error while executing a query using sqlContext.sql.
The table has around 2.5 GB of data to be scanned.
First I get an out-of-memory exception, even though I have 16 GB of RAM.
Then my notebook dies and I get the error below:
Py4JNetworkError: An error occurred while trying to connect to
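A minimal sketch of where those memory settings live (the "8g" values and the app name below are made up). In local mode the driver and the executor share a single JVM, so raising executor memory alone does not help; the driver heap has to be sized before that JVM starts, i.e. via "--driver-memory" on the spark-submit/pyspark command line or spark.driver.memory in spark-defaults.conf, not from inside the program:

import org.apache.spark.{SparkConf, SparkContext}

// Executor memory can be set from code because executors are launched after the
// conf is read; it only matters on a real cluster. Driver memory for local mode
// cannot be set here, since the driver JVM is already running by this point.
val conf = new SparkConf()
  .setAppName("memory-settings-sketch")
  .setMaster("local[4]")                      // in local mode all work runs in the driver JVM
  .set("spark.executor.memory", "8g")
val sc = new SparkContext(conf)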
>
> *From:* Natu Lauchande [mailto:nlaucha...@gmail.com]
> *Sent:* Wednesday, June 15, 2016 2:07 PM
> *To:* spR
> *Cc:* user
> *Subject:* Re: concat spark dataframes
>
>
>
> Hi,
>
> You can select the com
I am trying to save a Spark dataframe to a MySQL database by using:
df.write(sql_url, table='db.table')
The first column in the dataframe seems to be too long, and I get this error:
Data too long for column 'custid' at row 1
What should I do?
Thanks
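Without knowing the actual schema, a hedged sketch of one way around that error (shown with the Scala DataFrame API; the equivalent calls exist in PySpark). The MySQL message means the target column is narrower than the data, so either widen it with ALTER TABLE or shorten the values before writing. The user, password, JDBC URL, and the 20-character limit below are assumptions, and df stands for the dataframe from the question:

import java.util.Properties
import org.apache.spark.sql.functions.{col, substring}

val props = new Properties()
props.setProperty("user", "dbuser")                    // assumption
props.setProperty("password", "dbpass")                // assumption
props.setProperty("driver", "com.mysql.jdbc.Driver")

val jdbcUrl = "jdbc:mysql://localhost:3306/db"         // assumption

// Trim custid to the assumed width of the MySQL column (VARCHAR(20) here)
// so the insert no longer overflows it.
val trimmed = df.withColumn("custid", substring(col("custid"), 1, 20))
trimmed.write.mode("append").jdbc(jdbcUrl, "db.table", props)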
Hi,
How do I concatenate Spark dataframes? I have 2 frames with certain columns.
I want to get a dataframe with the columns from both of the other frames.
Regards,
Misha
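A minimal sketch, assuming the two frames share a key column named "id" (the question does not say what the columns are; shown in the Scala API, with the same methods available in PySpark). Spark has no positional "paste columns side by side", so combining columns from two dataframes is normally a join:

// df1 and df2 stand for the two frames from the question.
val combined = df1.join(df2, "id")        // one row per key, with columns from both frames

// If the goal is instead to stack rows of two frames with the same schema:
// val stacked = df1.unionAll(df2)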
So more volume on the same processing resources would just take more time.
> On Jun 15, 2016 6:43 PM, "spR" <data.smar...@gmail.com> wrote:
>
>> I have 16 GB RAM, i7.
>>
>> Will this config be able to handle the processing without my IPython
>> notebook dying?
>>
>> The local mode is for testing purposes
On Wed, Jun 15, 2016 at 10:40 PM, Sergio Fernández <wik...@apache.org>
> wrote:
>
>> In theory yes... common sense says that:
>>
>> volume / resources = time
>>
>> So more volume on the same processing resources would just take more time.
>> On Jun 15, 2016 6:43 PM, "spR" <data.smar...@gmail.com> wrote:
Hi,
Can we write an update query using sqlContext?
sqlContext.sql("update act1 set loc = round(loc,4)")
What is wrong with this? I get the following error.
Py4JJavaError: An error occurred while calling o20.sql.
: java.lang.RuntimeException: [1.1] failure: ``with'' expected but
identifier update
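The parser stops at the identifier "update" because Spark SQL 1.x has no UPDATE statement; only read-style queries are supported. A hedged sketch of the usual workaround (Scala API; the same calls exist in PySpark): recompute the column with the DataFrame API and re-register the result under the same table name.

import org.apache.spark.sql.functions.{col, round}

val act1    = sqlContext.table("act1")
val updated = act1.withColumn("loc", round(col("loc"), 4))

// Subsequent sqlContext.sql queries against "act1" now see the rounded values.
updated.registerTempTable("act1")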
Hi,
Can I use Spark in local mode with 4 cores to process 50 GB of data
efficiently?
Thank you,
Misha
OK, I have waded into implementing this and have gotten pretty far, but am now
hitting something I don't understand: a NoSuchMethodError.
The code looks like
[...]
val conf = new SparkConf().setAppName(appName)
//conf.set("fs.default.name", "file://")
val sc = new SparkContext(conf)
@ankurdave's concise code at
https://gist.github.com/ankurdave/587eac4d08655d0eebf9, responding to an
earlier thread
(http://apache-spark-user-list.1001560.n3.nabble.com/How-to-construct-graph-in-graphx-tt16335.html#a16355)
shows how to build a graph with multiple edge-types (predicates in
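The gist itself is at the link above; purely as a generic illustration of the idea (toy data, not the gist's code), the edge attribute can carry the edge type, so one graph holds several relation kinds at once:

import org.apache.spark.graphx._

// sc is an existing SparkContext; edge attributes name the relation ("predicate").
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, "follows"),
  Edge(1L, 3L, "likes"),
  Edge(2L, 3L, "follows")))
val g = Graph.fromEdges(edges, defaultValue = "person")

// Edges of a single type can then be selected with
// g.subgraph(epred = _.attr == "follows").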
After comparing with previous code, I got it to work by making the return value a Some
instead of a Tuple2. Perhaps someday I will understand this.
spr wrote
--code
val updateDnsCount = (values: Seq[(Int, Time)], state: Option[(Int,
Time)]) => {
  val currentCount
[...]
= DnsSvr.updateStateByKey[(Int, Time)](updateDnsCount)
// <=== error here
--compilation output--
[error] /Users/spr/Documents/.../cyberSCinet.scala:513: overloaded method
value updateStateByKey with alternatives:
[error] (updateFunc: Iterator[((String, String), Seq[(Int,
org.apache.spark.streaming.Time
This problem turned out to be a cockpit error. I had the same class name
defined in a couple different files, and didn't realize SBT was compiling
them all together, and then executing the wrong one. Mea culpa.
My use case has one large data stream (DS1) that obviously maps to a DStream.
The processing of DS1 involves filtering it for any of a set of known
values, which will change over time, though slowly by streaming standards.
If the filter data were static, it seems to map obviously to a broadcast variable.
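One common pattern for that situation, sketched with invented names: rather than a one-shot broadcast, re-read the current match set inside transform(), which runs on the driver once per batch, so slow changes are picked up automatically.

import org.apache.spark.streaming.dstream.DStream

// Hypothetical loader for the slowly changing set of known values.
def loadKnownValues(): Set[String] =
  scala.io.Source.fromFile("/path/to/known-values.txt").getLines().toSet

def filterKnown(ds1: DStream[String]): DStream[String] =
  ds1.transform { rdd =>
    val current = loadKnownValues()      // evaluated on the driver, once per batch
    rdd.filter(current.contains)         // shipped to the executors with the closure
  }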
I have a Spark Streaming program that works fine if I execute it via
sbt runMain com.cray.examples.spark.streaming.cyber.StatefulDhcpServerHisto
-f /Users/spr/Documents/.../tmp/ -t 10
but if I start it via
$S/bin/spark-submit --master local[12] --class StatefulNewDhcpServers
target/scala-2.10
Based on execution on small test cases, it appears that the construction
below does what I intend. (Yes, all those Tuple1()s were superfluous.)
var lines = ssc.textFileStream(dirArg)
var linesArray = lines.map( line => line.split("\t") )
var newState = linesArray.map( lineArray =>
[...]
= newState.updateStateByKey[(Int, Time, Time)](updateDhcpState)
The error I get is
[info] Compiling 3 Scala sources to
/Users/spr/Documents/.../target/scala-2.10/classes...
[error] /Users/spr/Documents/...StatefulDhcpServersHisto.scala:95: value
updateStateByKey is not a member
I think I understand how to deal with this, though I don't have all the code
working yet. The point is that the V of (K, V) can itself be a tuple. So
the updateFunc prototype looks something like
val updateDhcpState = (newValues: Seq[Tuple1[(Int, Time, Time)]], state:
Option[Tuple1[(Int, Time, Time)]]) => {
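A hedged sketch of that idea with the Tuple1 wrappers dropped (key and field meanings are guesses from the fragments above): the value and the state are plain tuples of (count, first seen, last seen), and the update function returns an Option.

import org.apache.spark.streaming.Time

val updateDhcpState =
  (newValues: Seq[(Int, Time, Time)], state: Option[(Int, Time, Time)]) => {
    if (newValues.isEmpty) state
    else {
      val (oldCount, firstSeen, _) = state.getOrElse((0, newValues.head._2, newValues.head._2))
      val newCount = oldCount + newValues.map(_._1).sum
      val lastSeen = newValues.map(_._3).maxBy(_.milliseconds)
      Some((newCount, firstSeen, lastSeen))   // a Some, not a bare tuple
    }
  }

// updateStateByKey is only available on a pair DStream, e.g.
// DStream[(String, (Int, Time, Time))] keyed by server:
// val newState = servers.updateStateByKey[(Int, Time, Time)](updateDhcpState)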
The documentation at
https://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.streaming.dstream.DStream
describes the union() method as
"Return a new DStream by unifying data of another DStream with this
DStream."
Can somebody provide a clear definition of what "unifying" means in this context?
I am processing a log file, from each line of which I want to extract the
zeroth and 4th elements (and an integer 1 for counting) into a tuple. I had
hoped to be able to index the Array for elements 0 and 4, but Arrays appear
not to support vector indexing. I'm not finding a way to extract and
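For what the extraction itself looks like, a small sketch (tab delimiter assumed): Scala Arrays are indexed with parentheses, so fields 0 and 4 can be pulled straight out of the split result.

// Parse one raw log line into ((field0, field4), 1), ready for counting by key;
// applied with lines.map(toCountedPair) on the RDD or DStream of raw lines.
def toCountedPair(line: String): ((String, String), Int) = {
  val fields = line.split("\t")
  ((fields(0), fields(4)), 1)
}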
I need more precision to understand. If the elements of one DStream/RDD are
(String) and the elements of the other are (Time, Int), what does union
mean? I'm hoping for (String, Time, Int) but that appears optimistic. :)
Do the elements have to be of homogeneous type?
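A toy sketch of what union requires (queueStream is used only to make it self-contained): both DStreams must have the same element type, and the result simply contains the elements of both. Pairing a (String) stream with a (Time, Int) stream therefore needs a keyed join rather than union.

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("union-sketch").setMaster("local[2]")
val ssc  = new StreamingContext(conf, Seconds(10))

val q1 = mutable.Queue(ssc.sparkContext.parallelize(Seq("a", "b")))
val q2 = mutable.Queue(ssc.sparkContext.parallelize(Seq("c")))

// both is a DStream[String] whose batches hold "a", "b", "c": no zipping of
// fields, just more elements of the same type. Unioning with a DStream of
// (Time, Int) pairs would not compile.
val both = ssc.queueStream(q1).union(ssc.queueStream(q2))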
Holden Karau wrote
Thanks Abraham Jacob, Tobias Pfeiffer, Akhil Das-2, and Sean Owen for your
helpful comments. Cockpit error on my part in just putting the .scala file
as an argument rather than redirecting stdin from it.
println("Point 0")
val appName = "try1.scala"
val master = "local[5]"
val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(10))
println("Point 1")
val lines = ssc.textFileStream("/Users/spr/Documents/big_data/RSA2014/")
println("Point 2")
println("lines=" + lines
> Try using spark-submit instead of spark-shell
Two questions:
- What does spark-submit do differently from spark-shell that makes you
think that may be the cause of my difficulty?
- When I try spark-submit it complains about Error: Cannot load main class
from JAR: file:/Users/spr/.../try1
Sorry if this is in the docs someplace and I'm missing it.
I'm trying to implement label propagation in GraphX. The core step of that
algorithm is
- for each vertex, find the most frequent label among its neighbors and set
its label to that.
(I think) I see how to get the input from all the neighbors
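In case it is useful, a hedged sketch of that core step with aggregateMessages, assuming the vertex attribute is the current label as a Long (tie-breaking between equally frequent labels is arbitrary here):

import org.apache.spark.graphx._

// One round of label propagation: every vertex adopts the most frequent label
// among its neighbors; isolated vertices keep their own label.
def propagateOnce[E](graph: Graph[Long, E]): Graph[Long, E] = {
  val neighborLabels: VertexRDD[Map[Long, Int]] =
    graph.aggregateMessages[Map[Long, Int]](
      ctx => {                                     // send each endpoint's label to the other
        ctx.sendToDst(Map(ctx.srcAttr -> 1))
        ctx.sendToSrc(Map(ctx.dstAttr -> 1))
      },
      (a, b) =>                                    // merge label histograms
        (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap
    )
  graph.outerJoinVertices(neighborLabels) { (_, ownLabel, hist) =>
    hist.map(_.maxBy(_._2)._1).getOrElse(ownLabel)
  }
}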
I'm a Scala / Spark / GraphX newbie, so I may be missing something obvious.
I have a set of edges that I read into a graph. For an iterative
community-detection algorithm, I want to assign each vertex to a community
with the name of the vertex. Intuitively it seems like I should be able to
pull
ankurdave wrote
val g = ...
val newG = g.mapVertices((id, attr) => id)
// newG.vertices has type VertexRDD[VertexId], or RDD[(VertexId,
VertexId)]
Yes, that worked perfectly. Thanks much.
One follow-up question. If I just wanted to get those values into a vanilla
variable (not an RDD)
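The follow-up is cut off, but if the question is how to get those vertex values out of the RDD into an ordinary local variable, a one-line sketch (only sensible when the graph is small enough to fit in driver memory):

import org.apache.spark.graphx.VertexId

// Materialize the (VertexId, VertexId) pairs from newG.vertices on the driver.
val localVertices: Array[(VertexId, VertexId)] = newG.vertices.collect()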