Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
--
Liquan Pei
Department of Physics
University of Massachusetts Amherst
Liquan, yes, for a full outer join, one hash table on both sides is more
efficient.
For a left/right outer join, it looks like one hash table should be
enough.
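To make the idea concrete, here is a minimal in-memory sketch in plain Scala (illustrative names, not Spark's actual implementation): hash both inputs by key, then emit matched pairs plus the unmatched rows from either side.

```scala
// Sketch of a full outer join with a hash table built on each side.
// Not Spark's code; plain Scala for illustration only.
def fullOuterJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (Option[A], Option[B]))] = {
  val leftMap  = left.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
  val rightMap = right.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
  val keys = leftMap.keySet ++ rightMap.keySet
  keys.toSeq.flatMap { k =>
    val ls: Seq[Option[A]] = leftMap.getOrElse(k, Seq.empty).map(Some(_))
    val rs: Seq[Option[B]] = rightMap.getOrElse(k, Seq.empty).map(Some(_))
    // A missing side contributes a single None, giving the outer-join nulls.
    val l2 = if (ls.isEmpty) Seq(None) else ls
    val r2 = if (rs.isEmpty) Seq(None) else rs
    for (l <- l2; r <- r2) yield (k, (l, r))
  }
}

val out = fullOuterJoin(Seq(1 -> "a", 2 -> "b"), Seq(2 -> "x", 3 -> "y"))
```

With hash tables on both sides, unmatched keys from either input are found without a second scan, which is why this layout suits full outer joins.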
--
*From:* Liquan Pei [mailto:liquan...@gmail.com]
*Sent:* September 30, 2014 18:34
at org.apache.spark.scheduler.Task.run(Task.scala:54)
var tx1 = repartRDD.map(...)
var tx2 = tx1.map(...)
while (...) {
tx2 = tx1.zip(tx2).map(...)
}
Is there any way to monitor an RDD's lineage? I want to
make sure that nothing unexpected is happening.
If the partitions get restarted somewhere else, will they retain the
same index value, as well as all the lineage arguments?
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Thu, Oct 2, 2014 at 3:42 PM
Subject: Re: Spark SQL: ArrayIndexOutofBoundsException
To: SK skrishna...@gmail.com
There is only one place where you use index 1. One possible issue is that your
array may have only one element.
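A standalone illustration of how such an exception can arise (hypothetical data, not from the original report): indexing `p(1)` throws when a line splits into a single element, and a length check avoids it.

```scala
// Hypothetical input: the second line has no delimiter, so split
// yields a one-element array and p(1) would throw
// ArrayIndexOutOfBoundsException.
val lines  = Seq("alice,30", "bob")
val parsed = lines.map(_.split(","))

// Guard the index instead of assuming two fields are present.
val ages = parsed.map(p => if (p.length > 1) Some(p(1)) else None)
```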
Is the number of concurrent executors per worker capped by the
number of CPU cores configured for the worker?
val result = new Array[Double](n)
val bigrams = s.sliding(2).toArray
// math.abs guards against a negative hashCode producing a negative index
for (h <- bigrams.map(b => math.abs(b.hashCode % n))) {
result(h) += 1.0 / bigrams.length
}
Vectors.sparse(n, result.zipWithIndex.filter(_._1 != 0).map(_.swap))
}
Spark can iterate through the left side and find matches in the
right side from the hash table efficiently. Please comment and suggest,
thanks again!
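The pattern described above can be sketched in plain Scala (illustrative names, not Spark's internals): build a hash table over one side once, then stream the other side and probe it row by row.

```scala
// Sketch of a hash left outer join: hash the right side, stream the left.
// Not Spark's implementation; for illustration only.
def hashLeftOuterJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, Option[B]))] = {
  // Build phase: one pass over the right side.
  val table: Map[K, Seq[B]] = right.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
  // Probe phase: stream the left side, one lookup per row.
  left.flatMap { case (k, a) =>
    table.get(k) match {
      case Some(bs) => bs.map(b => (k, (a, Some(b))))
      case None     => Seq((k, (a, Option.empty[B])))
    }
  }
}

val joined = hashLeftOuterJoin(Seq(1 -> "a", 2 -> "b"), Seq(1 -> "x"))
```

Only the build side needs to fit in memory; the streamed side is consumed row by row, which is the efficiency the message above refers to.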
--
*From:* Liquan Pei [mailto:liquan...@gmail.com]
*Sent:* September 30, 2014 12:31
*To:* Haopu Wang
*Cc:* d
That is, what are the differences between these two methods (other
than the slight differences in their type signatures)? Under what
circumstances should I use one or the other?
Thanks
Dave
Or the majority
of them?
Thanks.
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Mon, Sep 29, 2014 at 2:12 PM
Subject: Re: about partition number
To: anny9699 anny9...@gmail.com
The number of cores available in your cluster determines the number of
tasks that can be run concurrently. If your
use many more partitions than the number of cores?
Anny
On Mon, Sep 29, 2014 at 2:12 PM, Liquan Pei liquan...@gmail.com wrote:
The number of cores available in your cluster determines the number of
tasks that can be run concurrently. If your data is evenly partitioned,
the number of partitions
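A back-of-envelope sketch of the relationship above (an illustrative helper, not a Spark API): with a given number of cores available, tasks run in waves, so the partition count determines how many waves are needed.

```scala
// With totalCores tasks runnable at once, numPartitions tasks complete
// in ceil(numPartitions / totalCores) waves.
def taskWaves(numPartitions: Int, totalCores: Int): Int =
  (numPartitions + totalCores - 1) / totalCores

val waves = taskWaves(200, 48) // 200 partitions on 48 cores
```

More partitions than cores is common and often useful (smaller tasks, better balance); the cores only cap how many run at the same time.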
For every key you get 2 iterables.
Do the contents of these iterables have to fit in memory, or is the data
streamed?
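A plain-Scala sketch of the shape cogroup produces (illustrative, not Spark's implementation): one collection of values per side for every key. As far as the 1.x implementation goes, the per-key collections handed to user code are materialized in memory rather than streamed.

```scala
// Local model of cogroup's result shape: for each key, the values
// from the left side and the values from the right side.
def cogroupLocal[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Map[K, (Seq[A], Seq[B])] = {
  val keys = (left.map(_._1) ++ right.map(_._1)).distinct
  keys.map { k =>
    k -> (left.filter(_._1 == k).map(_._2), right.filter(_._1 == k).map(_._2))
  }.toMap
}

val g = cogroupLocal(Seq(1 -> "a", 1 -> "b"), Seq(1 -> "x", 2 -> "y"))
```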
the partition is big. And it
doesn't reduce the iteration on the streamed relation, right?
Thanks!
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Fri, Sep 26, 2014 at 1:33 AM
Subject: Re: Spark SQL question: is cached SchemaRDD storage controlled by
spark.storage.memoryFraction?
To: Haopu Wang hw...@qilinsoft.com
Hi Haopu,
Internally, cacheTable
should come in the map?
On Wed, Sep 24, 2014 at 10:52 PM, Liquan Pei liquan...@gmail.com wrote:
Hi Deep,
The Iterable trait in Scala has methods like map and reduce that you can
use to iterate over the elements of an Iterable[String]. You can also create an
Iterator from the Iterable. For example
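A minimal sketch with illustrative values (not from the original thread):

```scala
// map and reduce work directly on an Iterable[String];
// .iterator yields an explicit Iterator over the same elements.
val words: Iterable[String] = Iterable("spark", "sql", "mllib")
val lengths = words.map(_.length)   // element lengths: 5, 3, 5
val total   = lengths.reduce(_ + _)
val upper   = words.iterator.map(_.toUpperCase).toList
```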
]? How do we do that?
Because the entire Iterable[String] seems to be a single element in the
RDD.
Thank You
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
implemented as part of MLlib?
Thanks, Oleksiy.
Is there value in having a persist somewhere here? For example, if the
flatMap step is particularly expensive, will it ever be computed twice when
there are no failures?
Thanks
Arun
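The recomputation question can be illustrated without Spark, using a lazy Scala view as a stand-in for an RDD transformation (an analogy, not Spark's machinery): each traversal re-runs the mapped function, and materializing the result once plays the role of persist.

```scala
// A lazy view re-evaluates its mapping function on every traversal,
// just as an unpersisted RDD recomputes its lineage for every action.
var evalCount = 0
val data = Seq(1, 2, 3)
val lazyMapped = data.view.map { x => evalCount += 1; x * 2 }

val first  = lazyMapped.sum    // traversal 1: function runs 3 times
val second = lazyMapped.sum    // traversal 2: function runs 3 more times
val cached = lazyMapped.toList // "persist": materialize the results once
val third  = cached.sum        // no further function evaluations
```

So even with no failures, an expensive step feeding two actions is computed twice unless its output is persisted first.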
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Memory-compute-intensive-tasks-tp9643p9991.html