You should import either spark.implicits or sqlContext.implicits, not both;
otherwise the compiler will be confused by two competing implicit conversions.
The following code works for me on Spark version 2.1.0:
import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
No, you don't need one-hot encoding. But since the features column is a
vector and a vector only accepts numbers, if your feature is a string then a
StringIndexer is needed.
http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
here's an example.
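And a rough inline sketch (the column names here are hypothetical; the
indexed column then feeds a VectorAssembler to build the features vector):

import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

val indexer = new StringIndexer()
  .setInputCol("color")       // hypothetical string feature
  .setOutputCol("colorIndex")
val indexed = indexer.fit(df).transform(df)

val assembler = new VectorAssembler()
  .setInputCols(Array("colorIndex", "size"))
  .setOutputCol("features")   // the numeric vector the classifier expects
val features = assembler.transform(indexed)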
On Thu, Mar 23, 2017 at
could you give the event timeline and dag for the time consuming stages on
spark UI?
On Thu, Mar 23, 2017 at 4:30 AM, Matt Deaver wrote:
> For various reasons, our data set is partitioned in Spark by customer id
> and saved to S3. When trying to read this data, however,
Spark version 2.1.0; the vector is from the ml package. The Vector in mllib
has a public VectorUDT type.
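For the ml side, the public handle for the vector SQL type is
org.apache.spark.ml.linalg.SQLDataTypes.VectorType, so a schema check can
avoid the private VectorUDT class entirely. A minimal sketch:

import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.types.StructType

// verify a column is an ml vector without touching the private VectorUDT
def validateVectorCol(schema: StructType, colName: String): Unit = {
  require(schema(colName).dataType == SQLDataTypes.VectorType,
    s"Column $colName must be of vector type")
}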
On Thu, Mar 30, 2017 at 10:57 AM, Ryan <ryan.hd@gmail.com> wrote:
> I'm writing a transformer and the input column is vector type(which is the
> output column from other
I'm writing a transformer and the input column is vector type (which is the
output column from another transformer). But as the VectorUDT is private, how
could I check/transform the schema for the vector column?
And could you paste the stage and task information from the Spark UI?
On Wed, Mar 29, 2017 at 11:30 AM, Ryan <ryan.hd@gmail.com> wrote:
> how long does it take if you remove the repartition and just collect the
> result? I don't think repartition is needed here. There's alrea
How long does it take if you remove the repartition and just collect the
result? I don't think repartition is needed here. There's already a shuffle
for the group by.
On Tue, Mar 28, 2017 at 10:35 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I am working on requirement where i
foreachPartition is an action but runs on each worker, which means you won't
see anything on the driver.
mapPartitions is a transformation which is lazy and won't do anything until
an action.
It depends on the specific use case which is better. To output something (like
a print on a single machine) you could
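For illustration, a minimal sketch of the contrast (assuming an RDD[Int]):

// action: runs on the workers, so these prints land in the executor logs
rdd.foreachPartition(iter => iter.foreach(println))

// transformation: lazy, nothing runs until an action such as collect()
val doubled = rdd.mapPartitions(iter => iter.map(_ * 2))
doubled.collect().foreach(println) // collect() brings results back to the driver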
You could write a UDF using the asML method along with some type casting,
then apply the UDF to the data after PCA.
When using a pipeline, that UDF needs to be wrapped in a customized
transformer, I think.
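A hedged sketch of such a UDF (the column names are hypothetical; asML
converts an mllib vector to an ml vector):

import org.apache.spark.mllib.linalg.{Vector => OldVector}
import org.apache.spark.sql.functions.udf

val toML = udf((v: OldVector) => v.asML)
val converted = df.withColumn("features", toML(df("pcaFeatures")))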
On Sun, Apr 9, 2017 at 10:07 PM, Nick Pentreath
wrote:
> Why not use
1. A row group wouldn't be read if the predicate isn't satisfied, due to the
index.
2. It is absolutely true that the performance gain depends on the id
distribution...
On Mon, Apr 17, 2017 at 4:23 PM, 莫涛 <mo...@sensetime.com> wrote:
> Hi Ryan,
>
>
> The attachment is a screen shot for the sp
You can build a search tree using the ids within each partition to act like an
index, or create a Bloom filter to see if the current partition would have any
hit.
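A rough sketch of the Bloom filter idea (using Guava; the record type and the
sizes here are made up):

import com.google.common.hash.{BloomFilter, Funnels}

// build one Bloom filter per partition over the ids
val filters = rdd.mapPartitionsWithIndex { (idx, iter) =>
  val bf = BloomFilter.create[java.lang.Long](Funnels.longFunnel(), 10000000L, 0.01)
  iter.foreach(record => bf.put(record.id))
  Iterator((idx, bf))
}.collect().toMap

// a partition can be skipped when filters(idx).mightContain(queryId) is false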
What's your expected qps and response time for the filter request?
On Mon, Apr 17, 2017 at 2:23 PM, MoTao wrote:
> Hi
I don't think you can do parallel inserts into a Hive table without dynamic
partitioning; for Hive locking please refer to
https://cwiki.apache.org/confluence/display/Hive/Locking.
Other than that, it should work.
On Mon, Apr 17, 2017 at 6:52 AM, Amol Patil wrote:
> Hi All,
>
>
<mo...@sensetime.com> wrote:
> Hi Ryan,
>
>
> 1. "expected qps and response time for the filter request"
>
> I expect that only the requested BINARY are scanned instead of all
> records, so the response time would be "10K * 5MB / disk read speed", or
>
Searching around, I find SequenceFile might be a comparable alternative to HAR
that you may be interested in.
Thanks to everyone involved. I've learnt a lot too :-)
On Thu, Apr 20, 2017 at 5:25 PM, 莫涛 <mo...@sensetime.com> wrote:
> Hi Ryan,
>
>
> The attachment is the event timeline on executors. Th
It shouldn't be a problem then. We've done a similar thing in Scala. I don't
have much experience with Python threads, but maybe the code related to
reading/writing the temp table isn't thread safe.
On Mon, Apr 17, 2017 at 9:45 PM, Amol Patil <amol4soc...@gmail.com> wrote:
> Thanks Ryan,
The Thrift server is a JDBC server, Kanth.
On Fri, Aug 11, 2017 at 2:51 PM, wrote:
> I also wonder why there isn't a jdbc connector for spark sql?
>
> Sent from my iPhone
>
> On Aug 10, 2017, at 2:45 PM, Jules Damji wrote:
>
> Yes, it's more used in Hive
You could exit with an error code just like a normal Java/Scala application,
and get it from the driver/YARN.
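A minimal sketch (runJob stands in for a hypothetical job body):

object MyJob {
  def main(args: Array[String]): Unit = {
    try {
      runJob()
    } catch {
      case e: Exception =>
        e.printStackTrace()
        sys.exit(1) // non-zero exit code shows up as a failed application in YARN
    }
  }
}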
On Fri, Aug 11, 2017 at 9:55 AM, Wei Zhang
wrote:
> I suppose you can find the job status from Yarn UI application view.
>
>
>
> Cheers,
>
> -z
>
>
>
> *From:* 陈宇航
Not sure about the writer API, but you could always register a temp table
for that DataFrame and execute an insert HQL.
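Something like this sketch (the table names are hypothetical):

// register the DataFrame and insert through SQL instead of the writer API
df.createOrReplaceTempView("staging")
spark.sql("INSERT INTO TABLE target_db.target_table SELECT * FROM staging")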
On Thu, Jul 20, 2017 at 6:13 AM, ctang wrote:
> I wonder if there are any easy ways (or APIs) to insert a dataframe (or
> DataSet), which does not contain the
I think it creates a new connection on each worker; whenever the Processor
references Resource, it gets initialized.
There's no need for the driver to connect to the db in this case.
On Thu, Jun 29, 2017 at 5:52 PM, salvador wrote:
> Hi all,
>
> I am writing a spark job from
It's just a sort of inner join operation... If the second dataset isn't very
large it's ok (btw, you can use flatMap directly instead of map followed by
flatMap/flatten); otherwise you can register the second one as an
rdd/dataset and join them on user id.
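The join version is a one-liner, assuming both sides carry a userId column
(the column name is hypothetical):

val joined = events.join(users, Seq("userId")) // inner join on user id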
On Wed, Aug 9, 2017 at 4:29 PM,
a.com> wrote:
>
>> Riccardo and Ryan
>>Thank you for your ideas.It seems that crossjoin is a new dataset api
>> after spark2.x.
>> my spark version is 1.6.3. Is there a relative api to do crossjoin?
>> thank you.
>>
>>
>>
>>
Why would you like to do so? I think there's no need for us to explicitly
ask for a foreachPartition in Spark SQL, because Tungsten is smart enough to
figure out whether a SQL operation can be applied on each partition or
whether there has to be a shuffle.
On Sun, Jun 25, 2017 at 11:32 PM, jeff saremi
s still germane.
>
> 2017-06-25 19:18 GMT-07:00 Ryan <ryan.hd@gmail.com>:
>
>> Why would you like to do so? I think there's no need for us to explicitly
>> ask for a forEachPartition in spark sql because tungsten is smart enough to
>> figure out whether a
> private static Broadcast<Integer> bcv;
> public static void setBCV(Broadcast<Integer> setbcv) { bcv = setbcv; }
> public static Integer getBCV() {
>   return bcv.value();
> }
> }
>
>
> On Fri, Jun 16, 2017 at 3:35 AM, Ryan <ryan.hd@gmail.com> wrote:
>
>>
ng it record by
> record. mapPartitions() give us the ability to invoke this in bulk. We're
> looking for a similar approach in SQL.
>
>
> --
> *From:* Ryan <ryan.hd@gmail.com>
> *Sent:* Sunday, June 25, 2017 7:18:32 PM
> *To:* jeff saremi
>
If you use StringIndexer to categorize the data, IndexToString can convert
it back.
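A hedged sketch (column names are hypothetical; the labels come from the
fitted StringIndexerModel):

import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// map the indices back to the original string labels
val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)
val restored = converter.transform(indexed)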
On Wed, Jun 7, 2017 at 6:14 PM, kundan kumar wrote:
> Hi Yan,
>
> This doesnt work.
>
> thanks,
> kundan
>
> On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai)
> wrote:
>
I'd suggest scripts like JS, Groovy, etc. To my understanding the service
loader mechanism isn't a good fit for runtime reloading.
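For instance, a minimal javax.script sketch (this assumes a Groovy JSR-223
engine is on the classpath):

import javax.script.ScriptEngineManager

// the script text can be swapped at runtime without a redeploy
val engine = new ScriptEngineManager().getEngineByName("groovy")
val result = engine.eval("[1, 2, 3].sum()")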
On Wed, Jun 7, 2017 at 4:55 PM, Jonnas Li(Contractor) <
zhongshuang...@envisioncn.com> wrote:
> To be more explicit, I used mapwithState() in my application, just
1. Could you give the job, stage & task status from the Spark UI? I found it
extremely useful for performance tuning.
2. Use model.transform for predictions. Usually we have a pipeline for
preparing the training data, and using the same pipeline to transform the data
you want to predict could give us the
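A sketch of the second point (pipeline, trainingDf and newDf are
hypothetical):

// fit the preparation + model pipeline once on the training data...
val pipelineModel = pipeline.fit(trainingDf)
// ...then apply exactly the same transformations when predicting
val predictions = pipelineModel.transform(newDf)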
We use AsyncHttpClient (from the Java world) and simply call future.get as a
synchronous call.
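Roughly like this (hedged: this is the AsyncHttpClient 2.x DSL; the endpoint
and payload are hypothetical):

import org.asynchttpclient.Dsl.asyncHttpClient

val client = asyncHttpClient()
// execute() returns a ListenableFuture; get() blocks, making the call synchronous
val response = client
  .preparePost("http://example.com/ingest")
  .setBody(payload)
  .execute()
  .get()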
On Thu, Jun 1, 2017 at 4:08 AM, vimal dinakaran wrote:
> Hi,
> In our application pipeline we need to push the data from spark streaming
> to a http server.
>
> I would like to have
I think you need to get the logger within the lambda; otherwise it's the
logger on the driver side, which can't work.
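Something like this sketch:

import org.slf4j.LoggerFactory

rdd.foreachPartition { iter =>
  // obtained on the executor, so no logger instance crosses the wire
  val log = LoggerFactory.getLogger("worker-side")
  iter.foreach(x => log.info(s"processing $x"))
}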
On Wed, May 31, 2017 at 4:48 PM, Paolo Patierno wrote:
> No it's running in standalone mode as Docker image on Kubernetes.
>
>
> The only way I found was to
Did you include the proper scala-reflect dependency?
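For sbt it would be something like this (matching scala-reflect to your Scala
version):

libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value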
On Wed, May 31, 2017 at 1:01 AM, krishmah wrote:
> I am currently using Spark 2.0.1 with Scala 2.11.8. However same code works
> with Scala 2.10.6. Please advise if I am missing something
>
> import
I don't think Broadcast itself can be serialized. You can get the value out
on the driver side and refer to it in foreach; then the value will be
serialized with the lambda expression and sent to workers.
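A sketch of that pattern (lookup and process are hypothetical):

val bc = sc.broadcast(lookup)
val localValue = bc.value // pulled out on the driver

rdd.foreach { record =>
  // localValue is captured by the lambda and shipped to the workers with it
  process(record, localValue)
}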
On Fri, Jun 16, 2017 at 2:29 AM, Anton Kravchenko <
kravchenko.anto...@gmail.com> wrote:
> How
I don't think Structured Streaming supports a "partitioned" watermark now.
And why does the consumption rate vary across partitions? If the handling
logic is quite different, using different topics is a better way.
On Fri, Sep 1, 2017 at 4:59 PM, 张万新 wrote:
> Thanks, it's true that looser
you may try foreachPartition
On Fri, Sep 1, 2017 at 10:54 PM, asethia wrote:
> Hi,
>
> I have list of person records in following format:
>
> case class Person(fName:String, city:String)
>
> val l=List(Person("A","City1"),Person("B","City2"),Person("C","City1"))
>
> val
No idea how feasible this is. Has anyone done it?
To clarify: I don't need the actual paths, just the distances.
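Newer GraphX versions ship a ShortestPaths utility that returns exactly that
(distances, not paths); a hedged sketch, with made-up landmark ids:

import org.apache.spark.graphx.lib.ShortestPaths

// hop-count distances from every vertex to the landmark vertices
val result = ShortestPaths.run(graph, Seq(1L, 42L))
// each vertex attribute is now a Map(landmarkId -> distance)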
On Wed, Mar 26, 2014 at 3:04 PM, Ryan Compton compton.r...@gmail.com wrote:
No idea how feasible this is. Has anyone done it?
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B rows,
I'm guessing about 500M are distinct) my jobs fail with:
14/04/17 15:04:02 INFO cluster.ClusterTaskSetManager: Loss was due to
java.io.FileNotFoundException
Btw, I've got System.setProperty("spark.shuffle.consolidate.files",
"true") and use ext3 (CentOS...)
On Thu, Apr 17, 2014 at 3:20 PM, Ryan Compton compton.r...@gmail.com wrote:
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B
I am trying to read an edge list into a Graph. My data looks like
394365859 -- 136153151
589404147 -- 1361045425
I read it into a Graph via:
val edgeFullStrRDD: RDD[String] = sc.textFile(unidirFName)
val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
.map(x
Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt
This code:
println(g.numEdges)
println(g.numVertices)
println(g.edges.distinct().count())
gave me
1
9294
2
On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave ankurd...@gmail.com wrote:
I wasn't able to reproduce this
, Ryan Compton compton.r...@gmail.com
wrote:
I'm trying to shoehorn a label-propagation-ish algorithm into GraphX. I
need to update each vertex with the median value of its neighbors.
Unlike PageRank, which updates each vertex with the mean of its
neighbors, I don't have a simple commutative
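A hedged sketch of the neighborhood gather (assuming a Graph[Double, _];
aggregateMessages is the newer API, mapReduceTriplets was the 2014-era
equivalent):

// collect every neighbor's value, then reduce to a median per vertex
val neighborVals = graph.aggregateMessages[List[Double]](
  ctx => {
    ctx.sendToDst(List(ctx.srcAttr))
    ctx.sendToSrc(List(ctx.dstAttr))
  },
  _ ++ _
)
val medians = neighborVals.mapValues { vals =>
  val sorted = vals.sorted
  sorted(sorted.size / 2)
}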
I use both Pig and Spark. All my code is built with Maven into a giant
*-jar-with-dependencies.jar. I recently upgraded to Spark 1.0 and now
all my pig scripts fail with:
Caused by: java.lang.RuntimeException: Could not resolve error that
occured when launching map reduce job:
/bidirectional-network-current/part-r-1'
USING PigStorage() AS (id1:long, id2:long, weight:int);
ttt = LIMIT edgeList0 10;
DUMP ttt;
On Wed, May 28, 2014 at 12:55 PM, Ryan Compton compton.r...@gmail.com wrote:
It appears to be Spark 1.0 related. I made a pom.xml with a single
dependency on Spark
posted a JIRA https://issues.apache.org/jira/browse/SPARK-1952
On Wed, May 28, 2014 at 1:14 PM, Ryan Compton compton.r...@gmail.com wrote:
Remark: just including the jar built by sbt will produce the same
error, i.e. this pig script will fail:
REGISTER
/usr/share/osi1/spark-1.0.0/assembly
Just ran into this today myself. I'm on branch-1.0 using a CDH3
cluster (no modifications to Spark or its dependencies). The error
appeared trying to run GraphX's .connectedComponents() on a ~200GB
edge list (GraphX worked beautifully on smaller data).
Here's the stacktrace (it's quite similar to
4221 2014-07-31 01:01 /tmp/README.md
Regards,
Ryan Tabora
http://ryantabora.com
I started building Spark / running Spark tests this weekend and on maybe
5-10 occasions have run into a compiler crash while compiling
DataTypeConversions.scala.
Here https://gist.github.com/ryan-williams/7673d7da928570907f4d is a full
gist of an innocuous test command (mvn test -Dsuites
encountering this issue.
Typically you would have changed one or more of the profiles/options -
which leads to this occurring.
2014-10-22 22:00 GMT-07:00 Ryan Williams ryan.blake.willi...@gmail.com:
I started building Spark / running Spark tests this weekend and on maybe
5-10 occasions have run
Fwiw if you do decide to handle language detection on your machine this
library works great on tweets https://github.com/carrotsearch/langid-java
On Tue, Nov 11, 2014, 7:52 PM Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Nov 12, 2014 at 5:42 AM, SK skrishna...@gmail.com wrote:
But
TD's portion seems to start at 27:24: http://youtu.be/jcJq3ZalXD8?t=27m24s
On Tue Dec 16 2014 at 7:13:43 AM Gerard Maas gerard.m...@gmail.com wrote:
Hi Jeniba,
The second part of this meetup recording has a very good answer to your
question. TD explains the current behavior and the on-going
and have a bunch of ideas about where it should go.
Thanks,
-Ryan
Thanks so much Shixiong! This is great.
On Tue, Jun 2, 2015 at 8:26 PM Shixiong Zhu zsxw...@gmail.com wrote:
Ryan - I sent a PR to fix your issue:
https://github.com/apache/spark/pull/6599
Edward - I have no idea why the following error happened. ContextCleaner
doesn't use any Hadoop API
I think this is causing issues upgrading ADAM
https://github.com/bigdatagenomics/adam to Spark 1.3.1 (cf. adam#690
https://github.com/bigdatagenomics/adam/pull/690#issuecomment-107769383);
attempting to build against Hadoop 1.0.4 yields errors like:
2015-06-02 15:57:44 ERROR Executor:96 -
or comments!
-Ryan
Hey Jeff, in addition to what Sandy said, there are two more reasons that
this might not be as bad as it seems; I may be incorrect in my
understanding though.
First, the "additional step" you're referring to is not likely to be adding
any overhead; the "extra map" is really just materializing the
Yeah, a little disappointed with this, I wouldn't expect to be sent
unsolicited mail based on my membership to this list.
-Ryan Victory
On Tue, Feb 9, 2016 at 1:36 PM, John Omernik <j...@omernik.com> wrote:
> All, I received this today, is this appropriate list use? Note: This was
>
know dictionary of words
> if
> >> there is no schema provided by user? Where/how to specify my schema /
> >> config for Parquet format?
> >>
> >> Could not find Apache Parquet mailing list in the official site. It
> would
> >> be great if anyone could share it as well.
> >>
> >> Regards
> >> Ashok
> >>
> >
> >
>
--
Ryan Blue
Software Engineer
Netflix
progress"
> java.lang.OutOfMemoryError: Java heap space at
> java.util.Arrays.copyOfRange(Arrays.java:3664) at
> java.lang.String.(String.java:207) at
> java.lang.StringBuilder.toString(StringBuilder.java:407) at
> scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:430)
> at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:101)
> at
> org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71)
> at
> org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:55)
> at java.util.TimerThread.mainLoop(Timer.java:555) at
> java.util.TimerThread.run(Timer.java:505)
>
>
--
Ryan Blue
Software Engineer
Netflix
astore, can you tell me which
> version is more compatible with Spark 2.0.2 ?
>
> THanks
>
--
Ryan Blue
Software Engineer
Netflix
>
>
> Regards
> Sumit Chawla
>
>
--
Ryan Blue
Software Engineer
Netflix
for a stage. In that version, you probably want to set
spark.blacklist.task.maxTaskAttemptsPerExecutor. See the settings docs
<http://spark.apache.org/docs/latest/configuration.html> and search for
“blacklist” to see all the options.
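For example (a sketch; these must be set before the context starts, with the
names as in the docs above):

val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")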
rb
On Mon, Apr 24, 2017 at 9:41 AM, Ryan Blue <rb...@netflix.c
.
>> memoryOverhead.
>>
>> Driver memory=4g, executor mem=12g, num-executors=8, executor core=8
>>
>> Do you think below setting can help me to overcome above issue:
>>
>> spark.default.parallelism=1000
>> spark.sql.shuffle.partitions=1000
>>
>> Because default max number of partitions are 1000.
>>
>>
>>
>
--
Ryan Blue
Software Engineer
Netflix
an aggregate baseline.
Ryan
Ryan Adams
radams...@gmail.com
On Tue, Jun 12, 2018 at 11:51 AM, Lars Albertsson wrote:
> Hi,
>
> I wrote this answer to the same question a couple of years ago:
> https://www.mail-archive.com/user%40spark.apache.org/msg48032.html
>
> I
kFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
> at
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)
>
>
--
Ryan Blue
Software Engineer
Netflix
or shouldn't
> come. Let me know if this understanding is correct
>
> On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> This is usually caused by skew. Sometimes you can work around it by
>> increasing the number of partitions like you tri
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Ryan Adams
radams...@gmail.com
elson, Assaf
>>> wrote:
>>>
>>> Could you add a fuller code example? I tried to reproduce it in my
>>> environment and I am getting just one instance of the reader…
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Assaf
>
unsubscribe
Ryan Adams
radams...@gmail.com
doopConfWithOptions(relation.options))
> )
>
> import scala.collection.JavaConverters._
>
> val rows = readFile(pFile).flatMap(_ match {
>   case r: InternalRow => Seq(r)
>   // This doesn't work. vector mode is doing something screwy
>   case b: ColumnarBatch => b.rowIterator().asScala
> }).toList
>
> println(rows)
> // List([0,1,5b,24,66647361])
> // ?? this is wrong I think
>
>
>
> Has anyone attempted something similar?
>
>
>
> Cheers Andrew
>
>
>
--
Ryan Blue
Software Engineer
Netflix
get(0,
> DataTypes.DateType));
>
> }
>
> It prints an integer as output:
>
> MyDataWriter.write: 17039
>
>
> Is this a bug? or I am doing something wrong?
>
> Thanks,
> Shubham
>
--
Ryan Blue
Software Engineer
Netflix
om there. This isn't ideal but it's better than nothing.
-Ryan
On Wed, Nov 25, 2020 at 9:13 AM Chris Coutinho
wrote:
> I'm also curious if this is possible, so while I can't offer a solution
> maybe you could try the following.
>
> The driver and executor nodes need to have access
em. When I create a spark application JAR and try to run it from my
laptop, I get the same problem as #1, namely that it tries to find the
warehouse directory on my laptop itself.
Am I crazy? Perhaps this isn't a supported way to use Spark? Any help or
insights are much appreciated!
-Ryan Victory
Thanks Apostolos,
I'm trying to avoid standing up HDFS just for this use case (single node).
-Ryan
On Wed, Nov 25, 2020 at 8:56 AM Apostolos N. Papadopoulos <
papad...@csd.auth.gr> wrote:
> Hi Ryan,
>
> since the driver is at your laptop, in order to access a remote file you
&g
Did you enable "spark.speculation"?
On Tue, Jan 5, 2016 at 9:14 AM, Prasad Ravilla wrote:
> I am using Spark 1.5.2.
>
> I am not using Dynamic allocation.
>
> Thanks,
> Prasad.
>
>
>
>
> On 1/5/16, 3:24 AM, "Ted Yu" wrote:
>
> >Which version of Spark do
Hey Rachana,
There are two jobs in your codes actually: `rdd.isEmpty` and
`rdd.saveAsTextFile`. Since you don't cache or checkpoint this rdd, it will
execute your map function twice for each record.
You can move "accum.add(1)" to "rdd.saveAsTextFile" like this:
JavaDStream lines =
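A hedged Scala sketch of the caching alternative (process and outputPath are
hypothetical):

// cache so that isEmpty and saveAsTextFile don't each recompute the map
val mapped = rdd.map(process).cache()
if (!mapped.isEmpty()) {
  mapped.saveAsTextFile(outputPath)
}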
You can use "_", e.g.,
sparkConf.registerKryoClasses(Array(classOf[scala.Tuple3[_, _, _]]))
Best Regards,
Shixiong(Ryan) Zhu
Software Engineer
Databricks Inc.
shixi...@databricks.com
databricks.com
<http://databricks.com/>
On Wed, Dec 30, 2015 at 10:16 AM, Russ <russ.br..
You can use "reduceByKeyAndWindow", e.g.,
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow((x: Int,
y: Int) => x + y, Seconds(60), Seconds(60))
wordCounts.print()
On Wed, Dec
Could you disable `spark.kryo.registrationRequired`? Some classes may not
be registered but they work well with Kryo's default serializer.
On Fri, Jan 8, 2016 at 8:58 AM, Ted Yu wrote:
> bq. try adding scala.collection.mutable.WrappedArray
>
> But the hint said registering
Could you use "coalesce" to reduce the number of partitions?
Shixiong Zhu
On Mon, Jan 11, 2016 at 12:21 AM, Gavin Yue wrote:
> Here is more info.
>
> The job stuck at:
> INFO cluster.YarnScheduler: Adding task set 1.0 with 79212 tasks
>
> Then got the error:
> Caused
Hey Terry,
That's expected. If you want to only output (1, 3), you can use
"reduceByKey" before "mapWithState" like this:
dstream.reduceByKey(_ + _).mapWithState(spec)
On Fri, Jan 15, 2016 at 1:21 AM, Terry Hoo wrote:
> Hi,
> I am doing a simple test with mapWithState,
> allKafkaWindowData =
> this.sparkReceiverDStream.createReceiverDStream(jsc,this.streamingConf.getWindowDuration(),
>
> this.streamingConf.getSlideDuration());
>
>
>
> this.businessProcess(allKafkaWindowData);
>
> this.sleep();
>
>jsc.start();
Hey Marco,
Since the code in the Future runs asynchronously, you cannot call
"sparkContext.stop" at the end of "fetch" because the code in the Future may
not have finished.
However, the exception seems weird. Do you have a simple reproducer?
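A sketch of waiting for the Future before tearing the context down
(fetchFuture stands in for the Future returned by fetch):

import scala.concurrent.Await
import scala.concurrent.duration._

Await.result(fetchFuture, 10.minutes) // block until the async work completes
sparkContext.stop()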
On Mon, Jan 18, 2016 at 9:13 AM, Ted Yu
mapWithState uses HashPartitioner by default. You can use
"StateSpec.partitioner" to set your custom partitioner.
On Sun, Jan 17, 2016 at 11:00 AM, Lin Zhao wrote:
> When the state is passed to the task that handles a mapWithState for a
> particular key, if the key is
Hey, did you mean that the scheduling delay timeline is incorrect because
it's too short and some values are missing? A batch won't have a scheduling
delay until it starts to run. In your example, a lot of batches are waiting,
so they don't have a scheduling delay.
On Sun, Jan 17, 2016 at
Could you change MEMORY_ONLY_SER to MEMORY_AND_DISK_SER_2 and see if this
still happens? It may be because you don't have enough memory to cache the
events.
On Thu, Jan 14, 2016 at 4:06 PM, Lin Zhao wrote:
> Hi,
>
> I'm testing spark streaming with actor receiver. The actor
Yeah, it's hard-coded as "0.0.0.0". Could you send a PR to add a
configuration for it?
On Thu, Jan 14, 2016 at 2:51 PM, Zee Chen wrote:
> Hi, what is the easiest way to configure the Spark webui to bind to
> localhost or 127.0.0.1? I intend to use this with ssh socks proxy to
>
:31:31 INFO storage.BlockManager: Removing RDD 44
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 43
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 42
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 41
> 16/01/15 00:31:31 INFO storage.BlockMana
Could you try to use "Kryo.setDefaultSerializer" like this:
class YourKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.setDefaultSerializer(classOf[com.esotericsoftware.kryo.serializers.JavaSerializer])
  }
}
On Thu, Jan 14, 2016 at 12:54 PM, Durgesh
Could you show your codes? Did you use `StreamingContext.awaitTermination`?
If so, it will return if any exception happens.
On Wed, Jan 13, 2016 at 11:47 PM, Triones,Deng(vip.com) <
triones.d...@vipshop.com> wrote:
> What’s more, I am running a 7*24 hours job , so I won’t call System.exit()
> by
oint(RDD.scala:300)
>
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>
> at org.a
You can't. The number of cores must be greater than the number of receivers.
On Wed, Feb 10, 2016 at 2:34 AM, ajay garg wrote:
> Hi All,
> I am running 3 executors in my spark streaming application with 3
> cores per executors. I have written my custom receiver
tion.Iterator$$anon$
> org.apache.spark.InterruptibleIterator#
> scala.collection.IndexedSeqLike$Elements#
> scala.collection.mutable.ArrayOps$ofRef#
> java.lang.Object[]#
>
>
>
>
> On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrot
Hey Udo,
mapWithState usually uses much more memory than updateStateByKey since it
caches the states in memory.
However, from your description, it looks like BlockGenerator cannot push data
into BlockManager; there may be something wrong in BlockGenerator. Could you
share the top 50 objects in the heap
Could you do a thread dump in the executor that runs the Kinesis receiver
and post it? It would be great if you could provide the executor log as well.
On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio wrote:
> Hello,
>
> can anybody kindly help me out a little bit
Are you using a custom input dstream? If so, you can make the `compute`
method return None to skip a batch.
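A hedged sketch of such a dstream (fetchBatch stands in for whatever actually
pulls the data):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

class SkippingDStream(ssc: StreamingContext) extends InputDStream[String](ssc) {
  override def start(): Unit = {}
  override def stop(): Unit = {}
  override def compute(validTime: Time): Option[RDD[String]] = {
    val batch = fetchBatch(validTime)
    if (batch.isEmpty) None // None means no job is generated for this batch
    else Some(ssc.sparkContext.parallelize(batch))
  }
  private def fetchBatch(t: Time): Seq[String] = Seq.empty // placeholder source
}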
On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu
wrote:
> I was wondering if there is there any way to skip batches with zero events
> when streaming?
> By skip I
is behaviour
>
> Thanks!
> On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
> wrote:
>
>> Are you using a custom input dstream? If so, you can make the `compute`
>> method return None to skip a batch.
>>
>> On Thu, Feb 11
Thanks for reporting it. I will take a look.
On Thu, Feb 4, 2016 at 6:56 AM, Yuval.Itzchakov wrote:
> Hi,
> I've been playing with the experimental PairDStreamFunctions.mapWithState
> feature and I seem to have stumbled across a bug, and was wondering if
> anyone else has
Do you just want to write some unit tests? If so, you can use "queueStream"
to create a DStream from a queue of RDDs. However, because it doesn't
support metadata checkpointing, it's better to only use it in unit tests.
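A minimal sketch:

import scala.collection.mutable
import org.apache.spark.rdd.RDD

val queue = mutable.Queue[RDD[Int]]()
val stream = ssc.queueStream(queue) // each queued RDD is served as one batch
queue += ssc.sparkContext.makeRDD(1 to 100)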
On Fri, Jan 29, 2016 at 7:35 AM, Sateesh Karuturi <