I have a problem with the ADD JAR command:
hql("add jar /.../xxx.jar")
Error:
Exception in thread main java.lang.AssertionError: assertion failed: No
plan for AddJar ...
How can I do this with HiveContext? I can't find any API for it.
Does Spark SQL with Hive support UDFs/UDAFs?
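For reference, the full flow I'm trying to get working looks roughly like this (the UDF class and table names below are just placeholders, not my real ones):
  hql("add jar /.../xxx.jar")
  hql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.hive.MyUpperUDF'")
  hql("SELECT my_upper(name) FROM my_table")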
Can you try this in master? You are likely running into SPARK-2128
https://issues.apache.org/jira/browse/SPARK-2128.
Michael
On Mon, Jun 16, 2014 at 11:41 PM, Earthson earthson...@gmail.com wrote:
Hello,
I would like to contribute some algorithms to Spark's MLlib, but at the same
time I want to make sure I don't work on something redundant.
Would it be okay to let me know which algorithms are not on your roadmap for
the near future?
Also, can I use Java to write
Hi Steven, have you resolved this problem? I'm encountering the same
problem too.
2014-04-18 3:48 GMT+08:00 Sean Owen so...@cloudera.com:
Oh dear I read this as a build problem. I can build with the latest
Java 7, including those versions of Spark and Mesos, no problem. I did
not deploy
I am using Spark 0.9.1, Mesos 0.19.0 and Tachyon 0.4.1. Is Spark 0.9.1
compatible with Mesos 0.19.0?
2014-06-17 15:50 GMT+08:00 qingyang li liqingyang1...@gmail.com:
Update.
I've reconfigured the environment to use Spark 1.0.0 and the example
finally worked! :)
The difference for me was that Spark 1.0.0 only requires specifying the
Hadoop conf dir (HADOOP_CONF_DIR=/etc/hadoop/conf/).
I guess that with 0.9 there were problems in spotting this dir...but I'm
not
Thanks, will try normalising it.
Admittedly getting Spark Streaming / Kafka working for the first time can
be a bit tricky with the web of dependencies that get pulled in. I've
taken the KafkaWordCount example from the Spark project and set up a simple
standalone SBT project that shows you how to get it working and using
Hi,
I'm stuck using either yarn-client or standalone-client mode. Either gets
stuck when I submit jobs; the last messages it printed were:
...
14/06/17 02:37:17 INFO spark.SparkContext: Added JAR
file:/x/home/jianshuang/tmp/lib/commons-vfs2.jar at
First a clarification: Spark SQL does not talk to HiveServer2, as that
JDBC interface is for retrieving results from queries that are executed
using Hive. Instead Spark SQL will execute queries itself by directly
accessing your data using Spark.
Spark SQL's Hive module can use JDBC to connect
For standalone-cluster mode, there's a scala.MatchError.
Also it looks like the --jars configurations are not passed to the
driver/worker node? (Also, copying from file:/path doesn't seem correct;
shouldn't it copy from http://master/path ?)
...
14/06/17 04:15:30 INFO Worker: Asked to launch
Hello,
I have been evaluating LogisticRegressionWithSGD of Spark 1.0 MLlib on
Hadoop 0.20.2-cdh3u6, but it does not work for a sparse dataset even though
the number of training examples used in the evaluation is just 1,000.
It works fine for the dataset *news20.binary.1000* that has 178,560
features.
I didn't fix the issue so much as work around it. I was running my cluster
locally, so using HDFS was just a preference. The code worked with the local
file system, so that's what I'm using until I can get some help.
Hi all,
I want to do a recursive leftOuterJoin between an RDD (created from a file)
with 9 million rows (the file is 100 MB) and 30 other RDDs (created from
30 different files in each iteration of a loop) varying from 1 to 6 million rows.
When I run it for 5 RDDs, it runs successfully in
Hi Sguj and littlebird,
I'll try to fix it tomorrow evening or the day after tomorrow, because I
am busy preparing a talk (slides) for tomorrow. Sorry for the inconvenience.
Would you mind filing an issue on the Spark JIRA?
2014-06-17 20:55 GMT+08:00 Sguj tpcome...@yahoo.com:
Hi,
(Apologies for the long mail, but it's necessary to provide sufficient
details considering the number of issues faced.)
I'm running into issues testing LogisticRegressionWithSGD on a two-node
cluster (each node with 24 cores and 16G available to slaves out of 24G on
the system). Here's a
After playing a bit, I have been able to create a fatjar this way:
lazy val rootDependencies = Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
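(For what it's worth: marking spark-core and spark-streaming as "provided" keeps them out of the fat jar, since the cluster supplies them at runtime, while spark-streaming-twitter, which is not on the cluster, gets bundled.)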
Long story [1] short, Akka opens up dynamic, random ports for each job [2].
So simple NAT fails. You might try some trickery with a DNS server and
docker's --net=host.
[1]
http://apache-spark-user-list.1001560.n3.nabble.com/Comprehensive-Port-Configuration-reference-tt5384.html#none
[2]
I've been trying to figure out how to increase the heap space for my Spark
environment in 1.0.0, and everything I've found tells me to export something
in Java opts, which is deprecated in 1.0.0, or to increase
spark.executor.memory, which is already at 6G. I'm only trying to process about
Try repartitioning the RDD using coalesce(int partitions) before performing
any transforms.
I can write one if you'll point me to where I need to write it.
Hey
I am new to spark streaming and apologize if these questions have been
asked.
* In StreamingContext, reduceByKey() seems to only work on the RDDs of the
current batch interval, not including RDDs of previous batches. Is my
understanding correct?
* If the above statement is correct, what
Gerard,
Strings in particular are very inefficient because they're stored in a
two-byte format by the JVM. If you use the Kryo serializer and
StorageLevel.MEMORY_ONLY_SER, then Kryo stores Strings in UTF8, which for
ASCII-like strings will take half the space.
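A minimal sketch of the two settings (the input path is just a placeholder):
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel
  val conf = new SparkConf()
    .setAppName("kryo-cached-strings")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  val sc = new SparkContext(conf)
  // cache the serialized bytes rather than Java objects
  val cached = sc.textFile("hdfs:///some/strings").persist(StorageLevel.MEMORY_ONLY_SER)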
Andrew
On Tue, Jun 17,
I am creating around 10 executors with 12 cores and 7g memory, but when I
launch a task not all executors are being used. For example, if my job has 9
tasks, only 3 executors are used with 3 tasks each, and I believe this
is making my app slower than a MapReduce program for the same use case.
Ok... I was checking the wrong version of that file yesterday. My worker is
sending a DriverStateChanged(_, DriverState.FAILED, _) but there is no case
branch for that state and the worker is crashing. I still don't know why
I'm getting a FAILED state but I'm sure that should kill the actor due to
Standalone-client mode is not officially supported at the moment.
Standalone-cluster and yarn-client modes, however, should work.
For both modes, are you running spark-submit from within the cluster, or
outside of it? If the latter, could you try running it from within the
cluster and
How long does it get stuck for? This is a common sign of the OS thrashing
due to running out of memory. If you keep it running longer, does it
throw an error?
Depending on how large your other RDD is (and your join operation), memory
pressure may or may not be the problem at all. It could be
I have been able to submit a job successfully, but I had to configure my
Spark job this way:
val sparkConf: SparkConf =
  new SparkConf()
    .setAppName("TwitterPopularTags")
    .setMaster("spark://int-spark-master:7077")
    .setSparkHome("/opt/spark")
Thanks Michael!
As I run it using spark-shell, I added both jars through the bin/spark-shell
--jars option. I noticed that if I don't pass these jars, it complains it
couldn't find the driver; if I do pass them through --jars, it
complains there is no suitable driver.
Regards.
On Tue, Jun 17,
I've tried enabling speculative execution; this seems to have partially solved
the problem. However, I'm not sure it can handle large-scale situations, as it
only starts when 75% of the job is finished.
Can some one help me with this. Any help is appreciated.
I think you need to implement a timeout in your code. As far as I know,
Spark will not interrupt the execution of your code as long as the driver
is connected. Might be an idea though.
On Tue, Jun 17, 2014 at 7:54 PM, Peng Cheng pc...@uow.edu.au wrote:
Luis' experience validates what I'm seeing. You still have to set the
properties in the SparkConf for the context to work. For example, the master
URL and jars are specified again in the app.
Gino B.
On Jun 17, 2014, at 12:05 PM, Luis Ángel Vicente Sánchez
langel.gro...@gmail.com wrote:
I
It sounds like your job has 9 tasks and all are executing simultaneously in
parallel. This is as good as it gets, right? Are you asking how to break the
work into more tasks, like 120 to match your 10*12 cores? Make your RDD
have more partitions. For example, the textFile method can override the
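A quick sketch of the above (the path and count are placeholders):
  // ask for at least 120 partitions when reading, instead of the HDFS-split default
  val rdd = sc.textFile("hdfs:///data/input", 120)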
I've been wondering about this. Is there a difference in performance
between these two?
val rdd1 = sc.textFile(files.mkString(","))
val rdd2 = sc.union(files.map(sc.textFile(_)))
I don't know about your use-case, Meethu, but it may be worth trying to see
if reading all the files into one RDD
Hi,
I'm having trouble running Spark on Mesos in fine-grained mode. I'm running
Spark 1.0.0 and Mesos 0.18.0. The tasks are failing randomly, which most of
the time, but not always, causes the job to fail. The same code runs
fine in coarse-grained mode. I see the following exceptions in the
I did try creating more partitions by overriding the default number of
partitions determined by HDFS splits. The problem is that in this case the
program runs forever. I have the same set of inputs for MapReduce and Spark.
Where MapReduce takes 2 minutes, Spark takes 5 minutes to complete the job. I
Here is a follow-up to the previous evaluation.
aggregate at GradientDescent.scala:178 never finishes at
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L178
We confirmed, by -verbose:gc, that GC is not happening during the
Hello,
Is there an easy way to convert RDDs within a DStream into Parquet records?
Here is some incomplete pseudo code:
// Create streaming context
val ssc = new StreamingContext(...)
// Obtain a DStream of events
val ds = KafkaUtils.createStream(...)
// Get Spark context to get to the SQL
Hi Makoto,
How many partitions did you set? If there are too many partitions,
please do a coalesce before calling ML algorithms.
Btw, could you try the tree branch in my repo?
https://github.com/mengxr/spark/tree/tree
I used tree aggregate in this branch. It should help with the scalability.
Dear all,
I am trying to run the following query on Spark SQL, using some custom TPC-H
tables, with a standalone Spark cluster configuration:
SELECT * FROM history a JOIN history b ON a.o_custkey = b.o_custkey WHERE
a.c_address b.c_address;
Unfortunately I get the following error during execution:
Hi Jayati,
Thanks for asking! MLlib algorithms are all implemented in Scala. It
makes it easier for us to maintain if we have the implementations in one
place. For the roadmap, please visit
http://www.slideshare.net/xrmeng/m-llib-hadoopsummit to see features
planned for v1.1. Before contributing new
Hi Bharath,
Thanks for posting the details! Which Spark version are you using?
Best,
Xiangrui
On Tue, Jun 17, 2014 at 6:48 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
Hi Xiangrui,
What's the difference between treeAggregate and aggregate? Why does
treeAggregate scale better? If we just use mapPartitions, will it
be as fast as treeAggregate?
Thanks.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
Hi Xiangrui,
(2014/06/18 4:58), Xiangrui Meng wrote:
How many partitions did you set? If there are too many partitions,
please do a coalesce before calling ML algorithms.
The training data news20.random.1000 is small and thus only 2
partitions are used by default.
val training =
Mahesh,
- One direction could be: create a Parquet schema, convert and save the
records to HDFS.
- This might help:
https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala
Cheers
k/
On Tue, Jun 17, 2014 at 12:52 PM,
Hi Abhishek,
Where mapreduce is taking 2 mins, spark is taking 5 min to complete the
job.
Interesting. Could you tell us more about your program? A code skeleton
would certainly be helpful.
Thanks!
-Jey
On Tue, Jun 17, 2014 at 3:21 PM, abhiguruvayya sharath.abhis...@gmail.com
wrote:
Hi DB,
treeReduce (treeAggregate) is a feature I'm testing now. It is a
compromise between the current reduce and a butterfly allReduce. The former
runs in time linear in the number of partitions, while the latter introduces
too many dependencies. treeAggregate with depth = 2 should run in
O(sqrt(n)) time,
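To make the idea concrete, here is a hand-rolled sketch of a two-level aggregation (my own illustration, not the actual treeAggregate code): level one combines the per-partition partials into roughly sqrt(n) groups on the executors, so only about sqrt(n) values ever reach the driver.
  import scala.reflect.ClassTag
  import org.apache.spark.SparkContext._   // pair-RDD functions
  import org.apache.spark.rdd.RDD
  def twoLevelAggregate[T, U: ClassTag](rdd: RDD[T], zero: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    val groups = math.max(1, math.sqrt(rdd.partitions.length).toInt)
    rdd
      .mapPartitionsWithIndex { (i, iter) =>
        Iterator((i % groups, iter.foldLeft(zero)(seqOp)))  // per-partition partial result
      }
      .reduceByKey(combOp)   // level 1: combine into ~sqrt(n) partials via a shuffle
      .values
      .collect()             // level 2: only ~sqrt(n) values arrive at the driver
      .foldLeft(zero)(combOp)
  }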
Hi,
I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using sbt test-only, all 3
pass. But when I run them all using sbt test, they fail with the warning
below. I am wondering if the binding exception
Hi Makoto,
Are you using Spark 1.0 or 0.9? Could you go to the executor tab of
the web UI and check the driver's memory?
treeAggregate is not part of 1.0.
Best,
Xiangrui
On Tue, Jun 17, 2014 at 2:00 PM, Xiangrui Meng men...@gmail.com wrote:
Hi Xiangrui,
Does it mean that mapPartitions and then reduce share the same
behavior as the aggregate operation, which is O(n)?
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Tue, Jun 17,
Hi Xiangrui,
(2014/06/18 6:03), Xiangrui Meng wrote:
Are you using Spark 1.0 or 0.9? Could you go to the executor tab of
the web UI and check the driver's memory?
I am using Spark 1.0.
588.8 MB is allocated for driver RDDs.
I am setting SPARK_DRIVER_MEMORY=2g in the conf/spark-env.sh.
The
Xiangrui,
Could you point to the JIRA related to tree aggregate? ...sounds like the
allreduce idea...
I would definitely like to try it on our dataset...
Makoto,
I did run a pretty big sparse dataset (20M rows, 3M sparse features) and I
got 100 iterations of SGD running in 200 seconds...10
Thanks Krishna. Seems like you have to use Avro and then convert that to
Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look into
this some more.
Thanks,
Mahesh
From: Krishna Sankar ksanka...@gmail.com
Reply-To:
Finally got it worked out: I mimicked how Spark adds the datanucleus jars in
compute-classpath.sh and added the db2jcc*.jar to the classpath; it works
now.
Thanks!
On Tue, Jun 17, 2014 at 10:50 AM, Jenny Zhao linlin200...@gmail.com wrote:
If an RDD object has non-empty .dependencies, does that mean it has
lineage? How can I remove it?
I'm doing iterative computing and each iteration depends on the result
computed in the previous iteration. After several iterations, it throws a
StackOverflowError.
At first I'm trying to use
Can anybody explain WHY:
1) LabeledPoint is in regression/LabeledPoint.scala? This causes
classification modules to import from regression modules.
2) Vector and SparseVector are in linalg? OK. GeneralizedLinearModel is in
regression/GeneralizedLinearAlgorithm.scala? Really?
3) LinearModel is in
I am using cutting edge code from git but doing my own sbt assembly.
On Mon, Jun 16, 2014 at 10:28 PM, Andre Schumacher
schum...@icsi.berkeley.edu wrote:
Hi,
are you using the amplab/spark-1.0.0 images from the global registry?
Andre
On 06/17/2014 01:36 AM, Mohit Jaggi wrote:
Hi
Some people (me included) might have wondered why all our m1.large spot
instances (in us-west-1) shut down a few hours ago...
Simple reason: The EC2 spot price for Spark's default m1.large instances
just jumped from $0.016 per hour to about $0.750. Yes, fifty times. Probably
something to do with
If you convert the data to a SchemaRDD you can save it as Parquet:
http://spark.apache.org/docs/latest/sql-programming-guide.html#using-parquet
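For the streaming case from earlier in the thread, a rough sketch of how that could look on 1.0, assuming the KafkaUtils DStream `ds` from the pseudo code (the Event case class, parseEvent function, and output path are placeholders):
  import org.apache.spark.sql.SQLContext
  case class Event(id: Long, name: String)
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD   // implicit RDD[Event] -> SchemaRDD
  ds.map(parseEvent).foreachRDD { (rdd, time) =>
    // write each batch to its own directory so earlier output isn't overwritten
    rdd.saveAsParquetFile("hdfs:///events/batch-" + time.milliseconds)
  }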
On Tue, Jun 17, 2014 at 11:47 PM, Padmanabhan, Mahesh (contractor)
mahesh.padmanab...@twc-contractor.com wrote:
DB, yes, reduce and aggregate are linear.
Makoto, dense vectors are used in aggregation. If you have 32
partitions and each one sends a dense vector of size 1,354,731 to the
master, then the driver needs 300M+. That may be the problem. Which
deploy mode are you using, standalone or local?
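As a rough check of that estimate: 1,354,731 doubles * 8 bytes is about 10.8 MB per dense vector, and 32 partitions * ~10.8 MB is roughly 350 MB of partial results arriving at the driver, consistent with the 300M+ figure above.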
I found the main reason to be that I was using coalesce instead of
repartition. coalesce was shrinking the partitioning, so the number of tasks
was too small to be executed by all of the executors. Can you help me in
understanding when to use coalesce and when to use repartition? In
application
I remember having to do a similar thing in the spark docker scripts for
testing purposes. Were you able to modify the /etc/hosts directly? I
remember issues with that as docker apparently mounts it as part of its
read-only filesystem.
On Tue, Jun 17, 2014 at 4:36 PM, Mohit Jaggi
Makoto, please use --driver-memory 8G when you launch spark-shell. -Xiangrui
On Tue, Jun 17, 2014 at 4:49 PM, Xiangrui Meng men...@gmail.com wrote:
repartition() is actually just an alias of coalesce(), but with the shuffle
flag set to true. This shuffle is probably what you're seeing as
taking longer, but it is required when you go from a smaller number of
partitions to a larger one.
When actually decreasing the number of partitions,
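In code the distinction looks roughly like this (an illustration, with placeholder partition counts):
  val wider = rdd.repartition(120)  // same as coalesce(120, shuffle = true): full shuffle
  val fewer = rdd.coalesce(10)      // narrow, no shuffle; can only reduce the partition count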
Trying to aggregate over a sliding window, playing with the slide duration.
Playing around with the slide interval I can see the aggregation works but
mostly fails with the below error. The stream has records coming in at
100ms.
JavaPairDStream<String, AggregateObject> aggregatedDStream =
There is a bug:
https://github.com/apache/spark/pull/961#issuecomment-45125185
On Tue, Jun 17, 2014 at 8:19 PM, Hatch M hatchman1...@gmail.com wrote:
Perfect!! That makes so much sense to me now. Thanks a ton
Hi Xiangrui ,
I'm using 1.0.0.
Thanks,
Bharath
On 18-Jun-2014 1:43 am, Xiangrui Meng men...@gmail.com wrote:
The error message *means* that there is no column called c_address.
However, maybe it's a bug with Spark SQL not understanding the
a.c_address syntax. Can you double-check the column name is correct?
Thanks
Tobias
On Wed, Jun 18, 2014 at 5:02 AM, Zuhair Khayyat
zuhair.khay...@gmail.com wrote:
Hi Spark Gurus,
I am trying to compile a Spark Streaming example with CDH5 and having
trouble compiling it.
Has anyone created a Spark Streaming example using CDH5 (preferably Spark
0.9.1) and would be kind enough to share the build.sbt(.scala) file (or
point to their example on GitHub)? I know
Hi, I think this is a bug in Spark, because changing my program to use
a main method instead of the App trait fixes the problem.
I've filed this as SPARK-2175; apologies if this turns out to be a
duplicate.
https://issues.apache.org/jira/browse/SPARK-2175
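The workaround, roughly (object name and job body are placeholders):
  import org.apache.spark.{SparkConf, SparkContext}
  // before (hits the problem): object MyJob extends App { ... }
  // after (works): an explicit main method
  object MyJob {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
      // ... the code that previously lived directly in the App body ...
      sc.stop()
    }
  }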
Regards,
Brandon.
Hi Andrew,
I submitted it from within the cluster. It looks like standalone-cluster mode
didn't put the jars on its HTTP server, and passed the file:/... path to the
driver node. That's why the driver node couldn't find the jars.
However, even after I copied my files to all slaves it still didn't work; see my
A couple more points:
1) The inexplicable stalling of execution with large feature sets appears
similar to that reported with the news-20 dataset:
http://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3c53a03542.1010...@gmail.com%3E
2) The NPE when trying to call mapToPair to convert an RDD<Long,
I used --privileged to start the container and then unmounted /etc/hosts.
Then I created a new /etc/hosts file
On Tue, Jun 17, 2014 at 4:58 PM, Aaron Davidson ilike...@gmail.com wrote:
Hi Xiangrui,
(2014/06/18 8:49), Xiangrui Meng wrote:
Makoto, dense vectors are used to in aggregation. If you have 32
partitions and each one sending a dense vector of size 1,354,731 to
master. Then the driver needs 300M+. That may be the problem.
It seems that it could cause certain problems
Yup, alright, same solution then :)
On Tue, Jun 17, 2014 at 7:39 PM, Mohit Jaggi mohitja...@gmail.com wrote:
Hey Jeremy,
This is patched in the 1.0 and 0.9 branches of Spark. We're likely to
make a 1.0.1 release soon (this patch being one of the main reasons),
but if you are itching for this sooner, you can just checkout the head
of branch-1.0 and you will be able to use r3.XXX instances.
- Patrick
On
Hi, I have a 40G file which is a concatenation of multiple documents. I
want to extract two features (title and tables) from each doc, so the
program is like this:
-
val file = sc.textFile("/path/to/40G/file")
//file.cache() //to
By the way, in case it's not clear, I mean our maintenance branches:
https://github.com/apache/spark/tree/branch-1.0
On Tue, Jun 17, 2014 at 8:35 PM, Patrick Wendell pwend...@gmail.com wrote:
It would be convenient if Spark's textFile, parquetFile, etc. could support
paths with wildcards, such as:
hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
Or is there already a way to do this now?
Jianshi
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog:
Hi Jianshi,
I have used wildcard characters (*) in my program and it worked.
My code was like this:
b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
Thanks Regards,
Meethu M
On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
This is about Spark 0.9.
I have a 3-node Spark cluster. I want to add a locally available jar file
(present on all nodes) to the SPARK_CLASSPATH variable in
/etc/spark/conf/spark-env.sh so that all nodes can access it.
The question is:
should I edit 'spark-env.sh' on all nodes to add the jar?
I am about to spin up some new clusters, so I may give that a go... any
special instructions for making them work? I assume I use the
--spark-git-repo= option on the spark-ec2 command. Is it as easy as
concatenating your string as the value?
On cluster management GUIs... I've been looking around
Actually you'll just want to clone the 1.0 branch then use the
spark-ec2 script in there to launch your cluster. The --spark-git-repo
flag is if you want to launch with a different version of Spark on the
cluster. In your case you just need a different version of the launch
script itself, which
These paths get passed directly to the Hadoop FileSystem API, and I
think they support globbing out of the box. So AFAIK it should just
work.
On Tue, Jun 17, 2014 at 9:09 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
In Spark you can use the normal globs supported by Hadoop's FileSystem,
which are documented here:
http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
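A couple of glob forms that go through globStatus (paths are illustrative; the first is from the original question):
  val month   = sc.textFile("hdfs://domain/user/jianshuang/data/parquet/table/month=2014*")
  val twoDays = sc.textFile("hdfs:///logs/2014-06-1[78]/*.log")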
On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW meethu2...@yahoo.co.in
wrote:
Yeah, sorry that error message is not very intuitive. There is already a
JIRA open to make it better: SPARK-2059
https://issues.apache.org/jira/browse/SPARK-2059
Also, a bug has been fixed in master regarding attributes that contain _.
So if you are running 1.0 you might try upgrading.
On
There are a few options:
- Kryo might be able to serialize these objects out of the box, depending
on what’s inside them. Try turning it on as described at
http://spark.apache.org/docs/latest/tuning.html.
- If that doesn’t work, you can create your own “wrapper” objects that
implement
Out of curiosity - are you guys using speculation, shuffle
consolidation, or any other non-default option? If so that would help
narrow down what's causing this corruption.
On Tue, Jun 17, 2014 at 10:40 AM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Matt/Ryan,
Did you make any headway
Thanks! Will try to get the fix and retest.
On Tue, Jun 17, 2014 at 5:30 PM, onpoq l onpo...@gmail.com wrote: