I've found the problem.
I've removed guava-14.0 from the extraClassPath in my Spark job, and the
exception no longer occurs.
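For anyone chasing the same kind of conflict: a quick way in spark-shell to see which jar a class was actually loaded from (a diagnostic sketch; the Guava class name in the comment is just an example, and assumes Guava is on the classpath at all):

```scala
// Report which jar (or directory) a class was loaded from; useful for
// spotting duplicate-library conflicts such as a stray Guava 14.0 on
// extraClassPath. Bootstrap classes have no code source, so we return a
// placeholder for them.
def jarOf(c: Class[_]): String =
  Option(c.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("(bootstrap classloader)")

// In spark-shell, e.g.:
// println(jarOf(Class.forName("com.google.common.base.Preconditions")))
```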
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Hi,
I’ve got the following exception when running a connected component example in
Spark 3.0.0.
This code runs without exception in Spark 2.4.
It throws the exception when calling the RDDOperationScope.toJson method.
I’m not sure why it throws the exception in Spark 3.0; there is no exception in
Spark 2.4.
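For reference, a minimal connected-components job of the kind described (an assumption on my part that the example uses GraphX; the original code was not posted, and the edge data below is made up):

```scala
// Minimal GraphX connected-components sketch; run inside spark-shell,
// where `sc` is the provided SparkContext.
import org.apache.spark.graphx.{Edge, Graph}

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(4L, 5L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 1)

// Each vertex is labeled with the smallest vertex id in its component.
val components = graph.connectedComponents().vertices
components.collect().foreach(println)
```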
Hi,
I'm encountering a strange exception in Spark 2.4.4 (on AWS EMR 5.29):
org.apache.spark.storage.BlockException: Negative block size
-9223372036854775808.
(That value is Long.MinValue, which suggests an overflow or an
uninitialized size.) I've seen this mostly from this line (for remote blocks)
All 800 file sizes (in a partition folder) are in bytes; they sum to
200 MB, which is each partition folder's input size. And I am using the ORC
format; I have never used Parquet.
Done. https://issues.apache.org/jira/browse/SPARK-32130
On Mon, Jun 29, 2020 at 8:21 AM Maxim Gekk
wrote:
> Hello Sanjeev,
>
> It is hard to troubleshoot the issue without input files. Could you open
> an JIRA ticket at https://issues.apache.org/jira/projects/SPARK and
> attach the JSON files
So I should have done some back-of-the-napkin math before all of this. You
are writing out 800 files, each < 128 MB. If they were each 128 MB, that
would be 100 GB of data being written. I'm not sure how much hardware you
have, but the fact that you can shuffle about 100 GB to a single thread
and
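The arithmetic behind that estimate, spelled out (a sketch; 128 MB is the per-file upper bound mentioned above):

```scala
// Back-of-the-napkin upper bound: 800 output files at 128 MB each.
val files = 800
val maxFileMB = 128
val totalGB = files.toLong * maxFileMB / 1024 // MB -> GB
println(s"Upper bound: $totalGB GB written") // 100 GB
```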
Hello Sanjeev,
It is hard to troubleshoot the issue without input files. Could you open an
JIRA ticket at https://issues.apache.org/jira/projects/SPARK and attach the
JSON files there (or samples or code which generates JSON files)?
Maxim Gekk
Software Engineer
Databricks, Inc.
On Mon, Jun
It has read everything. As you can see, the timing of count is still smaller
in Spark 2.4.
Spark 2.4
scala> spark.time(spark.read.json("/data/20200528"))
Time taken: 19691 ms
res61: org.apache.spark.sql.DataFrame = [created: bigint, id: string ... 5
more fields]
scala> spark.time(res61.count())
Could you share your code? Are you sure your Spark 2.4 cluster had
indeed read anything? It looks like the Input size field is empty under 2.4.
-- ND
On 6/27/20 7:58 PM, Sanjeev Mishra wrote:
I have a large number of JSON files that Spark can read in 36 seconds,
but Spark 3.0 takes almost 33 minutes to read the same.
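One way to take schema inference out of such read-time comparisons is to supply an explicit schema so spark.read.json skips the inference pass (a sketch; the field names are illustrative, taken from the DataFrame printout above, and the full schema would list all seven fields):

```scala
// Hedged sketch: reading JSON with an explicit schema avoids the extra
// pass over the data that schema inference requires. Run in spark-shell,
// where `spark` is the provided SparkSession.
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("created", LongType),
  StructField("id", StringType)
))

val dp = spark.read.schema(schema).json("/data/20200528")
```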
Hi,
I am doing the repartition at the end, that is, before insert-overwriting the
table. I see that the last step (the repartition) is taking the most time.
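If the final repartition is only there to reduce the number of output files, coalesce avoids the full shuffle that repartition(n) always performs (a sketch; the table names and partition count are made up for illustration):

```scala
// coalesce merges existing partitions without a shuffle, whereas
// repartition(n) always shuffles. Run in spark-shell; names are
// illustrative.
val df = spark.read.table("staging_table")

df.coalesce(50)
  .write
  .mode("overwrite")
  .insertInto("target_table")
```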
While launching a Spark job from Zeppelin against a standalone Spark
cluster (Spark 3.0 with multiple workers, without Hadoop), we
encountered a Spark interpreter exception caused by an I/O
FileNotFoundException due to the non-existence of the /tmp/spark-events
directory.
We had to
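One workaround we have seen for the missing event-log directory is simply to create it up front (a sketch; alternatively, point spark.eventLog.dir at a directory that already exists):

```scala
// Spark's event logging (spark.eventLog.enabled=true) fails fast when
// the log directory is missing; /tmp/spark-events is the default.
import java.nio.file.{Files, Paths}

val eventLogDir = Paths.get("/tmp/spark-events")
if (!Files.exists(eventLogDir)) Files.createDirectories(eventLogDir)
```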
There is not much code, I am using spark-shell provided by Spark 2.4 and
Spark 3.
val dp = spark.read.json("/Users//data/dailyparams/20200528")
On Mon, Jun 29, 2020 at 2:25 AM Gourav Sengupta
wrote:
> Hi,
>
> can you please share the SPARK code?
>
>
>
> Regards,
> Gourav
>
> On Sun, Jun 28,
Hi, Apache enthusiast!
(You’re receiving this because you’re subscribed to one or more dev or
user mailing lists for an Apache Software Foundation project.)
The ApacheCon Planners and the Apache Software Foundation are pleased to
announce that ApacheCon @Home will be held online, September
you are going to need hadoop-3.1 on your classpath, with hadoop-aws and the
same aws-sdk it was built with (1.11.something). Mixing hadoop JARs is
doomed. Using a different AWS SDK jar is a bit risky, though more recent
upgrades have all been fairly low stress.
On Fri, 19 Jun 2020 at 05:39, murat
Hello,
Adding the dev mailing list; maybe there is someone here who can help
provide/show a valid/accepted pod template for Spark 3?
Thanks in advance,
Michel
Le ven. 26 juin 2020 à 14:03, Michel Sumbul a
écrit :
> Hi Jorge,
> If I set that in the spark submit command it works but I want it
Hi,
can you please share the SPARK code?
Regards,
Gourav
On Sun, Jun 28, 2020 at 12:58 AM Sanjeev Mishra
wrote:
>
> I have a large number of JSON files that Spark can read in 36 seconds, but
> Spark 3.0 takes almost 33 minutes to read the same. On closer analysis, it
> looks like Spark 3.0 is