The issue seems to be with the primordial class loader. I cannot load the
drivers to all the nodes at the same location, but I have loaded the jars to
HDFS. I have tried SPARK_YARN_DIST_FILES as well as SPARK_CLASSPATH on the
edge node with no luck. Is there another way to load these jars
through the primordial class loader?
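A common alternative on YARN is to pass the HDFS-hosted jars explicitly at submit time with --jars instead of relying on the classpath environment variables; --jars accepts hdfs:// URLs in yarn mode, and the jars land in each container's working directory. A minimal sketch, where all jar and class names are hypothetical placeholders:

```shell
# Ship driver jars from HDFS to the driver and every executor at submit
# time. The jar paths and main class below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///libs/driver-a.jar,hdfs:///libs/driver-b.jar \
  --conf spark.driver.extraClassPath=driver-a.jar:driver-b.jar \
  --conf spark.executor.extraClassPath=driver-a.jar:driver-b.jar \
  --class com.example.Main \
  hdfs:///apps/myapp.jar
```

The bare jar names in extraClassPath work here because --jars places the files in each YARN container's working directory.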
Are you using Spark's textFile method? If so, go through this blog :-
http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
Anubhav
On Mon, Apr 24, 2017 at 12:48 PM, Afshin, Bardia <
bardia.afs...@capitalone.com> wrote:
> Hi there,
>
>
>
> I have a process that downloads t
r.marksuccessfuljobs",
> "false")
>
>
> Regards,
> Chanh
>
>
> On Oct 6, 2016, at 10:32 PM, Anubhav Agarwal wrote:
>
Hi all,
I have searched a bit before posting this query.
Using Spark 1.6.1
Dataframe.write().format("parquet").mode(SaveMode.Append).save("location")
Note:- The data in that folder can be deleted and most of the times that
folder doesn't even exist.
Which SaveMode is the best, if one is necessary at all?
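For reference, the four SaveMode values differ exactly in how they treat a target folder that may or may not exist, which is the situation described above. A minimal Java sketch against the 1.6 DataFrame API; df is an existing DataFrame and the output path is a placeholder:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// df is assumed to be an existing DataFrame; the path is a placeholder.

// Append: creates the folder if missing, adds new part files if present.
df.write().format("parquet").mode(SaveMode.Append).save("hdfs:///tmp/out");

// Overwrite: deletes whatever is at the path first, then writes fresh.
df.write().format("parquet").mode(SaveMode.Overwrite).save("hdfs:///tmp/out");

// ErrorIfExists (the default): fails if the folder already exists.
// Ignore: silently skips the write if the folder already exists.
```

Since the folder is sometimes deleted and often absent, Append and Overwrite are the two modes that will not fail on an existing folder; which is right depends on whether leftover data should be kept.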
Hi,
I am having log4j trouble while running Spark using YARN as cluster manager
in CDH 5.3.3.
I get the following error:-
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/data/12/yarn/nm/filecache/34/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticL
Hi,
We have a very small 12 MB file that we join with other data. The job runs
fine and saves the data as a parquet file. But if we use coalesce(1) we get
the following error:-
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAnd
.
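One thing worth checking in cases like this: coalesce(1) narrows partitioning without a shuffle, so it can pull the entire upstream computation (including the join) into a single task, while repartition(1) inserts a shuffle and only the final write runs in one task. A hedged Java sketch; the DataFrames, join key, and path are placeholders:

```java
import org.apache.spark.sql.DataFrame;

// bigDf and smallDf are assumed DataFrames; "key" and the output path
// are placeholders.
DataFrame joined = bigDf.join(smallDf, "key");

// repartition(1) shuffles down to one partition, so the join itself
// still runs with full parallelism; only the write is single-task.
joined.repartition(1).write().format("parquet").save("hdfs:///tmp/joined");
```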
On Tue, Nov 3, 2015 at 7:48 AM, Ted Yu wrote:
> I am a bit curious: why is the synchronization on finalLock needed?
>
> Thanks
>
> On Oct 23, 2015, at 8:25 AM, Anubhav Agarwal wrote:
>
I have a spark job that creates 6 million rows in RDDs. I convert the RDD
into Data-frame and write it to HDFS. Currently it takes 3 minutes to write
it to HDFS.
Here is the snippet:-
RDDList.parallelStream().forEach(mapJavaRDD -> {
if (mapJavaRDD != null) {
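Spark's scheduler accepts jobs submitted from multiple threads, so each RDD in the list can be converted and written as its own concurrent job. A hedged sketch of how the truncated snippet above might continue; sqlContext, the MyRecord bean class, and the output paths are assumptions, not part of the original code:

```java
// Hedged sketch: RDDList, sqlContext, and MyRecord are placeholders.
RDDList.parallelStream().forEach(mapJavaRDD -> {
    if (mapJavaRDD != null) {
        // Convert each RDD of beans to a DataFrame.
        DataFrame df = sqlContext.createDataFrame(mapJavaRDD, MyRecord.class);
        // Write each RDD to its own folder so the concurrent jobs do
        // not clobber one another's output.
        df.write().mode(SaveMode.Append)
          .parquet("hdfs:///out/part-" + mapJavaRDD.id());
    }
});
```

If the 3-minute write is dominated by many small part files rather than by job scheduling, reducing the number of output partitions per RDD may matter more than parallelizing the submissions.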
Hi Ankit,
Here is my solution for this:-
1) Download the latest Spark 1.5.1. (Just copied the following link from
spark.apache.org; if it doesn't work then grab a new one from the website.)
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
2) Unzip the folder and rename/move t
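The two steps above can be sketched as shell commands; the destination directory is a placeholder, and the mirror URL is the one quoted above (adjust it if it is stale):

```shell
# Download a prebuilt Spark distribution, unpack it, and move it into
# place. /opt/spark is a placeholder destination.
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
mv spark-1.5.1-bin-hadoop2.6 /opt/spark
```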
I am running Spark 1.3 on CDH 5.4 stack. I am getting the following error
when I spark-submit my application:-
15/08/11 16:03:49 INFO Remoting: Starting remoting
15/08/11 16:03:49 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkdri...@cdh54-22a4101a-14d7-4f06-b3f8-079c6f
d was that there were two addAccumulator()
> calls at the top of stack trace while in your code I don't
> see addAccumulator() calling itself.
>
> FYI
>
> On Mon, Aug 3, 2015 at 3:22 PM, Anubhav Agarwal
> wrote:
>
>> The code was written in 1.4
>
> Cheers
>
> On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal
> wrote:
>
Hi,
I am trying to modify my code to use HDFS and multiple nodes. The code
works fine when I run it locally in a single machine with a single worker.
I have been trying to modify it and I get the following error. Any hint
would be helpful.
java.lang.NullPointerException
at
thomsonreuters.
Zhan, specifying the port fixed the port issue.
Is it possible to specify the log directory while starting the spark
thriftserver?
Still getting this error even though the folder exists and everyone has
permission to use that directory.
drwxr-xr-x 2 root root 4096 Mar 24 19:04 spark-
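On the log-directory question: the daemon launch scripts honor SPARK_LOG_DIR, so the Thrift server's logs can be redirected by exporting it (or setting it in conf/spark-env.sh) before starting. A minimal sketch; the path is a placeholder:

```shell
# spark-daemon.sh, which start-thriftserver.sh goes through, writes its
# logs under SPARK_LOG_DIR when it is set. /var/log/spark is a
# placeholder path.
export SPARK_LOG_DIR=/var/log/spark
sbin/start-thriftserver.sh --master yarn
```

If the permission error persists, check the directory as the user actually running the daemon, since the `drwxr-xr-x root root` listing above only grants write access to root.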