It seems that JavaSparkContext is just a wrapper around the Scala SparkContext.
Inside JavaSparkContext, the Scala one is used to do all the work.
If I pass the same Scala SparkContext to initialize a JavaSparkContext, I am
still operating on the same SparkContext.
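A minimal sketch of that relationship (variable names are just for illustration):

    import org.apache.spark.SparkContext
    import org.apache.spark.api.java.JavaSparkContext

    // sc is the existing Scala SparkContext
    val javaSc = new JavaSparkContext(sc)   // wraps, does not copy
    // javaSc.sc eq sc -- both handles point at the same underlying context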
Sorry for spamming.
Hao
On Mon, Jun 29, 2015
I'd like to toss out another idea that doesn't involve a complete end-to-end
Kerberos implementation. Essentially, have the driver authenticate to
Kerberos, instantiate a Hadoop file system, and serialize/cache it for the
executors to use instead of them having to instantiate their own.
-
Hi, Akhil. Thank you for your reply. I tried what you suggested, but I get
the following error.
The source code is:
JavaPairRDD<LongWritable, Text> distFile = sc.hadoopFile(
    "hdfs://cMaster:9000/wcinput/data.txt",
    DataInputFormat.class, LongWritable.class, Text.class);
while DataInputFormat class is
Hey all,
I'm trying to make a DataFrame by inspection (using Spark 1.4.0), but I run into a
parameter of my case class not being supported. Minimal example:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import com.vividsolutions.jts.geom.Coordinate
case class
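For reference, a hypothetical sketch of the kind of definition that triggers this, assuming the unsupported field is the JTS Coordinate imported above (the case class name and values are made up):

    case class Segment(id: Long, start: Coordinate, end: Coordinate)   // hypothetical case class
    val df = Seq(Segment(1L, new Coordinate(0, 0), new Coordinate(1, 1))).toDF()
    // fails with something like:
    // java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Coordinate is not supported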
You may want to store the property file either locally or, if you intend to
launch it in YARN mode, in HDFS (as you do not know which node will
become your AM).
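A minimal sketch of how the driver could load such a property file from HDFS once the application starts (the path and key name here are hypothetical):

    import java.util.Properties
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    val in = fs.open(new Path("hdfs:///apps/myapp/app.properties"))  // hypothetical location
    val props = new Properties()
    try props.load(in) finally in.close()
    val inputPath = props.getProperty("input.file")                  // hypothetical key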
On Mon, Jun 29, 2015 at 10:51 PM, diplomatic Guru diplomaticg...@gmail.com
wrote:
I want to store the Spark application arguments such
Hi,
Any {fan-out - process in parallel - fan-in - aggregate} pattern of data
flow can conceptually be seen as Map-Reduce (MR, as it is done in Hadoop).
Apart from the bigger list of map, reduce, sort, filter, pipe, join,
combine, ... functions, which are many times more efficient and productive for
Thanks TD, this helps.
Looking forward to some fix where the framework handles batch failures via
callback methods. This would help avoid having to write try/catch in every
transformation / action.
Regards,
Amit
From: Tathagata Das t...@databricks.commailto:t...@databricks.com
Date:
Hi,
I'm trying to run Spark without Hadoop, where the data would be read from and
written to local disk.
For this I have a few questions -
1. Which download do I need to use? In the download options I don't see any
binary download which does not need Hadoop. Is the only way to do this to
download the
Currently, I am considering using Guava Suppliers for delayed
initialization in workers:
Supplier<T> supplier = (Serializable & Supplier<T>) () -> new T();
Supplier<T> singleton = Suppliers.memoize(supplier);
On 26 June 2015 at 13:17, Igor Berman igor.ber...@gmail.com wrote:
asked myself same
3. You need to use your own method, because you need to set up your job.
Read the checkpoint documentation.
4. Yes, if you want to checkpoint, you need to specify a URL to store the
checkpoint at (S3 or HDFS); see the sketch below. And yes, for the direct stream the checkpoint is
just offsets, not all the messages.
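A minimal sketch of that checkpoint setup (the directory, app name, and batch interval are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/my-streaming-app"       // hypothetical URL

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("checkpoint-demo")       // hypothetical app name
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... create the direct Kafka stream and wire up the job here ...
      ssc
    }

    // Recovers from the checkpoint if one exists, otherwise builds a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()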
On
Hi
Let me take a shot at your questions. (I am sure people like Cody and TD
will correct me if I am wrong.)
0. This is an exact copy from the similar question in a mail thread from Akhil D:
Since you set local[4] you will have 4 threads for your computation, and
since you are having 2 receivers, you are
Hi there,
I have some traces from my master and some workers where for some reason,
the ./work directory of an application can not be created on the workers.
There is also an issue with the master's temp directory creation.
master logs: http://pastebin.com/v3NCzm0u
worker's logs:
I want to store the Spark application arguments such as input file, output
file into a Java property files and pass that file into Spark Driver. I'm
using spark-submit for submitting the job but couldn't find a parameter to
pass the properties file. Have you got any suggestions?
No, Spark cannot do that, as it does not replicate partitions (so no retry
on a different worker). It seems your cluster is not provisioned with correct
permissions. I would suggest automating node provisioning.
On Mon, Jun 29, 2015 at 11:04 PM, maxdml maxdemou...@gmail.com wrote:
Hi there,
I
see also:
https://github.com/apache/spark/pull/6848
On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
  "67108864")
sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
  + "/*",
Hi
Have you tested the Cloudera project
https://github.com/cloudera/spark-timeseries ?
Let me know how you progress on that route, as I am also interested in
that topic.
Cheers
On 26 June 2015 at 14:07, Caio Cesar Trucolo truc...@gmail.com wrote:
Hi everyone!
I am working with
Hi Matt,
is there a reason you need to call coalesce on every loop iteration? Most
likely it forces Spark to do lots of unnecessary shuffles. Also - for a
really large number of inputs this approach can lead to problems due to too many
nested RDD.union calls. A safer approach is to call union on the SparkContext
once, over the whole sequence of RDDs, as in the sketch below.
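A hedged sketch of that pattern (the RDD sequence and partition count are hypothetical):

    // rdds: Seq[RDD[MyRecord]] built up in the loop (hypothetical)
    // Instead of nesting acc = acc.union(next) on every iteration,
    // union the whole sequence at once and coalesce a single time at the end:
    val combined = sc.union(rdds)
    val compacted = combined.coalesce(numPartitions)   // only if you really need fewer partitions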
I can't see an obvious problem. Could you post the full minimal code that
reproduces the problem? Also, which versions of Spark and Scala are you using?
--
View this message in context:
Hi there
I am running 30 apps in my Spark cluster, and some of the apps got
exceptions like the one below:
[root@slave3 0]# cat stderr
15/06/29 17:20:08 INFO executor.CoarseGrainedExecutorBackend: Registered signal
handlers for [TERM, HUP, INT]
15/06/29 17:20:09 WARN util.NativeCodeLoader: Unable
Hi, I'm using spark streaming to process data. I do a simple flatMap on each
record as follows
package bb;
import java.io.*;
import java.net.*;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import
Hi Akhil,
Thanks for your reply.
Here it is
Launch Command: /usr/lib/jvm/java-7-oracle/jre/bin/java -cp
/etc/spark/:/opt/spark/lib/spark-assembly-1.4.0-hadoop2.3.0.jar:/opt/spark-1.4.0-bin-hadoop2.3/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.4.0-bin
For prototyping purposes, I created a test program injecting dependencies using
Spring.
Nothing fancy. This is just a re-write of KafkaDirectWordCount. When I run
this, I get the following exception:
Exception in thread main org.apache.spark.SparkException: Task not
serializable
at
Thanks, Steve--I should have tested out this theory before spamming the list.
I haven't been able to get anything working after testing this theory out.
I'll hit up the Spark dev mailing list and try to garner enough interest to get
some JIRAs cut.
I really appreciate everyone's feedback,
Hi all,
What is the best way to remotely debug, with breakpoints, spark apps?
Thanks in advance,
Best regards!
Pietro
Sourav:
Please see https://spark.apache.org/docs/latest/spark-standalone.html
Cheers
On Mon, Jun 29, 2015 at 7:33 AM, ayan guha guha.a...@gmail.com wrote:
Hi
You really do not need a Hadoop installation. You can download a pre-built
version with any Hadoop and unzip it, and you are good to go.
1. Here you are basically creating 2 receivers and asking each of them to
consume 3 Kafka partitions each.
- In 1.2 we have high-level consumers, so how can we restrict the number of Kafka
partitions to consume from? Say I have 300 Kafka partitions in a Kafka topic
and, as above, I gave 2 receivers and 3
Hi
You really do not need a Hadoop installation. You can download a pre-built
version with any Hadoop and unzip it, and you are good to go. Yes, it may
complain while launching master and workers; safely ignore them. The only
problem is while writing to a directory. Of course you will not be able to
Also, how do you suggest catching exceptions while using connector APIs
like saveAsNewAPIHadoopFiles?
From: amit assudani aassud...@impetus.commailto:aassud...@impetus.com
Date: Monday, June 29, 2015 at 9:55 AM
To: Tathagata Das t...@databricks.commailto:t...@databricks.com
Cc: Cody
The underlying issue is a filesystem corruption on the workers.
In the case where I use HDFS, with a sufficient number of replicas, would
Spark try to launch the task on another node where a block replica is
present?
Thanks :-)
--
Henri Maxime Demoulin
2015-06-29 9:10 GMT-04:00 ayan guha
On 29 Jun 2015, at 14:18, Dave Ariens
dari...@blackberry.commailto:dari...@blackberry.com wrote:
I'd like to toss out another idea that doesn't involve a complete end-to-end
Kerberos implementation. Essentially, have the driver authenticate to
Kerberos, instantiate a Hadoop file system, and
I'm running a query from the BigDataBenchmark, query 1B to be precise.
When running this with Spark (1.3.1) + Mesos (0.21) in coarse-grained mode
with 5 Mesos slaves, through a spark shell, all is well.
However, rerunning the query a few times:
scala> sqlContext.sql("SELECT pageURL, pageRank FROM
No. He is collecting the results of the SQL query, not the whole dataset.
The REPL does retain references to prior results, so it's not really the
best tool to be using when you want no-longer-needed results to be
automatically garbage collected.
On Mon, Jun 29, 2015 at 9:13 AM, ayan guha
It's a scheduler question. Spark will retry the task on the same worker.
From Spark's standpoint, data is not replicated because Spark provides fault
tolerance by lineage, not by replication.
On 30 Jun 2015 01:50, Max Demoulin maxdemou...@gmail.com wrote:
The underlying issue is a filesystem
I see. Thank you for your help!
--
Henri Maxime Demoulin
2015-06-29 11:57 GMT-04:00 ayan guha guha.a...@gmail.com:
It's a scheduler question. Spark will retry the task on the same worker.
From Spark's standpoint, data is not replicated because Spark provides fault
tolerance by lineage, not by
Actually, Hadoop InputFormats can still be used to read and write from
file://, s3n://, and similar schemes. You just won't be able to
read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
To summarize: Sourav, you can use any of the prebuilt packages (i.e.
anything other
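For example, a minimal sketch of reading a plain local file with one of the prebuilt packages, no HDFS involved (the path is hypothetical):

    val lines = sc.textFile("file:///tmp/sample.txt")   // plain local filesystem path
    println(lines.count())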
Please read:
http://search-hadoop.com/m/q3RTtig4WhHTdMa1
FYI
On Mon, Jun 29, 2015 at 8:37 AM, Pietro Gentile
pietro.gentile89.develo...@gmail.com wrote:
Hi all,
What is the best way to remotely debug, with breakpoints, spark apps?
Thanks in advance,
Best regards!
Pietro
Hello,
I noticed that some of the spark-core APIs are not available with version
1.4.0 release of SparkR. For example textFile(), flatMap() etc. The code
seems to be there but is not exported in NAMESPACE. They were all available
as part of the AmpLab Extras previously. I wasn't able to find any
Cool.
On 29 Jun 2015 21:10, 郭谦 buptguoq...@gmail.com wrote:
Akhil Das,
You gave me a new idea to solve the problem.
Vova provided me a way to solve the problem just before:
Vova Shelgunov vvs...@gmail.com
Sample code for submitting a job from any other Java app, e.g. a servlet:
When you call collect, you are bringing the whole dataset back into driver
memory.
On 30 Jun 2015 01:43, hbogert hansbog...@gmail.com wrote:
I'm running a query from the BigDataBenchmark, query 1B to be precise.
When running this with Spark (1.3.1)+ mesos(0.21) in coarse grained mode
with 5 mesos
Hi All ,
What is the best possible way to load multiple data tables using Spark SQL?
Map<String, String> options = new HashMap<String, String>();
options.put("driver", MYSQLDR);
options.put("url", MYSQL_CN_URL);
options.put("dbtable", "(select * from courses)");
*Can I add multiple tables to the options map?
Would there be a way to force the 'old' data out? Because at this point
I'll have to restart the shell every couple of queries to get meaningful
timings which are comparable to spark-submit.
On Jun 29, 2015 6:20 PM, Mark Hamstra m...@clearstorydata.com wrote:
No. He is collecting the results
Hi All
here goes my first question:
Here is my use case:
I have 1 TB of data I want to process on EC2 using Spark.
I have uploaded the data to an EBS volume.
The instructions in the Amazon EC2 setup guide explain:
*If your application needs to access large datasets, the fastest way to do
that is to load them from
Hi Haviv,
have you tried sc.broadcast(model)? The broadcast method is a member of the
SparkContext class.
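A minimal sketch of that suggestion, assuming the model is a serializable k-means model and data is an RDD of vectors (names are hypothetical):

    val bcModel = sc.broadcast(model)
    val clusterIds = data.map(v => bcModel.value.predict(v))   // read the model via .value inside the closure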
Thanks
Himanshu
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/kmeans-broadcast-tp23511p23526.html
Sent from the Apache Spark User List mailing list
Please check your ACL properties.
On Monday, June 29, 2015 11:29 AM, didi did...@gmail.com wrote:
Hi
*Can't read text file from S3 to create RDD*
after setting the configuration
val hadoopConf = sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3.impl",
I'm having trouble using "select pow(col) from table". It seems the function
is not registered in Spark SQL. Is this on purpose or an oversight? I'm
using pyspark.
Hi
*Can't read text file from S3 to create RDD*
after setting the configuration
val hadoopConf = sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3.impl",
  "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
To my knowledge this is not supported.
On Mon, Jun 29, 2015 at 10:47 AM, Ashish Soni asoni.le...@gmail.com wrote:
Hi All ,
What is the best possible way to load multiple data tables using Spark SQL?
Map<String, String> options = new HashMap<String, String>();
options.put("driver", MYSQLDR);
options.put("url",
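A hedged sketch of the usual workaround: load each table as its own DataFrame instead of packing several tables into one options map (MYSQLDR and MYSQL_CN_URL are the constants from the question above; the second table is hypothetical):

    def loadTable(query: String) =
      sqlContext.read.format("jdbc")
        .option("driver", MYSQLDR)
        .option("url", MYSQL_CN_URL)
        .option("dbtable", s"($query) t")   // the JDBC source expects a table name or an aliased subquery
        .load()

    val courses  = loadTable("select * from courses")
    val students = loadTable("select * from students")   // hypothetical second table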
The RDD API is pretty complex and we are not yet sure we want to export all
those methods in the SparkR API. We are working towards exposing a more
limited API in upcoming versions. You can find some more details in the
recent Spark Summit talk at
In pyspark, when I convert from RDDs to DataFrames, it looks like the RDD is
being materialized/collected/repartitioned before it's converted to a
DataFrame.
Just wondering if there are any guidelines for doing this conversion and
whether it's best to do it early to get the performance benefits of
Hi Bob,
I tested your scenario with Spark 1.3 and I assumed you did not miss the
second parameter of pow(x, y).
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.jsonFile("/vagrant/people.json")
# Displays the content of the DataFrame to stdout
df.show()
# These are all
I have a join + leftOuterJoin + reduceByKey.
the join operation worked but the left outer join failed.
There are millions of log lines, and when I did this:
~ dvasthimal$ cat ~/Desktop/errors | grep ERROR | grep -v
client.TransportResponseHandler | grep -v shuffle.RetryingBlockFetcher
| grep -v
Hi All,
While using checkpoints (using HDFS), if connectivity to the Hadoop cluster is
lost for a while and gets restored after some time, what happens to the running
streaming job?
Is it always assumed that the connection to the checkpoint FS (in this case HDFS) would
ALWAYS be HA and would never fail for
1.4 and I did set the second parameter. The DSL works fine but trying out
with SQL doesn't.
On Mon, Jun 29, 2015, 4:32 PM Salih Oztop soz...@yahoo.com wrote:
Hi Bob,
I tested your scenario with Spark 1.3 and I assumed you did not miss the
second parameter of pow(x,y)
from pyspark.sql import
Interesting. Looking at the definitions, sql.functions.pow is defined only
for (col,col). Just as an experiment, create a column with value 2 and see
if that works.
Cheers
k/
On Mon, Jun 29, 2015 at 1:34 PM, Bob Corsaro rcors...@gmail.com wrote:
1.4 and I did set the second parameter. The DSL
I recommend writing using dstream.foreachRDD, and then
rdd.saveAsNewAPIHadoopFile inside try catch. See the implementation of
dstream.saveAsNewAPIHadoopFiles
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L716
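A hedged sketch of that pattern, assuming a DStream of (Text, Text) pairs (the output path, key/value types, and stream name are hypothetical):

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

    pairDStream.foreachRDD { (rdd, time) =>
      try {
        rdd.saveAsNewAPIHadoopFile(
          s"hdfs:///output/batch-${time.milliseconds}",
          classOf[Text], classOf[Text],
          classOf[TextOutputFormat[Text, Text]])
      } catch {
        case e: Exception =>
          // handle/log the failure of this batch instead of letting it propagate
          println(s"Failed to save batch at $time: ${e.getMessage}")
      }
    }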
On
Hi,
not sure what the context is but I think you can do something similar with
mapPartitions:
rdd.mapPartitions { iterator =>
  iterator.grouped(5).map { tupleGroup => emitOneRddForGroup(tupleGroup) }
}
The edge case is when the final grouping doesn't have exactly 5 items, if
that matters.
On
My error was related to Scala version. Upon further reading, I realized
that it takes some effort to get Spark working with Scala 2.11.
I've reverted to using 2.10 and moved past that error. Now I hit the issue
you mentioned. Waiting for 1.4.1.
Srikanth
On Fri, Jun 26, 2015 at 9:10 AM, Roberto
Hi, tested with Spark 1.4.
We need to import pow, otherwise it uses the Python version of pow, I guess.
from pyspark.sql.functions import pow
df.select(pow(df.age, df.age)).show()
15/06/29 22:36:05 INFO Ta
+---------------+
|POWER(age, age)|
+---------------+
|           null|
My job has multiple stages; each time a stage fails I have to restart the
entire app.
I understand Spark restarts failed tasks.
However, is there a way to restart a Spark app from the failed stage?
--
Deepak
I want to apply some logic on the basis of a fixed count of the number of
tuples in each RDD. *Suppose we emit one RDD for every 5 tuples of the previous
RDD.*
--
Thanks Regards,
Anshu Shukla
Hi Jey,
Not much luck.
If I use the class com.databricks:spark-csv_2.11:1.1.0 or
com.databricks.spark.csv_2.11.1.1.0 I get a class not found error.
With com.databricks.spark.csv I don't get the class not found error,
but I still get the previous error even after using file:/// in the URI.
I get the same error even when I define covOperator not to use a matrix at
all:
def covOperator(v : BDV[Double]) :BDV[Double] = { v }
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/breeze-linalg-DenseMatrix-not-found-tp23537p23538.html
Sent from the
Hi Sourav,
The error seems to be caused by the fact that your URL starts with
file:// instead of file:///.
Also, I believe the current version of the package for Spark 1.4 with Scala
2.11 should be com.databricks:spark-csv_2.11:1.1.0.
-Jey
On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder
I'm trying to compute the eigendecomposition of a matrix in a portion of my
code, using mllib.linalg.EigenValueDecomposition
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
)
as follows:
val tol = 1e-10
val maxIter
When using yarn-cluster, it works fine.
On Mon, Jun 29, 2015 at 12:07 PM, ram kumar ramkumarro...@gmail.com wrote:
SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/*
in spark-env.sh
I think I am facing the same issue
https://issues.apache.org/jira/browse/SPARK-6203
On Mon,
Here's a bunch of configuration for that
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
Thanks
Best Regards
On Fri, Jun 26, 2015 at 10:37 PM, igor.berman igor.ber...@gmail.com wrote:
Hi,
wanted to get some advice regarding tunning spark application
I see for some of
Which version of Spark are you using? You can try changing the heap size
manually with *export _JAVA_OPTIONS=-Xmx5g*
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:52 PM, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I just encountered the same problem, when I run a PageRank program which
has lots
You can create a SparkContext in your program and run it as a standalone
application without using spark-submit.
Here's something that will get you started:
// Create the SparkContext
val sconf = new SparkConf()
  .setMaster("spark://spark-ak-master:7077")
  .setAppName("Test")
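A hedged continuation of the sketch above would look roughly like:

    val sc = new SparkContext(sconf)
    // ... build RDDs and run actions here ...
    sc.stop()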
Hi,
I am working on a legacy project using Spark Java code.
I have a function which takes a sqlContext as an argument; however, I need a
JavaSparkContext in that function.
It seems that sqlContext.sparkContext() returns a Scala SparkContext.
I did not find any API for casting a Scala SparkContext
Hi,
As per my use case I need to submit multiple queries to Spark SQL in
parallel but due to HiveContext being thread safe the jobs are getting
submitted sequentially.
I could see many threads are waiting for HiveContext.
on-spray-can-akka.actor.default-dispatcher-26 - Thread t@149
SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/*
in spark-env.sh
I think I am facing the same issue
https://issues.apache.org/jira/browse/SPARK-6203
On Mon, Jun 29, 2015 at 11:38 AM, ram kumar ramkumarro...@gmail.com wrote:
I am using Spark 1.2.0.2.2.0.0-82 (git revision
Hello,
It is my understanding that shuffles are written to disk and that they act
as checkpoints.
I wonder if this is true only within a job, or across jobs. Please note
that I use the words job and stage carefully here.
1. can a shuffle created during JobN be used to skip many stages from
Ah, for #3, maybe this is what *rdd.checkpoint* does!
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
Thomas
On Mon, Jun 29, 2015 at 7:12 PM, Thomas Gerber thomas.ger...@radius.com
wrote:
Hello,
It is my understanding that shuffles are written to disk and
Thanks Silvio.
On Mon, Jun 29, 2015 at 7:41 PM, Silvio Fiorito
silvio.fior...@granturing.com wrote:
Regarding 1 and 2, yes shuffle output is stored on the worker local
disks and will be reused across jobs as long as they’re available. You can
identify when they’re used by seeing skipped
Yes, the observation is correct. That connectivity is assumed to be HA.
On Mon, Jun 29, 2015 at 2:34 PM, Amit Assudani aassud...@impetus.com
wrote:
Hi All,
While using Checkpoints ( using HDFS ), if connectivity to hadoop
cluster is lost for a while and gets restored in some time, what
Hi Jey,
This solves the class not found problem. Thanks.
But the input format issue is still not resolved. It looks like it is still
trying to create a HadoopRDD; I don't know why. The error message goes like -
java.lang.RuntimeException: Error in configuring object
at
The format is still com.databricks.spark.csv, but the parameter passed to
spark-shell is --packages com.databricks:spark-csv_2.11:1.1.0.
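Putting those two pieces together, a minimal sketch (the local path and the header option are assumptions):

    // launched with: spark-shell --packages com.databricks:spark-csv_2.11:1.1.0
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")                    // assuming the file has a header row
      .load("file:///home/sourav/data/input.csv")  // hypothetical path; note the three slashes
    df.show()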
On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder
sourav.mazumde...@gmail.com wrote:
HI Jey,
Not much of luck.
If I use the class
It seems the root cause of the delay was the sheer size of the DAG for
those jobs, which are towards the end of a long series of jobs.
To reduce it, you can probably try to checkpoint (rdd.checkpoint) some
previous RDDs (see the sketch below). That will:
1. save the RDD on disk
2. remove all references to the parents
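A minimal sketch of that (the checkpoint directory and RDD name are hypothetical):

    sc.setCheckpointDir("hdfs:///checkpoints/my-app")
    someExpensiveRdd.checkpoint()   // mark it; the checkpoint is written by the next job that computes it
    someExpensiveRdd.count()        // force materialization so the lineage is truncated from here on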
Regarding 1 and 2, yes shuffle output is stored on the worker local disks and
will be reused across jobs as long as they’re available. You can identify when
they’re used by seeing skipped stages in the job UI. They are periodically
cleaned up based on available space of the configured
I am running a Spark Streaming example from the Learning Spark book with one
change. The change I made was for streaming a file from HDFS.
val lines = ssc.textFileStream("hdfs:/user/hadoop/spark/streaming/input")
I ran the application a number of times, and every time I dropped a new file in
the input
Order the tasks by status and see if there are any with status failed.
On Mon, 29 Jun 2015 at 2:26 pm ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Attached pic shows the error that is displayed in Spark UI.
On Mon, Jun 29, 2015 at 2:22 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
wrote:
I have a join
All InputFormats will use HadoopRDD or NewHadoopRDD. Do you use file:///
instead of file://?
On Mon, Jun 29, 2015 at 8:40 PM, Sourav Mazumder
sourav.mazumde...@gmail.com wrote:
Hi Jey,
This solves the class not found problem. Thanks.
But the input format issue is still not resolved. Looks
Hi Folks,
I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so
far is the DataFrame reader/writer. Previously:
val myData = hiveContext.load("s3n://someBucket/somePath/", "parquet")
Now:
val myData = hiveContext.read.parquet("s3n://someBucket/somePath")
Using the original
There are many with failed. Attached pic shows exception traces.
On Mon, Jun 29, 2015 at 9:07 PM, Matthew Jones mle...@gmail.com wrote:
Order the tasks by status and see if there are any with status failed.
On Mon, 29 Jun 2015 at 2:26 pm ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Attached pic
I can't see any failed tasks in the attached pic.
On Mon, Jun 29, 2015 at 9:31 PM ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
There are many with failed. Attached pic shows exception traces.
On Mon, Jun 29, 2015 at 9:07 PM, Matthew Jones mle...@gmail.com wrote:
Order the tasks by status and
I am facing the following error message while performing sbt/sbt assembly:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Please give me a solution for it.
--
Hey Spark Users,
I'm writing a demo with Spark and HBase. What I've done is package a
**fat jar**: place dependencies in `build.sbt`, and use `sbt assembly` to
package **all dependencies** into one big jar. The rest of the work is copying the
fat jar to the Spark master node and then launching it by
Hi Jey,
Thanks for your inputs.
Probably I'm getting the error because I'm trying to read a CSV file from the local
filesystem using the com.databricks.spark.csv package. Probably this package has a
hard-coded dependency on Hadoop, as it is trying to read the input format via
HadoopRDD.
Can you please confirm?
Here is