After doing the group, you can use mkString on the data frame. Following is an
example where all columns are concatenated with a space as the separator.
scala> call_cdf.map(row => row.mkString(" ")).show(false)
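For PySpark users, the same row-to-string concatenation can be sketched in plain Python (no Spark needed to see the idea; the `row` tuple below is just a hypothetical set of column values standing in for a Spark Row inside a map):

```python
# Hypothetical row: a tuple of column values, as a Spark Row behaves in a map().
row = ("alice", 42, "2017-06-30")

# Equivalent of Scala's row.mkString(" "): join all columns with a space.
line = " ".join(str(v) for v in row)
print(line)  # -> alice 42 2017-06-30
```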
Gourav,
Riccardo’s answer is spot on.
What is happening is that one Spark node writes to its own directory and tells
a slave to read the data from there; when the slave goes to read it, the part
is not found.
Check the folder
Hi,
1) Is anyone aware of any workbench-style tool to run ML jobs in Spark?
Specifically, the tool could be something like a web application that is
configured to connect to a Spark cluster.
The user is able to select input training sets, probably from HDFS, train, and
then run predictions,
Wouldn’t this work if you load the files into HDFS and let the partitions be
equal to the amount of parallelism you want?
From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
Sent: Friday, June 30, 2017 8:55 AM
To: ayan guha
Cc: user
Subject: Re: PySpark working with Generators
Hey Ayan,
This
You could copy the Spark folder to the home directory of each user and set a
different Spark home for each one. Not sure what Derby is used for, but you
could try using MySQL instead (if it’s for the Hive metastore).
From: Robert Kudyba [mailto:rkud...@fordham.edu]
Sent: Wednesday, June 28, 2017 8:25
Ayan,
The location of the logging class was moved from Spark 1.6 to Spark 2.0.
It looks like you are trying to run 1.6 code on 2.0. I have ported some code like
this before; if you have access to the code, you can recompile it by changing the
reference to the Logging class and directly use the slf4
This has to do with how you are creating the timestamp object from the
result set (I guess).
If you can provide more code it will help, but you could surround the parsing
code with a try/catch and then just ignore the exception.
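A minimal sketch of that approach in plain Python (the timestamp format and sample values here are assumptions, not from the thread): wrap the parse in a try/except and skip rows whose timestamp is malformed.

```python
from datetime import datetime

def parse_ts(raw):
    """Parse a timestamp string, returning None instead of raising on bad input."""
    try:
        # Assumed format; adjust to whatever the result set actually contains.
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        # Malformed or missing timestamp: ignore the exception, as suggested above.
        return None

rows = ["2017-06-30 08:55:00", "not-a-timestamp", None]
parsed = [parse_ts(r) for r in rows]
print(parsed)  # first value parses; the other two come back as None
```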
From: Aviral Agarwal [mailto:aviral12...@gmail.com]
Sent:
Spark can help you create one large file if needed, but HDFS itself provides
abstraction over such things, so it's a trivial problem if anything.
If you have Spark installed, then you can use spark-shell to try a few samples
and build from there. If you can collect all the files in a
I have already seen one example where data is generated using Spark; no reason
to think it's a bad idea as far as I know.
You can check the code here. I'm not very sure, but I think there is something
there which generates data for the TPCDS benchmark, and you can provide how much
data you want in
Is there a live URL on the internet where I can see the UI? I could help by
checking the JS code in Firebug.
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Friday, June 16, 2017 1:26 PM
To: user @spark
Subject: What is the charting library used by Databricks UI?
Hi All,
I am wondering what
Your data node(s) are going down for some reason; check the datanode logs and
fix the underlying issue behind why the datanode is going down.
There should be no need to delete any data; just starting the data nodes should
do the trick for you.
From: kant kodali [mailto:kanth...@gmail.com]
Sent:
It’s because the class in which you have defined the UDF is not serializable.
Declare the UDF in a class and make the class serializable.
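The thread concerns a Scala UDF, but the failure mode can be illustrated standalone with Python's `pickle` module (which is what PySpark uses to ship closures to executors): a module-level function serializes fine because it can be looked up by name, while a lambda, like a UDF captured in a non-serializable context, cannot. The names below are illustrative only, not from the thread.

```python
import pickle

def add_one(x):
    """A module-level function: picklable, so it can be shipped to workers."""
    return x + 1

ok = pickle.dumps(add_one)  # works: serialized by importable name

try:
    pickle.dumps(lambda x: x + 1)  # fails: no importable name to serialize
    lambda_serializable = True
except Exception:
    lambda_serializable = False
print(lambda_serializable)  # -> False
```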
From: shyla deshpande [mailto:deshpandesh...@gmail.com]
Sent: Thursday, June 01, 2017 10:08 AM
To: user
Subject: Spark sql with Zeppelin, Task not