RE: Equivalent of Redshift ListAgg function in Spark (PySpark)

2017-10-09 Thread Mahesh Sawaiker
After doing the group, you can use mkString on the data frame. Following is an example where all columns are concatenated with a space as the separator. scala> call_cdf.map(row => row.mkString(" ")).show(false)
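For a LISTAGG-style result (one concatenated string per group), a minimal sketch along the same lines, assuming a hypothetical DataFrame orders with columns customer_id and product:

  import org.apache.spark.sql.functions.{collect_list, concat_ws}

  // one row per customer_id, with the product values joined into one string
  val agg = orders
    .groupBy("customer_id")
    .agg(concat_ws(",", collect_list("product")).as("products"))
  agg.show(false)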

RE: SPARK Issue in Standalone cluster

2017-07-31 Thread Mahesh Sawaiker
Gourav, Riccardo’s answer is spot on. What is happening is that one Spark node is writing to its own local directory and telling a slave to read the data from there; when the slave goes to read it, the part is not found. Check the folder
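A minimal sketch of the fix, writing to storage that every node can reach rather than a worker-local directory (the HDFS path is hypothetical):

  // any shared filesystem (HDFS, NFS, S3) works; a file:// path on a
  // single worker does not, since the other nodes cannot see the parts
  df.write.parquet("hdfs://namenode:8020/user/spark/output")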

UI for spark machine learning.

2017-07-09 Thread Mahesh Sawaiker
Hi, 1) Is anyone aware of any workbench-like tool to run ML jobs in Spark? Specifically, the tool could be something like a web application that is configured to connect to a Spark cluster. The user is able to select input training sets, probably from HDFS, train, and then run predictions,

RE: PySpark working with Generators

2017-06-29 Thread Mahesh Sawaiker
Wouldn’t this work if you load the files into HDFS and set the number of partitions equal to the amount of parallelism you want?
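A minimal sketch of that idea (the path and partition count are hypothetical):

  // ask for at least 64 input partitions, i.e. 64 units of parallelism
  val lines = spark.sparkContext.textFile("hdfs:///data/input/*.txt", 64)
  println(lines.getNumPartitions)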

RE: using Apache Spark standalone on a server for a class/multiple users, db.lck does not get removed

2017-06-29 Thread Mahesh Sawaiker
You could copy the Spark folder to the home directory of each user and set a different SPARK_HOME for each one. Not sure what Derby is used for here, but you could try using MySQL instead (if it's for the Hive metastore).
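If it is the Hive metastore, a minimal hive-site.xml sketch pointing it at MySQL instead of the embedded Derby (host, database name, and credentials are hypothetical):

  <!-- goes in Spark's conf directory, replacing the Derby defaults -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbhost:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>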

RE: HDP 2.5 - Python - Spark-On-Hbase

2017-06-25 Thread Mahesh Sawaiker
Ayan, The Logging class was moved between Spark 1.6 and Spark 2.0. It looks like you are trying to run 1.6 code on 2.0. I have ported some code like this before; if you have access to the source, you can recompile it by changing the reference to the Logging class and directly using the slf4j
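A minimal sketch of that port, replacing the removed org.apache.spark.Logging trait with a direct slf4j logger (the class names here are made up):

  import org.slf4j.{Logger, LoggerFactory}

  // drop-in replacement for the old org.apache.spark.Logging trait
  trait Logging {
    @transient lazy val log: Logger = LoggerFactory.getLogger(getClass)
  }

  class MyJob extends Logging {
    def run(): Unit = log.info("starting")
  }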

RE: JDBC RDD Timestamp Parsing Issue

2017-06-21 Thread Mahesh Sawaiker
This has to do with how you are creating the timestamp object from the ResultSet (I guess). If you can provide more code it will help, but you could surround the parsing code with a try/catch and then just ignore the exception.
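A minimal sketch of the try/catch idea around the ResultSet read (the column name is hypothetical):

  import java.sql.{ResultSet, Timestamp}

  // returns None instead of failing when a value cannot be parsed
  def safeTimestamp(rs: ResultSet, column: String): Option[Timestamp] =
    try Option(rs.getTimestamp(column))
    catch { case _: Exception => None }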

RE: Using Spark as a simulator

2017-06-21 Thread Mahesh Sawaiker
Spark can help you create one large file if needed, but HDFS itself provides an abstraction over such things, so it's a trivial problem if anything. If you have Spark installed, then you can use spark-shell to try a few samples and build from there. If you can collect all the files in a
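A minimal sketch of collecting many small files into one (the paths are hypothetical):

  // read everything, then force a single output file
  val all = spark.read.text("hdfs:///sim/parts/*")
  all.coalesce(1).write.text("hdfs:///sim/merged")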

RE: Using Spark as a simulator

2017-06-20 Thread Mahesh Sawaiker
I have already seen an example where data is generated using Spark; no reason to think it's a bad idea as far as I know. You can check the code here; I'm not very sure, but I think there is something there which generates data for the TPC-DS benchmark, and you can specify how much data you want in
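Not the TPC-DS generator itself, but a minimal sketch of generating synthetic data with Spark (the schema and path are made up):

  import org.apache.spark.sql.functions.{col, rand}

  // one row per id; scale the row count to get the volume you want
  val df = spark.range(0, 1000000L)
    .withColumn("metric", rand(42))
    .withColumn("bucket", col("id") % 10)
  df.write.parquet("hdfs:///bench/generated")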

RE: What is the charting library used by Databricks UI?

2017-06-16 Thread Mahesh Sawaiker
Is there a live URL on the internet where I can see the UI? I could help by checking the JS code in Firebug.

RE: The following Error seems to happen once in every ten minutes (Spark Structured Streaming)?

2017-05-31 Thread Mahesh Sawaiker
Your data node(s) are going down for some reason; check the datanode logs and fix the underlying issue causing them to go down. There should be no need to delete any data; just restarting the data nodes should do the trick for you.

RE: Spark sql with Zeppelin, Task not serializable error when I try to cache the spark sql table

2017-05-31 Thread Mahesh Sawaiker
It’s because the class in which you have defined the UDF is not serializable. Declare the UDF in a class and make that class serializable.
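A minimal sketch of that arrangement (the object, UDF, and column names are made up):

  import org.apache.spark.sql.functions.udf

  // an object (or a class extending Serializable) that owns the UDF,
  // so the closure can be shipped to the executors
  object Udfs extends Serializable {
    val upper = udf((s: String) => if (s == null) null else s.toUpperCase)
  }

  // usage: df.withColumn("name_uc", Udfs.upper(col("name")))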