Spark Metrics: custom source/sink configurations not getting recognized

2016-09-05 Thread map reduced
Hi, I've written a custom metrics source/sink for my Spark streaming app and I am trying to initialize it from metrics.properties - but that doesn't work from executors. I don't have control over the machines in the Spark cluster, so I can't copy the properties file into $SPARK_HOME/conf/ on the cluster. I

Re: Any estimate for a Spark 2.0.1 release date?

2016-09-05 Thread Takeshi Yamamuro
Hi, Have you seen this? // maropu On Tue, Sep 6, 2016 at 7:42 AM, mhornbech wrote: > I can't find any JIRA issues with the tag that are unresolved. Apologies if > this is a rookie mistake and the information is available elsewhere. > > Morten > > > > -- > View this

Re: Spark 2.0.0 Thrift Server problem with Hive metastore

2016-09-05 Thread Jeff Zhang
How do you upgrade to spark 2.0 ? On Mon, Sep 5, 2016 at 11:25 PM, Campagnola, Francesco < francesco.campagn...@anritsu.com> wrote: > Hi, > > > > in an already working Spark - Hive environment with Spark 1.6 and Hive > 1.2.1, with Hive metastore configured on Postgres DB, I have upgraded Spark >

Re: Scala Vs Python

2016-09-05 Thread Luciano Resende
On Thu, Sep 1, 2016 at 3:15 PM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > >

Any estimate for a Spark 2.0.1 release date?

2016-09-05 Thread mhornbech
I can't find any JIRA issues with the tag that are unresolved. Apologies if this is a rookie mistake and the information is available elsewhere. Morten -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Any-estimate-for-a-Spark-2-0-1-release-date-tp27659.html

Re: Splitting columns from a text file

2016-09-05 Thread Gourav Sengupta
just use SPARK CSV; all other ways of splitting and working are just reinventing the wheel and a monumental waste of time. Regards, Gourav On Mon, Sep 5, 2016 at 1:48 PM, Ashok Kumar wrote: > Hi, > > I have a text file as below that I read in > >

Re: Scala Vs Python

2016-09-05 Thread Gourav Sengupta
The pertinent question is between "functional programming" and procedural or OOPs. I think when you are dealing with data solutions, functional programming is a more natural way to think and work. Regards, Gourav On Sun, Sep 4, 2016 at 11:17 AM, AssafMendelson wrote:

Spark ML 2.1.0 new features

2016-09-05 Thread janardhan shetty
Is there any documentation or links on the new features we can expect in the Spark ML 2.1.0 release?

Cassandra timestamp to spark Date field

2016-09-05 Thread Selvam Raman
Hi All, As per the DataStax reference, the Cassandra timestamp type maps to the Spark types Long, java.util.Date, java.sql.Date and org.joda.time.DateTime. Please help me with your input. I have a Cassandra table with 30 fields; 3 of them are timestamps. I read the Cassandra table using sc.cassandraTable
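For the timestamp question, a minimal local sketch (assuming, per the mapping quoted above, that the connector hands the column over as java.util.Date; the helper name is hypothetical):

```scala
import java.util.{Date => UtilDate}
import java.sql.{Date => SqlDate}

// Sketch: a Cassandra `timestamp` column arriving as java.util.Date can be
// converted to java.sql.Date (the type Spark SQL's DateType expects)
// via its epoch-millisecond value.
def toSqlDate(d: UtilDate): SqlDate = new SqlDate(d.getTime)

val fromCassandra = new UtilDate(1473033600000L) // 2016-09-05T00:00:00Z
val forSparkSql   = toSqlDate(fromCassandra)
```

The same getTime round-trip works in the other direction if the field has to go back out as a java.util.Date.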

Re: Spark SQL Tables on top of HBase Tables

2016-09-05 Thread Yan Zhou
There is a HSpark project, https://github.com/yzhou2001/HSpark, providing native and fast access to HBase. Currently it only supports Spark 1.4, but any suggestions and contributions are more than welcome. Try it out to find its speedups! On Sat, Sep 3, 2016 at 12:57 PM, Mich Talebzadeh

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
sc.textFile("filename").map(_.split(",")).filter(arr => arr.length == 3 && arr(2).toDouble > 50).collect this will give you an Array[Array[String]]; do with it as you wish. And please read about RDDs. On 5 Sep 2016 8:51 pm, "Ashok Kumar" wrote: > Thanks everyone. > >
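The one-liner above can be checked on a plain Scala collection before running it against an RDD (the sample rows below are hypothetical, modelled on the thread's data):

```scala
// Sketch: the same split-and-filter logic as the RDD one-liner,
// run on a local Seq instead of sc.textFile(...).
val lines = Seq(
  "74,20160905-133143,98.11218069128827594148",
  "12,20160905-133144,42.5"
)
val kept: Seq[Array[String]] = lines
  .map(_.split(","))
  .filter(arr => arr.length == 3 && arr(2).toDouble > 50)
// only the first row survives, since 98.11... > 50 but 42.5 is not
```

Because map and filter have the same semantics on local collections and RDDs, this is a cheap way to debug the predicate before touching the cluster.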

Re: Problem in accessing swebhdfs

2016-09-05 Thread Steve Loughran
Looks like it got a 404 back with a text/plain response, tried to parse that as JSON and made a mess of things. Updated the relevant (still open) JIRA with your stack trace. https://issues.apache.org/jira/browse/HDFS-6220 At a guess, the file it is looking for isn't there. Causes -the root

Re: S3A + EMR failure when writing Parquet?

2016-09-05 Thread Steve Loughran
On 4 Sep 2016, at 18:05, Everett Anderson wrote: My impression from reading your various other replies on S3A is that it's also best to use mapreduce.fileoutputcommitter.algorithm.version=2 (which might someday be the

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
I suppose we are all looking at some sort of use case for recommendation engine, whether that is commodities or some Financial Instrument. They all depend on some form of criteria. The commodity or instrument does not matter. It could be a new "Super Mario Wii" release or some shares that are

Spark 2.0.0 Thrift Server problem with Hive metastore

2016-09-05 Thread Campagnola, Francesco
Hi, in an already working Spark - Hive environment with Spark 1.6 and Hive 1.2.1, with Hive metastore configured on Postgres DB, I have upgraded Spark to the 2.0.0. I have started the thrift server on YARN, then tried to execute from the beeline cli or a jdbc client the following command:

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Thanks everyone. I am not as skilled as you gentlemen. This is what I did: 1) Read the text file val textFile = sc.textFile("/tmp/myfile.txt") 2) That produces an RDD of String. 3) Create a DF after splitting the file into an Array  val df = textFile.map(line =>

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
By the way, I would love to work on your project, looks promising! Alonso Isidoro Roman about.me/alonso.isidoro.roman 2016-09-05 16:57 GMT+02:00 Alonso Isidoro

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
Hi Mitch, 1. What do you mean by "my own rating" here? You know the products. So what is Amazon going to provide by way of Kafka? The idea was to embed the functionality of a Kafka producer within a REST service so that I can invoke this logic with my own rating. I did not create such

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
Thank you Alonso, I looked at your project. Interesting. As I see it, this is what you are suggesting: 1. A Kafka producer is going to periodically query Amazon to find products based on my own ratings, and introduce them into some Kafka topic. 2. A Spark

Re: Splitting columns from a text file

2016-09-05 Thread ayan guha
Then you need to refer to the third term in the array, convert it to your desired data type and then use filter. On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar wrote: > Hi, > I want to filter them for values. > > This is what is in array > >

Re: Splitting columns from a text file

2016-09-05 Thread Fridtjof Sander
Ask yourself how to access the third element of an array in Scala. On 05.09.2016 at 16:14, Ashok Kumar wrote: Hi, I want to filter them for values. This is what is in the array 74,20160905-133143,98.11218069128827594148 I want to filter anything > 50.0 in the third column Thanks On

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Hi, I want to filter them for values. This is what is in the array: 74,20160905-133143,98.11218069128827594148 I want to filter anything > 50.0 in the third column. Thanks On Monday, 5 September 2016, 15:07, ayan guha wrote: Hi x.split returns an array. So, after first

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
Hi Mitch, I wrote a tiny project a few months ago with this issue in mind. The idea is to apply the ALS algorithm in order to get some valid recommendations from other users. The url of the project Alonso Isidoro Roman

Re: Splitting columns from a text file

2016-09-05 Thread ayan guha
Hi x.split returns an array. So, after the first map, you will get an RDD of arrays. What is your expected outcome of the 2nd map? On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar wrote: > Thank you sir. > > This is what I get > > scala> textFile.map(x=> x.split(",")) > res52:

Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
Hi, Has anyone done any work on real-time recommendation engines with Spark and Scala? I have seen a few PPTs with Python but wanted to see if these have been done with Scala. I trust this question makes sense. Thanks p.s. My prime interest would be in Financial markets. Dr Mich Talebzadeh

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
Please have a look at the documentation for information on how to work with RDD. Start with this http://spark.apache.org/docs/latest/quick-start.html On 5 Sep 2016 7:00 pm, "Ashok Kumar" wrote: > Thank you sir. > > This is what I get > > scala> textFile.map(x=>

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Thank you sir. This is what I get scala> textFile.map(x=> x.split(",")) res52: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[27] at map at :27 How can I work on individual columns? I understand they are strings scala> textFile.map(x=> x.split(",")).map(x => (x.getString(0))     |
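On the question of working with individual columns: an Array[String] has no getString method (that is a Row method); array elements are indexed with arr(i). A local sketch, using the sample row from this thread:

```scala
// Sketch: index an Array[String] with arr(i), not getString(i),
// and convert the third field with toDouble before using it numerically.
val row = "74,20160905-133143,98.11218069128827594148".split(",")

val id    = row(0)            // first column, as a String
val stamp = row(1)            // second column, as a String
val value = row(2).toDouble   // third column, parsed to Double
```

Inside an RDD the same indexing applies element-wise, e.g. textFile.map(_.split(",")).map(arr => (arr(0), arr(1), arr(2).toDouble)).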

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
Basic error: you get back an RDD on transformations like map. sc.textFile("filename").map(x => x.split(",")) On 5 Sep 2016 6:19 pm, "Ashok Kumar" wrote: > Hi, > > I have a text file as below that I read in > > 74,20160905-133143,98.11218069128827594148 >

Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Hi, I have a text file as below that I read in

SPARK ML- Feature Selection Techniques

2016-09-05 Thread Bahubali Jain
Hi, Do we have any feature selection technique implementations (wrapper methods, embedded methods) available in Spark ML? Thanks, Baahu -- Twitter: http://twitter.com/Baahu

Re: Why there is no top method in dataset api

2016-09-05 Thread Sean Owen
No, I'm not advising you to use .rdd, just saying it is possible. Although I'd only use RDDs if you had a good reason to, given Datasets now; they are not gone or even deprecated. You do not need to order the whole data set to get the top element. That isn't what top does though. You might
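The point about not ordering the whole data set can be sketched with a bounded min-heap, which is the general idea behind a top-k pass (a simplified local illustration, not Spark's actual implementation of RDD.top):

```scala
import scala.collection.mutable

// Sketch: keep a min-heap capped at k elements while streaming the data once,
// so the full collection is never sorted. Each partition could do this
// independently, with the per-partition results merged at the end.
def topK(xs: Iterator[Int], k: Int): List[Int] = {
  val heap = mutable.PriorityQueue.empty[Int](Ordering[Int].reverse) // smallest at head
  xs.foreach { x =>
    heap.enqueue(x)
    if (heap.size > k) heap.dequeue() // evict the current smallest
  }
  heap.toList.sortBy(x => -x) // return largest first
}

val best3 = topK(Iterator(5, 1, 9, 3, 7), 3)
```

This is O(n log k) over a single pass, versus O(n log n) for sorting everything, which is why top scales to data that a full orderBy would not.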

Re: Why there is no top method in dataset api

2016-09-05 Thread Jakub Dubovsky
Thanks Sean, I was under the impression that the Spark creators are trying to persuade the user community not to use the RDD API directly; the Spark Summit I attended was full of this. So I am a bit surprised to hear use-the-RDD-API as advice from you. But if this is the way, then I have a second question. For

Re[8]: Spark 2.0: SQL runs 5x times slower when adding 29th field to aggregation.

2016-09-05 Thread Сергей Романов
Hi, Gavin, Shuffling is exactly the same in both requests and is minimal. Both requests produces one shuffle task. Running time is the only difference I can see in metrics: timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect,

Re: [SparkSQL+SparkStreaming]SparkStreaming APP can not load data into SparkSQL table

2016-09-05 Thread luohui20001
the data can be written as parquet into HDFS. But the loading data process is not working as expected. Thanks. Best regards! San.Luo ----- Original message ----- From: To: "user" Subject: [SparkSQL+SparkStreaming]SparkStreaming APP

[SparkSQL+SparkStreaming]SparkStreaming APP can not load data into SparkSQL table

2016-09-05 Thread luohui20001
hi guys: I have a problem: my SparkStreaming APP cannot load data into a SparkSQL table. Here is my code: val conf = new SparkConf().setAppName("KafkaStreaming for " + topics).setMaster("spark://master60:7077") val storageLevel = StorageLevel.DISK_ONLY val ssc = new

Re: Unable to get raw probabilities after clearing model threshold

2016-09-05 Thread kundan kumar
Sorry, my bad. The issue got resolved. Thanks, Kundan On Mon, Sep 5, 2016 at 3:58 PM, kundan kumar wrote: > Hi, > > I am unable to get the raw probabilities despite clearing the > threshold. It's still printing the predicted label. > > Can someone help resolve this

Unable to get raw probabilities after clearing model threshold

2016-09-05 Thread kundan kumar
Hi, I am unable to get the raw probabilities despite clearing the threshold. It's still printing the predicted label. Can someone help resolve this issue? Here is the code snippet. LogisticRegressionWithSGD lrLearner = new LogisticRegressionWithSGD(); LogisticRegressionModel model =

Re: Is there anyway Spark UI is set to poll and refreshes itself

2016-09-05 Thread Mich Talebzadeh
Hi Sivakumaran Thanks for your very useful research. Apologies have been very busy. Let me read through and come back. Regards Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: How to detect when a JavaSparkContext gets stopped

2016-09-05 Thread Sean Owen
You can look into the SparkListener interface to get some of those messages. Losing the master though is pretty fatal to all apps. On Mon, Sep 5, 2016 at 7:30 AM, Hough, Stephen C wrote: > I have a long running application, configured to be HA, whereby only the >

How to detect when a JavaSparkContext gets stopped

2016-09-05 Thread Hough, Stephen C
I have a long running application, configured to be HA, whereby only the designated leader will acquire a JavaSparkContext, listen for requests and push jobs onto this context. The problem I have is, whenever my AWS instances running workers die (either a time to live expires or I cancel those