Re: Problem installing Spark on Windows 8

2015-10-14 Thread Marco Mistroni
^ scala> sc res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5fc7255c scala> On Tue, Oct 13, 2015 at 5:02 PM, Steve Loughran <ste...@hortonworks.com> wrote: > > On 12 Oct 2015, at 23:11, Marco Mistroni <mmistr...@gmail.com> wrote: > > HI all

Re: Problem installing Spark on Windows 8

2015-10-15 Thread Marco Mistroni
Steve Loughran" <ste...@hortonworks.com> wrote: > >> >> On 14 Oct 2015, at 20:56, Marco Mistroni <mmistr...@gmail.com> wrote: >> >> >> 15/10/14 20:52:35 WARN : Your hostname, MarcoLaptop resolves to a >> loopback/non-r >> eachable address: fe80:0:

Re: Problem installing Spark on Windows 8

2015-10-17 Thread Marco Mistroni
, especially if i don't understand why i am having exception doesn't spark like windows 8? any suggestions appreciated kind regards marco On Thu, Oct 15, 2015 at 11:40 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hi > i tried to set this variable in my windows env variables

Problem installing Spark on Windows 8

2015-10-12 Thread Marco Mistroni
HI all i have downloaded spark-1.5.1-bin-hadoop2.4 i have extracted it on my machine, but when i go to the \bin directory and invoke spark-shell i get the following exception Could anyone assist pls? I followed instructions in ebook Learning Spark, but maybe the instructions are old? kr marco

aggregateByKey vs combineByKey

2016-01-05 Thread Marco Mistroni
Hi all i have the following dataSet kv = [(2,Hi), (1,i), (2,am), (1,a), (4,test), (6,string)] It's a simple list of tuples containing (word_length, word) What i wanted to do was to group the result by key in order to have a result in the form [(word_length_1, [word1, word2, word3],
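
A minimal sketch of the grouping described, using aggregateByKey (the zero value and merge functions are my assumptions, not from the thread):

    val kv = sc.parallelize(Seq((2, "Hi"), (1, "i"), (2, "am"), (1, "a"), (4, "test"), (6, "string")))
    // start each key with an empty list, prepend words within a partition, concatenate across partitions
    val byLength = kv.aggregateByKey(List.empty[String])(
      (acc, word) => word :: acc,
      (left, right) => left ++ right)
    byLength.collect()  // e.g. Array((4,List(test)), (6,List(string)), (2,List(am, Hi)), (1,List(a, i)))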

Re: Hive error when starting up spark-shell in 1.5.2

2015-12-24 Thread Marco Mistroni
on HDFS should be writable. Current permissions are: rwx---rwx I will have to play around with windows permissions to allow spark to use that directory kr marco On Sun, Dec 20, 2015 at 5:15 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Thanks Chris will give it a go and re

Hive error when starting up spark-shell in 1.5.2

2015-12-19 Thread Marco Mistroni
HI all posting again this as i was experiencing this error also under 1.5.1 I am running spark 1.5.2 on a Windows 10 laptop (upgraded from Windows 8) When i launch spark-shell i am getting this exception, presumably because i have no admin right to the /tmp directory on my laptop (windows 8-10 seems

Re: Hive error when starting up spark-shell in 1.5.2

2015-12-20 Thread Marco Mistroni
(or should be, anyway). I believe you can change the root path > thru this mechanism. > > if not, this should give you more info google on. > > let me know as this comes up a fair amount. > > > On Dec 19, 2015, at 4:58 PM, Marco Mistroni <mmistr...@gmail.com>

Re: Fw: Basic question on using one's own classes in the Scala app

2016-06-06 Thread Marco Mistroni
HI Ashok this is not really a spark-related question so i would not use this mailing list. Anyway, my 2 cents here as outlined by earlier replies, if the class you are referencing is in a different jar, at compile time you will need to add that dependency to your build.sbt, I'd personally

Re: Spark_Usecase

2016-06-07 Thread Marco Mistroni
Hi how about 1. have a process that reads the data from your sqlserver and dumps it as a file into a directory on your hd 2. use spark-streaming to read data from that directory and store it into hdfs perhaps there is some sort of spark 'connector' that allows you to read data from a db
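
One hedged sketch of such a 'connector': Spark's built-in JDBC data source can read the SQL Server table directly (host, database, and table names below are placeholders, and the SQL Server JDBC driver must be on the classpath):

    val jdbcDF = sqlContext.read.format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb")  // placeholder URL
      .option("dbtable", "dbo.myTable")
      .option("user", "user")
      .option("password", "secret")
      .load()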

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marco Mistroni
HI have you tried to add this flag? -Djsse.enableSNIExtension=false i had similar issues in another standalone application when i switched to java8 from java7 hth marco On Mon, Jun 6, 2016 at 9:58 PM, Koert Kuipers wrote: > mhh i would not be very happy if the implication
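
For a Spark job the same flag would go to the driver JVM, roughly like this (class and jar names are placeholders):

    spark-submit --driver-java-options "-Djsse.enableSNIExtension=false" --class com.example.Main app.jar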

Pls assist: Spark DecisionTree question

2016-06-10 Thread Marco Mistroni
HI all i am trying to run a ML program against some data, using DecisionTrees. To fine tune the parameters, i am running this loop to find the optimal values for impurity, depth and bins for (impurity <- Array("gini", "entropy"); depth<- Array(1,2,3, 4, 5); bins <-
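
A runnable sketch of that tuning loop, assuming MLlib's DecisionTree, RDD[LabeledPoint] train/test sets, binary labels, and bin values I picked myself:

    import org.apache.spark.mllib.tree.DecisionTree
    val evaluations =
      for (impurity <- Array("gini", "entropy");
           depth <- Array(1, 2, 3, 4, 5);
           bins <- Array(10, 50, 100)) yield {
        val model = DecisionTree.trainClassifier(trainData, 2, Map[Int, Int](), impurity, depth, bins)
        // fraction of test points predicted correctly
        val accuracy = testData.map(lp => (model.predict(lp.features), lp.label))
          .filter { case (p, l) => p == l }.count().toDouble / testData.count()
        ((impurity, depth, bins), accuracy)
      }
    evaluations.sortBy(-_._2).foreach(println)  // best parameter combination first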

Accuracy of BinaryClassificationMetrics

2016-06-11 Thread Marco Mistroni
HI all which method shall i use to verify the accuracy of a BinaryClassificationMetrics ? the multiClassMetrics has a precision() method but that is missing on the BinaryClassificationMetrics thanks marco
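
BinaryClassificationMetrics indeed has no precision()/accuracy method; a common workaround (my suggestion, not the thread's resolution) is to use its curve metrics or compute accuracy by hand from the (prediction, label) pairs:

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    val metrics = new BinaryClassificationMetrics(predictionAndLabels)  // RDD[(Double, Double)]
    println(metrics.areaUnderROC())  // ranking quality of the classifier
    // plain accuracy, computed directly
    val accuracy = predictionAndLabels.filter { case (p, l) => p == l }.count().toDouble /
      predictionAndLabels.count()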

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
too little info it'll help if you can post the exception and show your sbt file (if you are using sbt), and provide minimal details on what you are doing kr On Fri, Jun 17, 2016 at 10:08 AM, VG wrote: > Failed to find data source: com.databricks.spark.xml > > Any suggestions
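
That error usually means the spark-xml package never made it onto the classpath; a hedged example of supplying it (version and Scala suffix are assumptions):

    spark-shell --packages com.databricks:spark-xml_2.11:0.3.3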

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
l") > .option("rowTag", "row") > .load("A.xml"); > > Any suggestions please .. > > > > > On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > >> too little in

Re: Python to Scala

2016-06-18 Thread Marco Mistroni
Hi Post the code. I code in python and Scala on spark..I can give u help though api for Scala and python are practically sameonly difference is in the python lambda vs Scala inline functions Hth On 18 Jun 2016 6:27 am, "Aakash Basu" wrote: > I don't have a sound

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
riba.spark.PostsProcessing.main(PostsProcessing.java:19) >>>> Caused by:* java.lang.ClassNotFoundException: >>>> scala.collection.GenTraversableOnce$class* >>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >>>> at java.lang.ClassLoader.loadClass(Clas

Please help: installation of spark 1.6.0 on ubuntu fails

2016-02-25 Thread Marco Mistroni
Hello all could anyone help? i have tried to install spark 1.6.0 on ubuntu, but the installation failed Here are my steps 1. download spark (successful) 31 wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0.tgz 33 tar -zxf spark-1.6.0.tgz 2. cd spark-1.6.0 2.1 sbt assembly [error]

Re: Can spark somehow help with this usecase?

2016-04-05 Thread Marco Mistroni
s FTP Client > > I assume that each ftp get is independent. *Maybe some one know more > about how to control the amount of concurrency*. I think it will be based > on the number of partitions, works, and cores? > > Andy > > From: Marco Mistroni <mmistr...@gmail.com> > Date:

Can spark somehow help with this usecase?

2016-04-05 Thread Marco Mistroni
Hi I m currently using spark to process a file containing a million rows (edgar quarterly filings files) Each row contains some info plus the location of a remote file which I need to retrieve using FTP and then process its content. I want to do all 3 operations ( process filing file, fetch

Please assist: Spark 1.5.2 / cannot find StateSpec / State

2016-04-13 Thread Marco Mistroni
hi all i am trying to replicate the Streaming Wordcount example described here https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala in my build.sbt i have the following dependencies . libraryDependencies +=
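
StateSpec and State only appeared in Spark 1.6.0, so no 1.5.2 artifact can provide them; a build.sbt along these lines (version chosen as an example) would resolve the classes:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided"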

Re: Re:[spark] build/sbt gen-idea error

2016-04-12 Thread Marco Mistroni
Have you tried SBT eclipse plugin? Then u can run SBT eclipse and have ur spark project directly in eclipse Pls Google it and u shud b able to find ur way. If not ping me and I send u the plugin (I m replying from my phone) Hth On 12 Apr 2016 4:53 pm, "ImMr.K" <875061...@qq.com> wrote: But how to
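
The plugin being referred to is sbteclipse; a minimal project/plugins.sbt entry (version assumed) looks like this, after which "sbt eclipse" generates the Eclipse project files:

    addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")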

Re: Fwd: Spark 2.0 Shell -csv package weirdness

2016-03-19 Thread Marco Mistroni
Have u tried df.saveAsParquetFile? I think that method is on df Api Hth Marco On 19 Mar 2016 7:18 pm, "Vincent Ohprecio" wrote: > > For some reason writing data from Spark shell to csv using the `csv > package` takes almost an hour to dump to disk. Am I going crazy or did I

Re: Spark 2.0 Shell -csv package weirdness

2016-03-20 Thread Marco Mistroni
Hi I try tomorrow same settings as you to see if I can experience same issues. Will report back once done Thanks On 20 Mar 2016 3:50 pm, "Vincent Ohprecio" wrote: > Thanks Mich and Marco for your help. I have created a ticket to look into > it on dev channel. > Here is the

Re: All inclusive uber-jar

2016-04-04 Thread Marco Mistroni
Hi U can use SBT assembly to create uber jar. U should set spark libraries as 'provided' in ur SBT Hth Marco Ps apologies if by any chances I m telling u something u already know On 4 Apr 2016 2:36 pm, "Mich Talebzadeh" wrote: > Hi, > > > When one builds a project for
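
A minimal sketch of that setup: with Spark marked provided, sbt-assembly leaves it out of the uber jar because the cluster supplies it at runtime (versions are assumptions):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
    // build.sbt
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"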

Re: removing header from csv file

2016-04-27 Thread Marco Mistroni
If u r using Scala api you can do myRdd.zipWithIndex.filter(_._2 > 0).map(_._1) Maybe a little bit complicated but will do the trick As per spark CSV, you will get back a data frame which you can reconduct to rdd. Hth Marco On 27 Apr 2016 6:59 am, "nihed mbarek" wrote: > You
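
Spelled out as runnable code (a sketch, assuming an RDD[String] of csv lines):

    val lines = sc.textFile("data.csv")
    // zipWithIndex numbers lines in order across partitions; dropping index 0 removes the header
    val noHeader = lines.zipWithIndex.filter { case (_, idx) => idx > 0 }.map(_._1)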

Re: n

2016-04-27 Thread Marco Mistroni
Hi please share your build.sbt here's mine for reference (using Spark 1.6.1 + scala 2.10) (pls ignore extra stuff i have added for assembly and logging) // Set the project name to the string 'My Project' name := "SparkExamples" // The := method used in Name and Version is one of two

Adding a new column to a dataframe (based on value of existing column)

2016-04-28 Thread Marco Mistroni
HI all i have a dataFrame with a column ("Age", type double) and i am trying to create a new column based on the value of the Age column, using Scala API this code keeps on complaining scala> df.withColumn("AgeInt", if (df("Age") > 29.0) lit(1) else lit(0)) :28: error: type mismatch; found :
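
The if/else fails because it is evaluated once on the driver rather than per row; the resolution that surfaces later in this thread uses when/otherwise:

    import org.apache.spark.sql.functions.{col, when}
    val df2 = df.withColumn("AgeInt", when(col("Age") > 29.0, 1).otherwise(0))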

Pls Assist: error when creating cluster on AWS using spark's ec2 scripts

2016-05-17 Thread Marco Mistroni
Hi was wondering if anyone can assist here.. I am trying to create a spark cluster on AWS using scripts located in spark-1.6.1/ec2 directory When the spark_ec2.py script tries to do a rsync to copy directories over to the AWS master node it fails miserably with this stack trace DEBUG:spark ecd

Issue with creation of EC2 cluster using spark scripts

2016-05-16 Thread Marco Mistroni
hi all i am experiencing issues when creating ec2 clusters using scripts in the spark\ec2 directory i launched the following command ./spark-ec2 -k sparkkey -i sparkAccessKey.pem -r us-west2 -s 4 launch MM-Cluster My output is stuck with the following (has been for the last 20 minutes) i

Re: Pls assist: which conf file do i need to modify if i want spark-shell to include external packages?

2016-04-21 Thread Marco Mistroni
iew?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 21 April 2016 at 15:13, Marco Mistroni <mmistr...@gmail.com> wrote: > >> HI all >> i need to use spark-csv in my spark instance, and i want to avoid >> launching s

Pls assist: which conf file do i need to modify if i want spark-shell to include external packages?

2016-04-21 Thread Marco Mistroni
HI all i need to use spark-csv in my spark instance, and i want to avoid launching spark-shell by passing the package name every time I seem to remember that i need to amend a file in the /conf directory to include e.g. spark.packages com.databricks:spark-csv_2.11:1.4.0 but i cannot find
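
The property Spark reads from conf/spark-defaults.conf for this is spark.jars.packages (the exact key name is from Spark's configuration docs, not this thread):

    # conf/spark-defaults.conf
    spark.jars.packages  com.databricks:spark-csv_2.11:1.4.0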

Re: Adding a new column to a dataframe (based on value of existing column)

2016-04-28 Thread Marco Mistroni
Column("AgeInt", when(col("age") > 29.0, > 1).otherwise(0)).show > +++--+ > | age|name|AgeInt| > +++--+ > |25.0| foo| 0| > |30.0| bar| 1| > +++--+ > > On Thu, 28 Apr 2016 at 20:45 Marco Mistroni <mmistr...@gmail.com> wrote:

Re: Maintaining order of pair rdd

2016-07-26 Thread Marco Mistroni
ardhan shetty <janardhan...@gmail.com> wrote: > groupBy is a shuffle operation and index is already lost in this process > if I am not wrong and don't see *sortWith* operation on RDD. > > Any suggestions or help ? > > On Mon, Jul 25, 2016 at 12:58 AM, Marco Mistroni <mmis

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Marco Mistroni
Hi Kevin you should not need to rebuild everything. Instead, i believe you should launch spark-submit by specifying the kafka jar file in your --packages... i had to follow same when integrating spark streaming with flume have you checked this link ?
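
A hedged example of what that looks like for Spark 2.0 (artifact coordinates assumed for the 0.8 Kafka integration):

    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 my_streaming_app.jar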

Re: Dataset , RDD zipWithIndex -- How to use as a map .

2016-07-22 Thread Marco Mistroni
Hi So u u have a data frame, then use zipwindex and create a tuple I m not sure if df API has something useful for zip w index. But u can - get a data frame - convert it to rdd (there's a tordd ) - do a zip with index That will give u a rdd with 3 fields... I don't think you can update df

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
le.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html > > plz help me.. I couldn't find any solution..plz > > On Fri, Jul 22, 2016 at 5:50 PM, Jean Georges Perrin <j...@jgp.net> wrote: > >> Thanks Marco - I like the idea of sticking with DataFrames ;)

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
..plz > > On Fri, Jul 22, 2016 at 5:50 PM, Jean Georges Perrin <j...@jgp.net> wrote: > >> Thanks Marco - I like the idea of sticking with DataFrames ;) >> >> >> On Jul 22, 2016, at 7:07 AM, Marco Mistroni <mmistr...@gmail.com> wrote: >> >&g

Re: Maintaining order of pair rdd

2016-07-25 Thread Marco Mistroni
54477, ...*)) > ) > > As you can see after *groupbyKey* operation is complete item 18519 is in > index 0 for ID1, index 2 for ID3 and index 16 for ID2 where as expected is > index 0 > > > On Sun, Jul 24, 2016 at 12:43 PM, Marco Mistroni <mmistr...@gmail.com> > w

Pls assist: Creating Spark EC2 cluster using spark_ec2.py script and a custom AMI

2016-07-25 Thread Marco Mistroni
HI all i was wondering if anyone can help with this I Have created a spark cluster before using spark_ec2.py script from Spark 1.6.1 that by default uses a very old AMI... so i decided to try to launch the script with a more up to date AMI. the one i have used is ami-d732f0b7, which refers to

Pls assist: need to create an udf that returns a LabeledPoint in pyspark

2016-07-28 Thread Marco Mistroni
hi all could anyone assist? i need to create a udf function that returns a LabeledPoint I read that in pyspark (1.6) LabeledPoint is not supported and i have to create a StructType anyone can point me in some directions? kr marco

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni
Apologies I misinterpreted could you post two use cases? Kr On 24 Jul 2016 3:41 pm, "janardhan shetty" <janardhan...@gmail.com> wrote: > Marco, > > Thanks for the response. It is indexed order and not ascending or > descending order. > On Jul 24, 2016 7:

Re: UDF to build a Vector?

2016-07-24 Thread Marco Mistroni
Hi what is your source data? i am guessing a DataFrame of Integers as you are using an UDF So your DataFrame is then a bunch of Row[Integer] ? below a sample from one of my code to predict eurocup winners , going from a DataFrame of Row[Double] to a RDD of LabeledPoint I m not using UDF to
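
The shape of that conversion, as a hedged sketch (first column assumed to be the label, the rest the features):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    val labeled = df.rdd.map { row =>
      val label = row.getDouble(0)
      val features = (1 until row.length).map(row.getDouble).toArray
      LabeledPoint(label, Vectors.dense(features))
    }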

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni
ts of ID1 with first five element of ID3 > next first 5 elements of ID1 to ID2. Similarly next 5 elements in that > order until the end of number of elements. > Let me know if this helps > > > On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mmistr...@gmail.com> > wrote: >

Re: How to generate a sequential key in rdd across executors

2016-07-24 Thread Marco Mistroni
Hi how bout creating an auto increment column in hbase? Hth On 24 Jul 2016 3:53 am, "yeshwanth kumar" wrote: > Hi, > > i am doing bulk load to hbase using spark, > in which i need to generate a sequential key for each record, > the key should be sequential across all the

Re: Java Recipes for Spark

2016-08-01 Thread Marco Mistroni
Hi jg +1 for link. I'd add ML and graph examples if u can -1 for programmign language choice :)) kr On 31 Jul 2016 9:13 pm, "Jean Georges Perrin" wrote: > Thanks Guys - I really appreciate :)... If you have any idea of something > missing, I'll gladly add it. > > (and

Re: Spark2 SBT Assembly

2016-08-10 Thread Marco Mistroni
How bout all dependencies? Presumably they will all go in --jars ? What if I have 10 dependencies? Any best practices in packaging apps for spark 2.0? Kr On 10 Aug 2016 6:46 pm, "Nick Pentreath" wrote: > You're correct - Spark packaging has been shifted to not use the

Re: XLConnect in SparkR

2016-07-21 Thread Marco Mistroni
Hi, have you tried to use spark-csv (https://github.com/databricks/spark-csv) ? after all you can reconduct an XL file to CSV hth. On Thu, Jul 21, 2016 at 4:25 AM, Felix Cheung wrote: > From looking at the XLConnect package, its loadWorkbook() function only >

Re: Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-15 Thread Marco Mistroni
Dr Mich do you have any slides or videos available for the presentation you did @Canary Wharf? kindest regards marco On Wed, Jul 6, 2016 at 10:37 PM, Mich Talebzadeh wrote: > Dear forum members > > I will be presenting on the topic of "Running Spark on Hive or Hive

Re: RandomForestClassifier

2016-07-20 Thread Marco Mistroni
Hi afaik yes (other pls override ). Generally, in RandomForest and DecisionTree you have a column which you are trying to 'predict' (the label) and a set of features that are used to predict the outcome. i would assume that if you specify the label column and the 'features' columns, everything
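
In the spark.ml API that wiring is just two setters; a minimal sketch (column names assumed):

    import org.apache.spark.ml.classification.RandomForestClassifier
    val rf = new RandomForestClassifier()
      .setLabelCol("label")        // the column being predicted
      .setFeaturesCol("features")  // the assembled feature vector
    val model = rf.fit(trainingDF)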

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
; > However I have fixed this by making a fat jar using sbt assembly plugin. > > Now all the dependencies are included in that jar and I use that jar in > spark-submit > > Thanks > Sachin > > > On Wed, Jul 20, 2016 at 9:42 PM, Marco Mistroni <mmistr...@g

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
Hello Sachin pls paste the NoClassDefFound Exception so we can see what's failing, also please advise how are you running your Spark App For an extremely simple case, let's assume you have your MyFirstSparkApp packaged in your myFirstSparkApp.jar Then all you need to do would be to kick off
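
i.e. something along these lines (class name is a placeholder):

    spark-submit --class com.example.MyFirstSparkApp --master local[*] myFirstSparkApp.jar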

Re: spark classloader question

2016-07-07 Thread Marco Mistroni
Hi Chen pls post 1 . snippet code 2. exception any particular reason why you need to load classes in other jars programmatically? Have you tried to build a fat jar with all the dependencies ? hth marco On Thu, Jul 7, 2016 at 5:05 PM, Chen Song wrote: > Sorry to spam

Re: Error in collecting RDD as a Map - IOException in collectAsMap

2016-07-23 Thread Marco Mistroni
Hi vg I believe the error msg is misleading. I had a similar one with pyspark yesterday after calling a count on a data frame, where the real error was with an incorrect user defined function being applied . Pls send me some sample code with a trimmed down version of the data and I see if i can

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
Hello Jean you can take ur current DataFrame and send them to mllib (i was doing that coz i didn't know the ml package), but the process is a little bit cumbersome 1. go from DataFrame to an RDD of [LabeledPoint] 2. run your ML model i'd suggest you stick to DataFrame + ml package :) hth

Re: Surprised!!!!! Spark-shell showing inconsistent results

2017-02-02 Thread Marco Mistroni
Hi Have u tried to sort the results before comparing? On 2 Feb 2017 10:03 am, "Alex" wrote: > Hi As shown below same query when ran back to back showing inconsistent > results.. > > testtable1 is Avro Serde table... > > [image: Inline image 1] > > > > hc.sql("select *

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Marco Mistroni
Hi What is the UDF supposed to do? Are you trying to write a generic function to convert values to another type depending on what is the type of the original value? Kr On 1 Feb 2017 5:56 am, "Alex" wrote: Hi , we have Java Hive UDFS which are working perfectly fine in

Re: Running a spark code on multiple machines using google cloud platform

2017-02-02 Thread Marco Mistroni
U can use EMR if u want to run on a cluster. Kr On 2 Feb 2017 12:30 pm, "Anahita Talebi" wrote: > Dear all, > > I am trying to run a spark code on multiple machines using submit job in > google cloud platform. > As the inputs of my code, I have a training and

Re: Spark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$

2017-02-06 Thread Marco Mistroni
he spark connectors > have the appropriate transitive dependency on the correct version. > > On Sat, Feb 4, 2017 at 3:25 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > > Hi > > not sure if this will help at all, and pls take it with a pinch of salt >

Re: Spark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$

2017-02-04 Thread Marco Mistroni
Hi not sure if this will help at all, and pls take it with a pinch of salt as i don't have your setup and i am not running on a cluster I have tried to run a kafka example which was originally working on spark 1.6.1, on spark 2. These are the jars i am using

Kafka dependencies in Eclipse project /Pls assist

2017-01-31 Thread Marco Mistroni
HI all i am trying to run a sample spark code which reads streaming data from Kafka I Have followed instructions here https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html Here's my setup Spark: 2.0.1 Kafka:0.10.1.1 Scala Version: 2.11 Libraries used -
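
For that setup the build dependency would be roughly the following (a sketch matching the versions listed above):

    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"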

converting timestamp column to a java.util.Date

2017-01-23 Thread Marco Mistroni
HI all i am trying to convert a string column, in a Dataframe , to a java.util.Date but i am getting this exception [dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_0_piece0 on 169.254.2.140:53468 in memory (size: 14.3 KB, free: 767.4 MB) Exception
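
One hedged way around it: Spark returns timestamp columns as java.sql.Timestamp, a subclass of java.util.Date, so the conversion is direct (column name and position assumed):

    // java.sql.Timestamp extends java.util.Date, so getTime round-trips cleanly
    val dates = df.select("ts").rdd.map(r => new java.util.Date(r.getTimestamp(0).getTime))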

Re: care to share latest pom forspark scala applications eclipse?

2017-02-24 Thread Marco Mistroni
Hi i am using sbt to generate eclipse project files these are my dependencies; they'll probably translate to something like this in mvn dependencies (these are the same for all packages listed below): org.apache.spark 2.1.0 spark-core_2.11 spark-streaming_2.11 spark-mllib_2.11 spark-sql_2.11
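
Translated into pom.xml form, each of those becomes a dependency entry like the following (spark-core shown; the others follow the same pattern):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.0</version>
    </dependency>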

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Marco Mistroni
Try to use --packages to include the jars. From error it seems it's looking for main class in jars but u r running a python script... On 25 Feb 2017 10:36 pm, "Raymond Xie" wrote: That's right Anahita, however, the class name is not indicated in the original github

Re: Importing a github project on sbt

2017-01-16 Thread Marco Mistroni
Uhm... Not a SPK issue... Anyway... Had similar issues with sbt The quick sol. to get u going is to place ur dependency in your lib folder The not-so-quick is to build the sbt dependency and do a sbt publish-local, or deploy local But I consider both approaches hacks. Hth On 16 Jan 2017 2:00

Re: Spark vs MongoDB: saving DataFrame to db raises missing database name exception

2017-01-16 Thread Marco Mistroni
in mongo url. > > I remember I tested with python successfully. > > Best Regards, > Palash > > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > > On Tue, 17 Jan, 2017 at 5:37 am, Marco Mistroni > <mmistr...@gmail

Spark vs MongoDB: saving DataFrame to db raises missing database name exception

2017-01-16 Thread Marco Mistroni
hi all i have the following snippet which loads a dataframe from a csv file and tries to save it to mongodb. For some reason, the MongoSpark.save method raises the following exception Exception in thread "main" java.lang.IllegalArgumentException: Missing database name. Set via the
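
The fix that surfaces later in this thread is to set the output URI on the SparkConf before calling MongoSpark.save (database and collection names from the thread's example):

    val conf = new SparkConf()
      .setAppName("MongoExample")
      .set("spark.mongodb.output.uri", "mongodb://localhost:27017/test.tree")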

Spark 2.0 vs MongoDb /Cannot find dependency using sbt

2017-01-16 Thread Marco Mistroni
HI all in searching on how to use Spark 2.0 with mongo i came across this link https://jira.mongodb.org/browse/SPARK-20 i amended my build.sbt (content below), however the mongodb dependency was not found Could anyone assist? kr marco name := "SparkExamples" version := "1.0" scalaVersion :=

Re: Spark 2.0 vs MongoDb /Cannot find dependency using sbt

2017-01-16 Thread Marco Mistroni
sorry. should have done more research before jumping to the list the version of the connector is 2.0.0, available from maven repos sorry On Mon, Jan 16, 2017 at 9:32 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > HI all > in searching on how to use Spark 2.0 with mongo i
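
i.e. the dependency that resolves is:

    libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.0.0"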

Re: Spark vs MongoDB: saving DataFrame to db raises missing database name exception

2017-01-18 Thread Marco Mistroni
t.uri", "mongodb://localhost:27017/test.tree")) kr marco On Tue, Jan 17, 2017 at 7:53 AM, Marco Mistroni <mmistr...@gmail.com> wrote: > Uh. Many thanksWill try it out > > On 17 Jan 2017 6:47 am, "Palash Gupta" <spline_pal...@yahoo.com> wrote: >

Re: Running Spark on EMR

2017-01-15 Thread Marco Mistroni
ng Spark in standalone mode. > > Regards > > > ---- Original message > From: Marco Mistroni > Date:15/01/2017 16:34 (GMT+02:00) > To: User > Subject: Running Spark on EMR > > hi all > could anyone assist here? > i am trying to run spark 2.0.0 on an EMR c

Re: Run spark machine learning example on Yarn failed

2017-02-28 Thread Marco Mistroni
Or place the file in s3 and provide the s3 path Kr On 28 Feb 2017 1:18 am, "Yunjie Ji" wrote: > After start the dfs, yarn and spark, I run these code under the root > directory of spark on my master host: > `MASTER=yarn ./bin/run-example ml.LogisticRegressionExample >

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-26 Thread Marco Mistroni
similar setup can be used on Linux) https://spark.apache.org/docs/latest/streaming-kafka-integration.html kr On Sat, Feb 25, 2017 at 11:12 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hi I have a look. At GitHub project tomorrow and let u know. U have a py > scripts to run and

Re: error in kafka producer

2017-02-28 Thread Marco Mistroni
This exception coming from a Spark program? could you share few lines of code ? kr marco On Tue, Feb 28, 2017 at 10:23 PM, shyla deshpande wrote: > producer send callback exception: > org.apache.kafka.common.errors.TimeoutException: > Expiring 1 record(s) for

Re: question on transforms for spark 2.0 dataset

2017-03-01 Thread Marco Mistroni
Hi I think u need an UDF if u want to transform a column Hth On 1 Mar 2017 4:22 pm, "Bill Schwanitz" wrote: > Hi all, > > I'm fairly new to spark and scala so bear with me. > > I'm working with a dataset containing a set of column / fields. The data > is stored in hdfs as
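
A minimal UDF sketch for that (Spark 2.0; the column name and the transformation itself are placeholders):

    import org.apache.spark.sql.functions.{col, udf}
    val normalize = udf((s: String) => s.trim.toLowerCase)
    val transformed = ds.withColumn("field_norm", normalize(col("field")))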

Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-03 Thread Marco Mistroni
hi all i am getting failures when building spark 2.0 on Ubuntu 16.06 Here's details of what i have installed on the ubuntu host - java 8 - scala 2.11 - git When i launch the command ./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package everything compiles sort of fine and at the end i

Re: Help with Jupyter Notebook Settup on CDH using Anaconda

2016-09-03 Thread Marco Mistroni
Hi please paste the exception for Spark vs Jupyter, you might want to sign up for this. It'll give you jupyter and spark...and presumably the spark-csv is already part of it ? https://community.cloud.databricks.com/login.html hth marco On Sat, Sep 3, 2016 at 8:10 PM, Arif,Mubaraka

Re: year out of range

2016-09-08 Thread Marco Mistroni
Pls paste code and sample CSV I m guessing it has to do with formatting time? Kr On 8 Sep 2016 12:38 am, "Daniel Lopes" wrote: > Hi, > > I'm *importing a few CSVs* with spark-csv package, > Always when I give a select at each one looks ok > But when i join them with

Re: year out of range

2016-09-08 Thread Marco Mistroni
[truncated DataFrame output] *Daniel Lopes* Chief Data and Analytics Officer | OneMatch c: +55 (18) 99764-2733 | https

Re: Error while calling udf Spark submit

2016-09-08 Thread Marco Mistroni
Not enough info. But u can try same code in spark shell and get hold of the exception Hth On 8 Sep 2016 11:16 am, "Divya Gehlot" wrote: > Hi, > I am on Spark 1.6.1 > I am getting below error when I am trying to call UDF in my spark > Dataframe column > UDF > /* get the

Please assist: migrating RandomForestExample from MLLib to ML

2016-09-14 Thread Marco Mistroni
hi all i have been toying around with this well known RandomForestExample code val forest = RandomForest.trainClassifier( trainData, 7, Map(10 -> 4, 11 -> 40), 20, "auto", "entropy", 30, 300) This comes from this link (
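
For reference, a hedged sketch of the spark.ml counterpart of that call; the column names are assumptions, and the categorical map would be handled by a VectorIndexer rather than passed in:

    import org.apache.spark.ml.classification.RandomForestClassifier
    val rf = new RandomForestClassifier()
      .setLabelCol("label").setFeaturesCol("features")
      .setNumTrees(20).setImpurity("entropy")
      .setMaxDepth(30).setMaxBins(300)
    val model = rf.fit(trainDF)  // trainDF: DataFrame with label/features columns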

Re: Please assist: migrating RandomForestExample from MLLib to ML

2016-09-14 Thread Marco Mistroni
it? > > Sean > > > On Wed, Sep 14, 2016 at 10:18 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > > hi all > > i have been toying around with this well known RandomForestExample code > > > > val forest = RandomForest.trainClassifier( > > trainD

Re: spark-submit failing but job running from scala ide

2016-09-26 Thread Marco Mistroni
Hi Vr your code works fine for me, running on Windows 10 vs Spark 1.6.1 i m guessing your Spark installation could be busted? That would explain why it works on your IDE, as you are just importing jars in your project. The java.io.IOException: Failed to connect to error is misleading, i have

Re: building Spark 2.1 vs Java 1.8 on Ubuntu 16/06

2016-10-06 Thread Marco Mistroni
ns from Java 7 to Java 8 is to use > the scripts build/mvn and build/sbt, which should be updated on a regular > basis with safe JVM options. > > Fred > > On Wed, Oct 5, 2016 at 1:40 AM, Marco Mistroni <mmistr...@gmail.com> > wrote: > >> Thanks Richard. It a

Re: Please assist: Building Docker image containing spark 2.0

2016-08-27 Thread Marco Mistroni
>>> ./dev/change-scala-version.sh 2.10 ./build/mvn -Pyarn -Phadoop-2.4 >>> -Dscala-2.10 -DskipTests clean package >>> If you're building with scala 2.10 >>> >>> On Sat, Aug 27, 2016, 00:18 Marco Mistroni <mmistr...@gmail.com> wrote: >>>

Re: Please assist: Building Docker image containing spark 2.0

2016-08-27 Thread Marco Mistroni
all good. Tal's suggestion did it. i shud have read the manual first :( tx for assistance On Sat, Aug 27, 2016 at 9:06 AM, Marco Mistroni <mmistr...@gmail.com> wrote: > Thanks, i'll follow advice and try again > > kr > marco > > On Sat, Aug 27, 2016 at

Re: Please assist: Building Docker image containing spark 2.0

2016-08-26 Thread Marco Mistroni
Aug 26, 2016 at 6:18 PM, Michael Gummelt <mgumm...@mesosphere.io> wrote: > :) > > On Thu, Aug 25, 2016 at 2:29 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > >> No i wont accept that :) >> I can't believe i have wasted 3 hrs for a space! >> >&

Re: Converting DataFrame's int column to Double

2016-08-25 Thread Marco Mistroni
many tx Jestin! On Thu, Aug 25, 2016 at 10:13 PM, Jestin Ma <jestinwith.a...@gmail.com> wrote: > How about this: > > df.withColumn("doubles", col("ints").cast("double")).drop("ints") > > On Thu, Aug 25, 2016 at 2:09 PM, Marco Mi

Re: Please assist: Building Docker image containing spark 2.0

2016-08-25 Thread Marco Mistroni
No i wont accept that :) I can't believe i have wasted 3 hrs for a space! Many thanks MIchael! kr On Thu, Aug 25, 2016 at 10:01 PM, Michael Gummelt <mgumm...@mesosphere.io> wrote: > You have a space between "build" and "mvn" > > On Thu, Aug 25, 2016

Converting DataFrame's int column to Double

2016-08-25 Thread Marco Mistroni
hi all i might be stuck in old code, but this is what i am doing to convert a DF int column to Double val intToDoubleFunc:(Int => Double) = lbl => lbl.toDouble val labelToDblFunc = udf(intToDoubleFunc) val convertedDF = df.withColumn("SurvivedDbl", labelToDblFunc(col("Survived"))) is there a

Please assist: Building Docker image containing spark 2.0

2016-08-25 Thread Marco Mistroni
HI all sorry for the partially off-topic, i hope there's someone on the list who has tried the same and encountered similar issues Ok so i have created a Docker file to build an ubuntu container which includes spark 2.0, but somehow when it gets to the point where it has to kick off ./build/mvn

Re: Treating NaN fields in Spark

2016-09-28 Thread Marco Mistroni
Hi Dr Mich, how bout reading all csv as string and then applying an UDF sort of like this? import scala.util.control.Exception.allCatch def getDouble(doubleStr:String):Double = allCatch opt doubleStr.toDouble match { case Some(doubleNum) => doubleNum case _ => Double.NaN }
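
Wrapping that helper as a UDF then cleans a whole column (a sketch; the column names are placeholders):

    import org.apache.spark.sql.functions.{col, udf}
    val getDoubleUdf = udf(getDouble _)
    val cleaned = df.withColumn("priceDbl", getDoubleUdf(col("price")))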

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-10-02 Thread Marco Mistroni
> Try shutting down zinc. Something's funny about your compile server. > It's not required anyway. > > On Sat, Oct 1, 2016 at 3:24 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > > Hi guys > > sorry to annoy you on this but i am getting nowhere. So far i

Re: How to use Spark-Scala to download a CSV file from the web?

2016-09-25 Thread Marco Mistroni
Hi not sure if spark-csv supports the http:// format you use to load data from the WEB. I just tried this and got exception scala> val df = sqlContext.read. | format("com.databricks.spark.csv"). | option("inferSchema", "true"). |
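
spark-csv only reads filesystem/HDFS paths, so one workaround (my assumption, not the thread's resolution) is to download the file first and then load the local copy:

    import java.io.File
    import java.net.URL
    import scala.sys.process._
    // pull the file down locally, then hand the local path to spark-csv
    (new URL("http://example.com/data.csv") #> new File("/tmp/data.csv")).!
    val df = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true")
      .load("/tmp/data.csv")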

Re: In Spark-Scala, how to copy Array of Lists into new DataFrame?

2016-09-25 Thread Marco Mistroni
Hi i must admit , i had issues as well in finding a sample that does that, (hopefully Spark folks can add more examples or someone on the list can post a sample code?) hopefully you can reuse sample below So, you start from an rdd of doubles (myRdd) ## make a row val toRddOfRows =
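
Completing that outline as a hedged sketch (an RDD[Double] named myRdd, turned into Rows and then a DataFrame with an explicit schema):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
    // make a row per double
    val toRddOfRows = myRdd.map(d => Row(d))
    val schema = StructType(Array(StructField("value", DoubleType, nullable = false)))
    val df = sqlContext.createDataFrame(toRddOfRows, schema)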

Re: In Spark-Scala, how to copy Array of Lists into new DataFrame?

2016-09-25 Thread Marco Mistroni
Hi in fact i have just found some written notes in my code see if this docs help you (it will work with any spark versions, not only 1.3.0) https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes hth On Sun, Sep 25, 2016 at 1:25 PM, Marco Mistroni <mmi

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-10-01 Thread Marco Mistroni
ark git) that i am using somehow wrong parameters. or perhaps i should install scala 2.11 before i install spark? or Maven ? kr marco On Fri, Sep 30, 2016 at 8:23 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hi all > this problem is still bothering me. > Here's my setu

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-30 Thread Marco Mistroni
) at scala_maven.ScalaTestCompileMojo.execute(ScalaTestCompileMojo.java:48) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) anyone found a similar error? kr On Sat, Sep 3, 2016 at 2:54 PM, Marco Mistroni <mmistr...@gmail.com>

Re: createDataFrame causing a strange error.

2016-11-27 Thread Marco Mistroni
Hi pickle errors normally point to a serialisation issue. i am suspecting something wrong with ur S3 data, but it's just a wild guess... Is your s3 object publicly available? few suggestions to nail down the problem 1 - try to see if you can read your object from s3 using boto3 library 'offline',

Re: createDataFrame causing a strange error.

2016-11-28 Thread Marco Mistroni
ppend(object.key) > > print("object key") > print (s3_list[0]) > > s3obj = boto3.resource('s3').Object(bucket_name='time-waits-for-no-man', > key=s3_list[0]) > contents = s3obj.get()['Body'].read().decode() > meow = contents.splitlines() > result_wo_timestamp = map(ujson.l
