scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5fc7255c
scala>
On Tue, Oct 13, 2015 at 5:02 PM, Steve Loughran <ste...@hortonworks.com>
wrote:
>
> On 12 Oct 2015, at 23:11, Marco Mistroni <mmistr...@gmail.com> wrote:
>
> HI all
Steve Loughran" <ste...@hortonworks.com> wrote:
>
>>
>> On 14 Oct 2015, at 20:56, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>
>> 15/10/14 20:52:35 WARN : Your hostname, MarcoLaptop resolves to a
>> loopback/non-reachable address: fe80:0:
, especially if i don't understand why i am having the
exception
doesn't spark like windows 8?
any suggestions appreciated
kind regards
marco
On Thu, Oct 15, 2015 at 11:40 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:
> Hi
> i tried to set this variable in my windows env variables
HI all
i have downloaded spark-1.5.1-bin-hadoop.2.4
i have extracted it on my machine, but when i go to the \bin directory and
invoke
spark-shell i get the following exception
Could anyone assist pls?
I followed the instructions in the ebook Learning Spark, but maybe the instructions
are old?
kr
marco
Hi all
i have the following dataSet
kv = [(2,Hi), (1,i), (2,am), (1,a), (4,test), (6,string)]
It's a simple list of tuples containing (word_length, word)
What i wanted to do was to group the result by key in order to have a
result in the form
[(word_length_1, [word1, word2, word3],
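A quick sketch of one way to do that (just a sketch; the RDD name and setup are mine):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("groupByLength").setMaster("local[*]"))
// (word_length, word) pairs from the example above
val kvRdd = sc.parallelize(Seq((2, "Hi"), (1, "i"), (2, "am"), (1, "a"), (4, "test"), (6, "string")))
// group all words sharing the same length under one key
val grouped = kvRdd.groupByKey()   // RDD[(Int, Iterable[String])]
grouped.collect().foreach { case (len, words) => println(s"$len -> ${words.mkString(", ")}") }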
on HDFS should be writable. Current permissions are:
rwx---rwx
I will have to play around with windows permissions to allow spark to use
that directory
kr
marco
On Sun, Dec 20, 2015 at 5:15 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Thanks Chris will give it a go and re
HI all
posting this again as i was experiencing this error also under 1.5.1
I am running spark 1.5.2 on a Windows 10 laptop (upgraded from Windows 8)
When i launch spark-shell i am getting this exception, presumably because i
have no
admin rights to the /tmp directory on my laptop (windows 8-10 seems
(or should be, anyway). I believe you can change the root path
> thru this mechanism.
>
> if not, this should give you more info to google on.
>
> let me know as this comes up a fair amount.
>
> > On Dec 19, 2015, at 4:58 PM, Marco Mistroni <mmistr...@gmail.com>
HI Ashok
this is not really a spark-related question so i would not use this
mailing list.
Anyway, my 2 cents here
as outlined by earlier replies, if the class you are referencing is in a
different jar, at compile time you will need to add that dependency to your
build.sbt,
I'd personally
Hi
how about
1. have a process that reads the data from your sqlserver and dumps it as a
file into a directory on your hd
2. use spark-streaming to read data from that directory and store it into
hdfs
perhaps there is some sort of spark 'connector' that allows you to read
data from a db
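For step 2, a rough sketch (the directory, batch interval and output path are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("dirToHdfs").setMaster("local[*]")
val ssc = new StreamingContext(conf, Seconds(30))
// watch the dump directory for new files written by the sqlserver export process
val lines = ssc.textFileStream("file:///data/sqlserver-dumps")
// write each non-empty batch out to hdfs
lines.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///ingest/sqlserver/${time.milliseconds}")
}
ssc.start()
ssc.awaitTermination()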
HI
have you tried to add this flag?
-Djsse.enableSNIExtension=false
i had similar issues in another standalone application when i switched to
java8 from java7
hth
marco
On Mon, Jun 6, 2016 at 9:58 PM, Koert Kuipers wrote:
> mhh i would not be very happy if the implication
HI all
i am trying to run a ML program against some data, using DecisionTrees.
To fine tune the parameters, i am running this loop to find the optimal
values for
impurity, depth and bins
for (impurity <- Array("gini", "entropy");
     depth <- Array(1, 2, 3, 4, 5);
     bins <-
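For completeness, a hedged sketch of how such a grid search can be written with the MLlib DecisionTree API (the bins values, numClasses and the evaluation step are my own assumptions; trainData and cvData are assumed to be RDD[LabeledPoint]s already in scope):

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.evaluation.MulticlassMetrics

val evaluations =
  for (impurity <- Array("gini", "entropy");
       depth    <- Array(1, 2, 3, 4, 5);
       bins     <- Array(10, 50, 100)) yield {
    // train one model per parameter combination
    val model = DecisionTree.trainClassifier(trainData, 7, Map[Int, Int](), impurity, depth, bins)
    // score it on the cross-validation set
    val predictionsAndLabels = cvData.map(p => (model.predict(p.features), p.label))
    val accuracy = new MulticlassMetrics(predictionsAndLabels).precision
    ((impurity, depth, bins), accuracy)
  }
evaluations.sortBy(_._2).reverse.foreach(println)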
HI all
which method shall i use to verify the accuracy of a
BinaryClassificationMetrics ?
the multiClassMetrics has a precision() method but that is missing
on the BinaryClassificationMetrics
thanks
marco
too little info
it'll help if you can post the exception and show your sbt file (if you are
using sbt), and provide minimal details on what you are doing
kr
On Fri, Jun 17, 2016 at 10:08 AM, VG wrote:
> Failed to find data source: com.databricks.spark.xml
>
> Any suggestions
l")
> .option("rowTag", "row")
> .load("A.xml");
>
> Any suggestions please ..
>
>
>
>
> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> too little in
Hi
Post the code. I code in python and Scala on spark..I can give u help
though the api for Scala and python are practically the same...only difference is
in the python lambda vs Scala inline functions
Hth
On 18 Jun 2016 6:27 am, "Aakash Basu" wrote:
> I don't have a sound
riba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>> Caused by:* java.lang.ClassNotFoundException:
>>>> scala.collection.GenTraversableOnce$class*
>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> at java.lang.ClassLoader.loadClass(Clas
Hello all
could anyone help?
i have tried to install spark 1.6.0 on ubuntu, but the installation failed
Here are my steps
1. download spark (successful)
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0.tgz
tar -zxf spark-1.6.0.tgz
2. cd spark-1.6.0
2.1 sbt assembly
error]
s FTP Client
>
> I assume that each ftp get is independent. *Maybe someone knows more
> about how to control the amount of concurrency*. I think it will be based
> on the number of partitions, workers, and cores?
>
> Andy
>
> From: Marco Mistroni <mmistr...@gmail.com>
> Date:
Hi
I m currently using spark to process a file containing a million
rows (edgar quarterly filings files)
Each row contains some info plus the location of a remote file which I need
to retrieve using FTP and then process its content.
I want to do all 3 operations ( process filing file, fetch
hi all
i am trying to replicate the Streaming Wordcount example described here
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
in my build.sbt i have the following dependencies
.
libraryDependencies +=
Have you tried SBT eclipse plugin? Then u can run SBT eclipse and have ur
spark project directly in eclipse
Pls Google it and u shud b able to find ur way.
If not ping me and I send u the plugin (I m replying from my phone)
Hth
On 12 Apr 2016 4:53 pm, "ImMr.K" <875061...@qq.com> wrote:
But how to
Have u tried df.saveAsParquetFile? I think that method is on the df API
Hth
Marco
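For reference, roughly (path is a placeholder; on later Spark versions the DataFrameWriter route replaces the old method):

// Spark 1.3-era API, deprecated later
df.saveAsParquetFile("/tmp/out.parquet")
// DataFrameWriter equivalent
df.write.parquet("/tmp/out.parquet")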
On 19 Mar 2016 7:18 pm, "Vincent Ohprecio" wrote:
>
> For some reason writing data from Spark shell to csv using the `csv
> package` takes almost an hour to dump to disk. Am I going crazy or did I
Hi
I'll try the same settings as you tomorrow to see if I can reproduce the same issues.
Will report back once done
Thanks
On 20 Mar 2016 3:50 pm, "Vincent Ohprecio" wrote:
> Thanks Mich and Marco for your help. I have created a ticket to look into
> it on dev channel.
> Here is the
Hi
U can use SBT assembly to create uber jar. U should set spark libraries as
'provided' in ur SBT
Hth
Marco
Ps apologies if by any chances I m telling u something u already know
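The relevant build.sbt fragment looks roughly like this (versions are only illustrative):

// mark Spark itself as "provided" so it is not bundled into the assembly jar
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided"
)
// sbt assembly then packages only your code plus the non-provided dependencies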
On 4 Apr 2016 2:36 pm, "Mich Talebzadeh" wrote:
> Hi,
>
>
> When one builds a project for
If u r using the Scala api you can do
myRdd.zipWithIndex.filter(_._2 > 0).map(_._1)
Maybe a little bit complicated but will do the trick
As per spark CSV, you will get back a data frame which you can convert back
to an rdd.
Hth
Marco
On 27 Apr 2016 6:59 am, "nihed mbarek" wrote:
> You
Hi
please share your build.sbt
here's mine for reference (using Spark 1.6.1 + scala 2.10) (pls ignore
extra stuff i have added for assembly and logging)
// Set the project name to the string 'My Project'
name := "SparkExamples"
// The := method used in Name and Version is one of two
HI all
i have a dataFrame with a column ("Age", type double) and i am trying to
create a new
column based on the value of the Age column, using Scala API
this code keeps on complaining
scala> df.withColumn("AgeInt", if (df("Age") > 29.0) lit(1) else lit(0))
:28: error: type mismatch;
found :
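The if/else above is plain Scala evaluated once, not a Column expression; a sketch of the Column-based version (same when/otherwise approach as in the reply quoted further down):

import org.apache.spark.sql.functions.{col, when, lit}

val withAgeInt = df.withColumn("AgeInt", when(col("Age") > 29.0, lit(1)).otherwise(lit(0)))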
Hi
was wondering if anyone can assist here..
I am trying to create a spark cluster on AWS using scripts located in
spark-1.6.1/ec2 directory
When the spark_ec2.py script tries to do a rsync to copy directories over
to the AWS
master node it fails miserably with this stack trace
DEBUG:spark ecd
hi all
i am experiencing issues when creating ec2 clusters using scripts in the
spark\ec2 directory
i launched the following command
./spark-ec2 -k sparkkey -i sparkAccessKey.pem -r us-west2 -s 4 launch
MM-Cluster
My output is stuck with the following (has been for the last 20 minutes)
i
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 21 April 2016 at 15:13, Marco Mistroni <mmistr...@gmail.com> wrote:
>
>> HI all
>> i need to use spark-csv in my spark instance, and i want to avoid
>> launching s
HI all
i need to use spark-csv in my spark instance, and i want to avoid
launching spark-shell
by passing the package name every time
I seem to remember that i need to amend a file in the /conf directory to
include e.g.
spark.packages com.databricks:spark-csv_2.11:1.4.0
but i cannot find
Column("AgeInt", when(col("age") > 29.0,
> 1).otherwise(0)).show
> +----+----+------+
> | age|name|AgeInt|
> +----+----+------+
> |25.0| foo|     0|
> |30.0| bar|     1|
> +----+----+------+
>
> On Thu, 28 Apr 2016 at 20:45 Marco Mistroni <mmistr...@gmail.com> wrote:
ardhan shetty <janardhan...@gmail.com>
wrote:
> groupBy is a shuffle operation and index is already lost in this process
> if I am not wrong and don't see *sortWith* operation on RDD.
>
> Any suggestions or help ?
>
> On Mon, Jul 25, 2016 at 12:58 AM, Marco Mistroni <mmis
Hi Kevin
you should not need to rebuild everything.
Instead, i believe you should launch spark-submit by specifying the kafka
jar file in your --packages... i had to do the same when integrating spark
streaming with flume
have you checked this link ?
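The launch would look roughly like this (package coordinates, class and jar names are examples only, to be adjusted to your Spark/Kafka versions):

spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
  --class com.example.MyStreamingApp \
  myStreamingApp.jar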
Hi
So u have a data frame, then use zipWithIndex and create a tuple
I m not sure if the df API has something useful for zip with index.
But u can
- get a data frame
- convert it to an rdd (there's a .rdd method)
- do a zip with index
That will give u a rdd with 3 fields...
I don't think you can update df
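A rough sketch of that recipe (column accessors are invented):

// DataFrame -> RDD -> zipWithIndex -> tuples of (columns..., index)
val indexedRdd = df.rdd.zipWithIndex.map { case (row, idx) =>
  (row.getString(0), row.getString(1), idx)   // assumes two string columns
}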
le.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html
>
> plz help me.. I couldn't find any solution..plz
>
> On Fri, Jul 22, 2016 at 5:50 PM, Jean Georges Perrin <j...@jgp.net> wrote:
>
>> Thanks Marco - I like the idea of sticking with DataFrames ;)
>>
>>
>> On Jul 22, 2016, at 7:07 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>&g
54477, ...*))
> )
>
> As you can see, after the *groupByKey* operation is complete, item 18519 is at
> index 0 for ID1, index 2 for ID3 and index 16 for ID2, whereas the expected
> index is 0
>
>
> On Sun, Jul 24, 2016 at 12:43 PM, Marco Mistroni <mmistr...@gmail.com>
> w
HI all
i was wondering if anyone can help with this
I have created a spark cluster before using the spark_ec2.py script from Spark
1.6.1
that by default uses a very old AMI... so i decided to try to launch the
script with a more up to date
AMI.
the one i have used is ami-d732f0b7, which refers to
hi all
could anyone assist?
i need to create a udf function that returns a LabeledPoint
I read that in pyspark (1.6) LabeledPoint is not supported and i have to
create
a StructType
anyone can point me in some directions?
kr
marco
Apologies, I misinterpreted. Could you post the two use cases?
Kr
On 24 Jul 2016 3:41 pm, "janardhan shetty" <janardhan...@gmail.com> wrote:
> Marco,
>
> Thanks for the response. It is indexed order and not ascending or
> descending order.
> On Jul 24, 2016 7:
Hi
what is your source data? i am guessing a DataFrame or Integers as you
are using a UDF
So your DataFrame is then a bunch of Row[Integer] ?
below is a sample from one of my programs to predict eurocup winners, going from
a DataFrame of Row[Double] to an RDD of LabeledPoint
I m not using UDF to
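Roughly, the idea is the following (a sketch only; it assumes the label sits in field 0 of each Row and the features in the remaining fields):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val labeledPoints = df.rdd.map { row =>
  val label    = row.getDouble(0)
  val features = (1 until row.length).map(row.getDouble).toArray
  LabeledPoint(label, Vectors.dense(features))
}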
ts of ID1 with first five element of ID3
> next first 5 elements of ID1 to ID2. Similarly next 5 elements in that
> order until the end of number of elements.
> Let me know if this helps
>
>
> On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
Hi how bout creating an auto increment column in hbase?
Hth
On 24 Jul 2016 3:53 am, "yeshwanth kumar" wrote:
> Hi,
>
> i am doing bulk load to hbase using spark,
> in which i need to generate a sequential key for each record,
> the key should be sequential across all the
Hi jg
+1 for link. I'd add ML and graph examples if u can
-1 for programming language choice :))
kr
On 31 Jul 2016 9:13 pm, "Jean Georges Perrin" wrote:
> Thanks Guys - I really appreciate :)... If you have any idea of something
> missing, I'll gladly add it.
>
> (and
How bout all dependencies? Presumably they will all go in --jars ?
What if I have 10 dependencies? Any best practices in packaging apps for
spark 2.0?
Kr
On 10 Aug 2016 6:46 pm, "Nick Pentreath" wrote:
> You're correct - Spark packaging has been shifted to not use the
Hi,
have you tried to use spark-csv (https://github.com/databricks/spark-csv)
? after all you can convert an Excel file to CSV
hth.
On Thu, Jul 21, 2016 at 4:25 AM, Felix Cheung
wrote:
> From looking at the XLConnect package, its loadWorkbook() function only
>
Dr Mich
do you have any slides or videos available for the presentation you did
@Canary Wharf?
kindest regards
marco
On Wed, Jul 6, 2016 at 10:37 PM, Mich Talebzadeh
wrote:
> Dear forum members
>
> I will be presenting on the topic of "Running Spark on Hive or Hive
Hi
afaik yes (others pls correct me). Generally, in RandomForest and
DecisionTree you have a column which you are trying to 'predict' (the
label) and a set of features that are used to predict the outcome.
i would assume that if you specify the label column and the 'features'
columns, everything
;
> However I have fixed this by making a fat jar using sbt assembly plugin.
>
> Now all the dependencies are included in that jar and I use that jar in
> spark-submit
>
> Thanks
> Sachin
>
>
> On Wed, Jul 20, 2016 at 9:42 PM, Marco Mistroni <mmistr...@g
Hello Sachin
pls paste the NoClassDefFound Exception so we can see what's failing,
also please advise how you are running your Spark App
For an extremely simple case, let's assume you have your MyFirstSparkApp
packaged in your myFirstSparkApp.jar
Then all you need to do would be to kick off
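something along these lines (just a sketch; the class name, master and jar name are placeholders):

spark-submit \
  --class com.example.MyFirstSparkApp \
  --master local[*] \
  myFirstSparkApp.jar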
Hi Chen
pls post
1 . snippet code
2. exception
any particular reason why you need to load classes in other jars
programmatically?
Have you tried to build a fat jar with all the dependencies ?
hth
marco
On Thu, Jul 7, 2016 at 5:05 PM, Chen Song wrote:
> Sorry to spam
Hi vg I believe the error msg is misleading. I had a similar one with
pyspark yesterday after calling a count on a data frame, where the real
error was with an incorrect user defined function being applied .
Pls send me some sample code with a trimmed down version of the data and I
see if i can
Hello Jean
you can take ur current DataFrame and send it to mllib (i was doing that
coz i didn't know the ml package), but the process is a little bit cumbersome
1. go from DataFrame to an Rdd of [LabeledPoint]
2. run your ML model
i'd suggest you stick to DataFrame + ml package :)
hth
Hi
Have u tried to sort the results before comparing?
On 2 Feb 2017 10:03 am, "Alex" wrote:
> Hi As shown below same query when ran back to back showing inconsistent
> results..
>
> testtable1 is Avro Serde table...
>
> [image: Inline image 1]
>
>
>
> hc.sql("select *
Hi
What is the UDF supposed to do? Are you trying to write a generic function
to convert values to another type depending on the type of the
original value?
Kr
On 1 Feb 2017 5:56 am, "Alex" wrote:
Hi ,
we have Java Hive UDFS which are working perfectly fine in
U can use EMR if u want to run on a cluster.
Kr
On 2 Feb 2017 12:30 pm, "Anahita Talebi" wrote:
> Dear all,
>
> I am trying to run a spark code on multiple machines using submit job in
> google cloud platform.
> As the inputs of my code, I have a training and
he spark connectors
> have the appropriate transitive dependency on the correct version.
>
> On Sat, Feb 4, 2017 at 3:25 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
> > Hi
> > not sure if this will help at all, and pls take it with a pinch of salt
>
Hi
not sure if this will help at all, and pls take it with a pinch of salt as
i don't have your setup and i am not running on a cluster
I have tried to run a kafka example which was originally working on spark
1.6.1 on spark 2.
These are the jars i am using
HI all
i am trying to run a sample spark code which reads streaming data from
Kafka
I Have followed instructions here
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
Here's my setup
Spark: 2.0.1
Kafka:0.10.1.1
Scala Version: 2.11
Libraries used
-
HI all
i am trying to convert a string column in a Dataframe to a
java.util.Date but i am getting this exception
[dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo -
Removed broadcast_0_piece0 on 169.254.2.140:53468 in memory (size: 14.3 KB,
free: 767.4 MB)
Exception
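One way to avoid java.util.Date altogether is to parse into a date column with the built-in functions (a sketch only; it assumes the column is called "dateStr" and uses a dd-MM-yyyy pattern):

import org.apache.spark.sql.functions.{col, unix_timestamp}

// string -> unix seconds -> timestamp -> date
val withDate = df.withColumn("parsedDate",
  unix_timestamp(col("dateStr"), "dd-MM-yyyy").cast("timestamp").cast("date"))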
Hi
i am using sbt to generate the eclipse project file
these are my dependencies
they'll probably translate to something like this in mvn dependencies
(the group id and version are the same for all packages listed below)
groupId: org.apache.spark
version: 2.1.0
artifacts: spark-core_2.11, spark-streaming_2.11, spark-mllib_2.11,
spark-sql_2.11
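in build.sbt terms that would be roughly (a sketch, not the actual project file):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.1.0",
  "org.apache.spark" %% "spark-streaming" % "2.1.0",
  "org.apache.spark" %% "spark-mllib"     % "2.1.0",
  "org.apache.spark" %% "spark-sql"       % "2.1.0"
)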
Try to use --packages to include the jars. From the error it seems it's looking
for a main class in the jars but u r running a python script...
On 25 Feb 2017 10:36 pm, "Raymond Xie" wrote:
That's right Anahita, however, the class name is not indicated in the
original github
Uhm... not a SPK issue. Anyway... Had similar issues with sbt.
The quick sol to get u going is to place ur dependency in your lib folder
The not-so-quick is to build the sbt dependency and do a sbt publish-local,
or deploy local
But I consider both approaches hacks.
Hth
On 16 Jan 2017 2:00
in mongo url.
>
> I remember I tested with python successfully.
>
> Best Regards,
> Palash
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Tue, 17 Jan, 2017 at 5:37 am, Marco Mistroni
> <mmistr...@gmail
hi all
i have the following snippet which loads a dataframe from a csv file and
tries to save
it to mongodb.
For some reason, the MongoSpark.save method raises the following exception
Exception in thread "main" java.lang.IllegalArgumentException: Missing
database name. Set via the
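Setting the output uri when building the session is what worked later in the thread; roughly (the uri value comes from the follow-up message, everything else is assumed):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csvToMongo")
  .config("spark.mongodb.output.uri", "mongodb://localhost:27017/test.tree")
  .getOrCreate()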
HI all
in searching on how to use Spark 2.0 with mongo i came across this link
https://jira.mongodb.org/browse/SPARK-20
i amended my build.sbt (content below), however the mongodb dependency was
not found
Could anyone assist?
kr
marco
name := "SparkExamples"
version := "1.0"
scalaVersion :=
sorry. should have done more research before jumping to the list
the version of the connector is 2.0.0, available from maven repos
sorry
On Mon, Jan 16, 2017 at 9:32 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> HI all
> in searching on how to use Spark 2.0 with mongo i
t.uri",
"mongodb://localhost:27017/test.tree"))
kr
marco
On Tue, Jan 17, 2017 at 7:53 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Uh. Many thanksWill try it out
>
> On 17 Jan 2017 6:47 am, "Palash Gupta" <spline_pal...@yahoo.com> wrote:
>
ng Spark in standalone mode.
>
> Regards
>
>
> ---- Original message
> From: Marco Mistroni
> Date:15/01/2017 16:34 (GMT+02:00)
> To: User
> Subject: Running Spark on EMR
>
> hi all
> could anyone assist here?
> i am trying to run spark 2.0.0 on an EMR c
Or place the file in s3 and provide the s3 path
Kr
On 28 Feb 2017 1:18 am, "Yunjie Ji" wrote:
> After start the dfs, yarn and spark, I run these code under the root
> directory of spark on my master host:
> `MASTER=yarn ./bin/run-example ml.LogisticRegressionExample
>
similar setup can be used on Linux)
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
kr
On Sat, Feb 25, 2017 at 11:12 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:
> Hi, I'll have a look at the GitHub project tomorrow and let u know. U have a py
> script to run and
Is this exception coming from a Spark program?
could you share a few lines of code?
kr
marco
On Tue, Feb 28, 2017 at 10:23 PM, shyla deshpande
wrote:
> producer send callback exception:
> org.apache.kafka.common.errors.TimeoutException:
> Expiring 1 record(s) for
Hi I think u need a UDF if u want to transform a column
Hth
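Something like this, as a minimal sketch (column name and transformation are invented):

import org.apache.spark.sql.functions.{col, udf}

// hypothetical UDF: upper-case a string column
val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)
val transformed = df.withColumn("name_upper", toUpper(col("name")))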
On 1 Mar 2017 4:22 pm, "Bill Schwanitz" wrote:
> Hi all,
>
> I'm fairly new to spark and scala so bear with me.
>
> I'm working with a dataset containing a set of column / fields. The data
> is stored in hdfs as
hi all
i am getting failures when building spark 2.0 on Ubuntu 16.06
Here's details of what i have installed on the ubuntu host
- java 8
- scala 2.11
- git
When i launch the command
./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package
everything compiles sort of fine and at the end i
Hi
please paste the exception
for Spark vs Jupyter, you might want to sign up for this.
It'll give you jupyter and spark...and presumably the spark-csv is already
part of it ?
https://community.cloud.databricks.com/login.html
hth
marco
On Sat, Sep 3, 2016 at 8:10 PM, Arif,Mubaraka
Pls paste code and sample CSV
I m guessing it has to do with formatting time?
Kr
On 8 Sep 2016 12:38 am, "Daniel Lopes" wrote:
> Hi,
>
> I'm* importing a few CSV*s with spark-csv package,
> Always when I give a select at each one looks ok
> But when i join then with
>
> -
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
> c: +55 (18) 99764-2733 | https
Not enough info. But u can try same code in spark shell and get hold of the
exception
Hth
On 8 Sep 2016 11:16 am, "Divya Gehlot" wrote:
> Hi,
> I am on Spark 1.6.1
> I am getting below error when I am trying to call UDF in my spark
> Dataframe column
> UDF
> /* get the
hi all
i have been toying around with this well known RandomForestExample code
val forest = RandomForest.trainClassifier(
  trainData, 7,                 // training data (RDD[LabeledPoint]), numClasses
  Map(10 -> 4, 11 -> 40), 20,   // categoricalFeaturesInfo, numTrees
  "auto", "entropy", 30, 300)   // featureSubsetStrategy, impurity, maxDepth, maxBins
This comes from this link (
it?
>
> Sean
>
>
> On Wed, Sep 14, 2016 at 10:18 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
> > hi all
> > i have been toying around with this well known RandomForestExample code
> >
> > val forest = RandomForest.trainClassifier(
> > trainD
Hi Vr
your code works fine for me, running on Windows 10 with Spark 1.6.1
i m guessing your Spark installation could be busted?
That would explain why it works in your IDE, as you are just importing jars
in your project.
The java.io.IOException: Failed to connect to error is misleading, i have
ns from Java 7 to Java 8 is to use
> the scripts build/mvn and build/sbt, which should be updated on a regular
> basis with safe JVM options.
>
> Fred
>
> On Wed, Oct 5, 2016 at 1:40 AM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> Thanks Richard. It a
gt; ./dev/change-scala-version.sh 2.10 ./build/mvn -Pyarn -Phadoop-2.4
>>> -Dscala-2.10 -DskipTests clean package
>>> If you're building with scala 2.10
>>>
>>> On Sat, Aug 27, 2016, 00:18 Marco Mistroni <mmistr...@gmail.com> wrote:
>>>
>>>
all good. Tal's suggestion did it. i should have read the manual first :(
tx for assistance
On Sat, Aug 27, 2016 at 9:06 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Thanks, i'll follow advice and try again
>
> kr
> marco
>
> On Sat, Aug 27, 2016 at
Aug 26, 2016 at 6:18 PM, Michael Gummelt <mgumm...@mesosphere.io>
wrote:
> :)
>
> On Thu, Aug 25, 2016 at 2:29 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> No i wont accept that :)
>> I can't believe i have wasted 3 hrs for a space!
>>
>&
many tx Jestin!
On Thu, Aug 25, 2016 at 10:13 PM, Jestin Ma <jestinwith.a...@gmail.com>
wrote:
> How about this:
>
> df.withColumn("doubles", col("ints").cast("double")).drop("ints")
>
> On Thu, Aug 25, 2016 at 2:09 PM, Marco Mi
No i wont accept that :)
I can't believe i have wasted 3 hrs for a space!
Many thanks MIchael!
kr
On Thu, Aug 25, 2016 at 10:01 PM, Michael Gummelt <mgumm...@mesosphere.io>
wrote:
> You have a space between "build" and "mvn"
>
> On Thu, Aug 25, 2016
hi all
i might be stuck in old code, but this is what i am doing to convert a DF
int column to Double
val intToDoubleFunc: (Int => Double) = lbl => lbl.toDouble
val labelToDblFunc = udf(intToDoubleFunc)
val convertedDF = df.withColumn("SurvivedDbl",
  labelToDblFunc(col("Survived")))
is there a
HI all
sorry for the partially off-topic question, i hope there's someone on the list who
has tried the same and encountered similar issues
Ok so i have created a Docker file to build an ubuntu container which
includes spark 2.0, but somehow when it gets to the point where it has to
kick off ./build/mvn
Hi Dr Mich,
how bout reading all the csv columns as strings and then applying a UDF, sort of
like this?
import scala.util.control.Exception.allCatch
def getDouble(doubleStr: String): Double =
  allCatch opt doubleStr.toDouble match {
    case Some(doubleNum) => doubleNum
    case _ => Double.NaN
  }
> Try shutting down zinc. Something's funny about your compile server.
> It's not required anyway.
>
> On Sat, Oct 1, 2016 at 3:24 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
> > Hi guys
> > sorry to annoy you on this but i am getting nowhere. So far i
Hi
not sure if spark-csv supports the http:// format you use to load data
from the WEB. I just tried this and got exception
scala> val df = sqlContext.read.
| format("com.databricks.spark.csv").
| option("inferSchema", "true").
|
Hi
i must admit, i had issues as well in finding a sample that does that
(hopefully Spark folks can add more examples or someone on the list can
post sample code?)
hopefully you can reuse the sample below
So, you start from an rdd of doubles (myRdd)
## make a row
val toRddOfRows =
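roughly, the full recipe is (a sketch; sqlContext assumed in scope):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// make a Row out of every double
val toRddOfRows = myRdd.map(d => Row(d))
// declare the schema and build the DataFrame
val schema = StructType(Seq(StructField("value", DoubleType, nullable = false)))
val df = sqlContext.createDataFrame(toRddOfRows, schema)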
Hi
in fact i have just found some written notes in my code see if these
docs help you (they will work with any spark version, not only 1.3.0)
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes
hth
On Sun, Sep 25, 2016 at 1:25 PM, Marco Mistroni <mmi
ark
git) that i am using somehow wrong parameters. or perhaps i should
install scala 2.11 before i install spark? or Maven ?
kr
marco
On Fri, Sep 30, 2016 at 8:23 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Hi all
> this problem is still bothering me.
> Here's my setu
)
at
scala_maven.ScalaTestCompileMojo.execute(ScalaTestCompileMojo.java:48)
at
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
anyone found a similar error?
kr
On Sat, Sep 3, 2016 at 2:54 PM, Marco Mistroni <mmistr...@gmail.com>
Hi
pickle errors normally point to a serialisation issue. i am suspecting
something wrong with ur S3 data, but it's just a wild guess...
Is your s3 object publicly available?
a few suggestions to nail down the problem
1 - try to see if you can read your object from s3 using the boto3 library
'offline',
ppend(object.key)
>
> print("object key")
> print (s3_list[0])
>
> s3obj = boto3.resource('s3').Object(bucket_name='time-waits-for-no-man',
> key=s3_list[0])
> contents = s3obj.get()['Body'].read().decode()
> meow = contents.splitlines()
> result_wo_timestamp = map(ujson.l