RE: Creating Partitioned Parquet Tables via SparkSQL

2015-04-01 Thread Felix Cheung
This is tracked by these JIRAs: https://issues.apache.org/jira/browse/SPARK-5947 https://issues.apache.org/jira/browse/SPARK-5948 From: denny.g@gmail.com Date: Wed, 1 Apr 2015 04:35:08 + Subject: Creating Partitioned Parquet Tables via SparkSQL To: user@spark.apache.org Creating

RE: Streaming anomaly detection using ARIMA

2015-04-01 Thread Felix Cheung
I'm curious - I'm not sure if I understand you correctly. With SparkR, the work is distributed in Spark and computed in R; isn't that what you are looking for? SparkR was built on rJava for the R-JVM bridge but moved away from it. rJava has a component called JRI which allows the JVM to call R. You could call

RE: SparkR csv without headers

2015-08-21 Thread Felix Cheung
You could also rename them with names(). Unfortunately the API doc doesn't show an example of that: https://spark.apache.org/docs/latest/api/R/index.html On Thu, Aug 20, 2015 at 7:43 PM -0700, Sun, Rui rui@intel.com wrote: Hi, You can create a DataFrame using load.df() with a specified schema.
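A minimal sketch of that rename (assuming a Spark 1.x sqlContext with the spark-csv package loaded; the path and the three column names are hypothetical):

df <- read.df(sqlContext, "hdfs:///data/input.csv",
              source = "com.databricks.spark.csv", header = "false")
names(df) <- c("id", "amount", "ts")  # replaces the generated C0, C1, ... names
head(df)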

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Felix Cheung
This might be related to https://issues.apache.org/jira/browse/SPARK-10500 On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote: In zipRLibraries(): // create a zip file from scratch, do not append to existing file. val zipFile = new File(dir, name) I guess

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Felix Cheung
Could you please open a JIRA? Date: Mon, 26 Oct 2015 15:32:42 +0200 Subject: HiveContext ignores ("skip.header.line.count"="1") From: daniel.ha...@veracity-group.com To: user@spark.apache.org Hi, I have a csv table in Hive which is configured to skip the header row using

Re: thought experiment: use spark ML to real time prediction

2015-11-12 Thread Felix Cheung
+1 on that. It would be useful to use the model outside of Spark. _ From: DB Tsai Sent: Wednesday, November 11, 2015 11:57 PM Subject: Re: thought experiment: use spark ML to real time prediction To: Nirmal Fernando Cc: Andy

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-06 Thread Felix Cheung
Is it possible that your user does not have permission to write the temp file? On Tue, Oct 6, 2015 at 10:26 AM -0700, "akhandeshi" wrote: It seems it is failing at path <- tempfile(pattern = "backend_port") I do not see the backend_port directory created... -- View

RE: SparkR: exported functions

2015-08-26 Thread Felix Cheung
I believe that is done explicitly while the final API is being figured out. For the moment you could use DataFrame read.df() From: csgilles...@gmail.com Date: Tue, 25 Aug 2015 18:26:50 +0100 Subject: SparkR: exported functions To: user@spark.apache.org Hi, I've just started playing

RE: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-27 Thread Felix Cheung
May I ask how you are starting Spark? It looks like PYTHONHASHSEED is being set: https://github.com/apache/spark/search?utf8=%E2%9C%93&q=PYTHONHASHSEED Date: Thu, 26 Nov 2015 11:30:09 -0800 Subject: possible bug spark/python/pyspark/rdd.py portable_hash() From: a...@santacruzintegration.com To:

Re: SparkR read.df failed to read file from local directory

2015-12-08 Thread Felix Cheung
Have you tried flightsDF <- read.df(sqlContext, "/home/myuser/test_data/sparkR/flights.csv", source = "com.databricks.spark.csv", header = "true")     _ From: Boyu Zhang Sent: Tuesday, December 8, 2015 8:47 AM Subject: SparkR read.df

RE: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-29 Thread Felix Cheung
--executor-memory 2G \ $extraPkgs \ $* From: Felix Cheung <felixcheun...@hotmail.com> Date: Saturday, November 28, 2015 at 12:11 AM To: Ted Yu <yuzhih...@gmail.com> Cc: Andrew Davidson <a...@santacruzintegration.com>, "user @spark" <user@spark.apache.org>

Re: Python API Documentation Mismatch

2015-12-03 Thread Felix Cheung
Please open an issue in JIRA, thanks! On Thu, Dec 3, 2015 at 3:03 AM -0800, "Roberto Pagliari" wrote: Hello, I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here

Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread Felix Cheung
It looks like this has been broken around Spark 1.5. Please see JIRA SPARK-10185. This has been fixed in pyspark but unfortunately SparkR was missed. I have confirmed this is still broken in Spark 1.6. Could you please open a JIRA? On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3"

Re: sparkR ORC support.

2016-01-06 Thread Felix Cheung
pi.r.SQLUtils", "loadDF", sqlContext, > source, options) > 2 > read.df(sqlContext, filepath, "orc") at > spark_api.R#108 > > On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> > wrote: > >> Firstly I don't have ORC data t

Re: pyspark Dataframe and histogram through ggplot (python)

2016-01-05 Thread Felix Cheung
Hi, select() returns a new Spark DataFrame; I would imagine ggplot would not work with it. Could you try df.select("something").toPandas()? _ From: Snehotosh Banerjee Sent: Tuesday, January 5, 2016 4:32 AM Subject: pyspark Dataframe

Re: sparkR ORC support.

2016-01-05 Thread Felix Cheung
Firstly I don't have ORC data to verify but this should work: df <- loadDF(sqlContext, "data/path", "orc") Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.
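A minimal sketch of that flow (Spark 1.x SparkR API; the path is hypothetical):

sc <- sparkR.init()
hiveContext <- sparkRHive.init(sc)  # a HiveContext is needed for ORC
df <- loadDF(hiveContext, "/data/path", "orc")
head(df)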

Re: Do existing R packages work with SparkR data frames

2015-12-23 Thread Felix Cheung
Hi, SparkR has some support for machine learning algorithms like glm. For existing R packages, currently you would need to collect() to convert into an R data.frame - assuming it fits into the memory of the driver node, though that would be required to work with an R package in any case.
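A minimal sketch of that pattern (Spark 1.x API; note SparkR converts '.' in column names to '_', and any local R function can then be applied):

sdf <- createDataFrame(sqlContext, iris)
local_df <- collect(sdf)   # brings the data to the driver; must fit in memory
fit <- glm(Sepal_Length ~ Petal_Length, data = local_df)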

Re: number of executors in sparkR.init()

2015-12-25 Thread Felix Cheung
The equivalent of spark-submit --num-executors should be spark.executor.instances when used in SparkConf: http://spark.apache.org/docs/latest/running-on-yarn.html Could you try setting that with sparkR.init()? _ From: Franc Carter Sent:
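A minimal sketch of that setting (the master and executor count here are hypothetical):

sc <- sparkR.init(master = "yarn-client",
                  sparkEnvir = list(spark.executor.instances = "4"))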

Re: how to use sparkR or spark MLlib load csv file on hdfs then calculate covariance

2015-12-28 Thread Felix Cheung
Make sure you add the spark-csv package, as in this example, so that the source parameter in R's read.df would work: https://spark.apache.org/docs/latest/sparkr.html#from-data-sources _ From: Andy Davidson Sent: Monday, December 28,
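A minimal sketch following the linked docs - start SparkR with something like sparkR --packages com.databricks:spark-csv_2.10:1.3.0; the path and column names are hypothetical, and cov() here assumes the Spark 1.6 SparkR API:

df <- read.df(sqlContext, "hdfs:///data/flights.csv",
              source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")
cov(df, "dep_delay", "arr_delay")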

Re: possible bug spark/python/pyspark/rdd.py portable_hash()

2015-11-28 Thread Felix Cheung
Ah, it's there in spark-submit and pyspark. Seems like it should be added for spark_ec2 _ From: Ted Yu <yuzhih...@gmail.com> Sent: Friday, November 27, 2015 11:50 AM Subject: Re: possible bug spark/python/pyspark/rdd.py portable_hash() To: Felix

RE: sparkR ORC support.

2016-01-12 Thread Felix Cheung
c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)

Re: sparkR ORC support.

2016-01-12 Thread Felix Cheung
would need to call the line hivecontext <- sparkRHive.init(sc) again. _ From: Sandeep Khurana <sand...@infoworks.io> Sent: Tuesday, January 12, 2016 5:20 AM Subject: Re: sparkR ORC support. To: Felix Cheung <felixcheun...@hotmail.com> Cc: spark users

RE: GraphX Java API

2016-06-08 Thread Felix Cheung
You might want to check out GraphFrames: graphframes.github.io On Sun, Jun 5, 2016 at 6:40 PM -0700, "Santoshakhilesh" wrote: Ok, thanks for letting me know. Yes, since Java and Scala programs ultimately run on the JVM, the APIs written in one language can

RE: SparkContext SyntaxError: invalid syntax

2016-01-17 Thread Felix Cheung
Do you still need help on the PR? btw, does this apply to YARN client mode? From: andrewweiner2...@u.northwestern.edu Date: Sun, 17 Jan 2016 17:00:39 -0600 Subject: Re: SparkContext SyntaxError: invalid syntax To: cutl...@gmail.com CC: user@spark.apache.org Yeah, I do think it would be worth

Re: NA value handling in sparkR

2016-01-27 Thread Felix Cheung
That's correct - and because spark-csv as a Spark package is not specifically aware of R's notion of NA, it interprets it as a string value. On the other hand, R's native NA is converted to NULL on the Spark side when creating a Spark DataFrame from an R data.frame.
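A minimal sketch of one workaround, assuming spark-csv's nullValue option (the path is hypothetical) - tell the reader to treat the literal string "NA" as null at load time:

df <- read.df(sqlContext, "/data/input.csv",
              source = "com.databricks.spark.csv",
              header = "true", nullValue = "NA")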

Re: SparkR with Hive integration

2016-01-19 Thread Felix Cheung
You might need hive-site.xml _ From: Peter Zhang Sent: Monday, January 18, 2016 9:08 PM Subject: Re: SparkR with Hive integration To: Jeff Zhang Cc: Thanks,  I will try.

Re: SparkContext SyntaxError: invalid syntax

2016-01-19 Thread Felix Cheung
roperty for setting environment variables. On Sun, Jan 17, 2016 at 11:37 PM, Felix Cheung <felixcheun...@hotmail.com> wrote: > Do you still need help on the PR? > btw, does this apply to YARN client mode? > > -- > From: andrewweiner2...@u.northweste

Re: cannot coerce class "data.frame" to a DataFrame - with spark R

2016-02-18 Thread Felix Cheung
Doesn't DESeqDataSetFromMatrix work with data.frame only? It wouldn't work with Spark's DataFrame - try collect(countMat) and others to convert them into data.frame? _ From: roni Sent: Thursday, February 18, 2016 4:55 PM Subject: cannot

Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
yarn --packages graphframes:graphframes:0.1.0-spark1.5 which starts and gives me a REPL, but when I try from graphframes import * I get No module names graphframes without '--master yarn' it works as expected thanks On 18 March 2016 at 12:59, Felix Cheung <felixcheun...@hotmail.com> wr

Re: installing packages with pyspark

2016-03-19 Thread Felix Cheung
For some, like graphframes, that are Spark packages, you could also use --packages in the command line of spark-submit or pyspark. See http://spark.apache.org/docs/latest/submitting-applications.html _ From: Jakob Odersky Sent: Thursday, March

Re: GraphFrames and IPython notebook issue - No module named graphframes

2016-04-30 Thread Felix Cheung
Please see http://stackoverflow.com/questions/36397136/importing-pyspark-packages On Mon, Apr 25, 2016 at 2:39 AM -0700, "Camelia Elena Ciolac" wrote: Hello, I work locally on my laptop, not using DataBricks Community edition. I downloaded

Re: XLConnect in SparkR

2016-07-20 Thread Felix Cheung
From looking at the XLConnect package, its loadWorkbook() function only supports reading from a local file path, so you might need a way to call HDFS command to get the file from HDFS first. SparkR currently does not support this - you could read it in as a text file (I don't think .xlsx is a

Re: Graphframe Error

2016-07-05 Thread Felix Cheung
n "Spark 1.6 pre-built for Hadoop" version. I am still not able to get it working. Not sure what I am missing. Attaching the logs. On Mon, Jul 4, 2016 at 5:33 AM, Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>> wrote: It looks like e

Re: Graphframe Error

2016-07-08 Thread Felix Cheung
I ran it with Python 2. On Thu, Jul 7, 2016 at 4:13 AM -0700, "Arun Patel" <arunp.bigd...@gmail.com> wrote: I have tried this already. It does not work. What version of Python is needed for this package? On Wed, Jul 6, 2016 at

Re: SparkR error when repartition is called

2016-08-09 Thread Felix Cheung
I think it's saying a string isn't being sent properly from the JVM side. Does it work for you if you change the dapply UDF to something simpler? Do you have any log from YARN? _ From: Shane Lee

Re: Graphframe Error

2016-07-04 Thread Felix Cheung
It looks like either the extracted Python code is corrupted or there is a Python version mismatch. Are you using Python 3? stackoverflow.com/questions/514371/whats-the-bad-magic-number-error On Mon, Jul 4, 2016 at

Re: UDF in SparkR

2016-08-17 Thread Felix Cheung
This is supported in Spark 2.0.0 as dapply and gapply. Please see the API doc: https://spark.apache.org/docs/2.0.0/api/R/ Feedback welcome and appreciated! _ From: Yogesh Vyas Sent: Tuesday, August 16, 2016 11:39 PM
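A minimal dapply sketch against the Spark 2.0.0 API (columns come from the built-in cars dataset; the derived column is arbitrary):

df <- createDataFrame(cars)
schema <- structType(structField("speed", "double"),
                     structField("dist", "double"),
                     structField("ratio", "double"))
out <- dapply(df, function(x) cbind(x, ratio = x$dist / x$speed), schema)
head(collect(out))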

Re: Examples in graphx

2017-01-29 Thread Felix Cheung
Which graph are you thinking about? Here's one for Neo4j: https://neo4j.com/blog/neo4j-3-0-apache-spark-connector/ From: Deepak Sharma Sent: Sunday, January 29, 2017 4:28:19 AM To: spark users Subject: Examples in graphx Hi There, Are

Re: Getting exit code of pipe()

2017-02-12 Thread Felix Cheung
Subject: Re: Getting exit code of pipe() To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Cool, that's exactly what I was looking for! Thanks! How does one output the status into

Re: Getting exit code of pipe()

2017-02-11 Thread Felix Cheung
Do you want the job to fail if there is an error exit code? You could set checkCode to True: spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pipe#pyspark.RDD.pipe Otherwise maybe you want

Re: what does dapply actually do?

2017-01-18 Thread Felix Cheung
With Spark, the processing is performed lazily. This means nothing much is really happening until you call an "action" - an example is collect(). Another way is to write the output in a distributed manner - see write.df() in R. With SparkR dapply() passing the data from Spark to R to
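A minimal sketch of that laziness (df and the output path are hypothetical; schema(df) works here because the UDF returns its input unchanged):

out <- dapply(df, function(x) x, schema(df))  # lazy: only builds the plan
collect(out)                                  # action: the R UDF actually runs
write.df(out, "hdfs:///tmp/out", source = "parquet")  # distributed alternative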

Re: Creating UUID using SparksSQL

2017-01-18 Thread Felix Cheung
spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.functions.monotonically_increasing_id ? From: Ninad Shringarpure

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp To: Felix Cheung <felixcheun...@hotmail.com>, user @spark <user@spark.apache.org

Re: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

2016-08-18 Thread Felix Cheung
Do you have a file called tmp at / on HDFS? On Thu, Aug 18, 2016 at 2:57 PM -0700, "Andy Davidson" wrote: For unknown reason I can not create UDF when I run the attached notebook on my cluster. I get the following error

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
Have you tried https://github.com/databricks/spark-xml ? On Fri, Aug 19, 2016 at 1:07 PM -0700, "Diwakar Dhanuskodi" wrote: Hi, There is a RDD with json data. I could read json data using rdd.read.json. The json data has

Re: Best way to read XML data from RDD

2016-08-19 Thread Felix Cheung
way to read XML data from RDD To: Felix Cheung <felixcheun...@hotmail.com>, user <user@spark.apache.org> Yes. It accepts an xml file as source but not RDD. The XML data embedded inside json is streamed

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
Cinquegrana, Piero <piero.cinquegr...@neustar.biz> Sent: Wednesday, August 24, 2016 10:37 AM Subject: RE: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big") To: Cinquegrana, Piero <piero.cinquegr...@neustar.biz

Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Felix Cheung
There is an Anaconda parcel one could readily install on CDH: https://docs.continuum.io/anaconda/cloudera As Sean says, it is Python 2.7.x. Spark should work for both 2.7 and 3.5. _ From: Sean Owen Sent: Friday,

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-10 Thread Felix Cheung
You should be able to get it to work with 2.0 as an uber jar. What type of cluster are you running on? YARN? And what distribution? On Sun, Sep 4, 2016 at 8:48 PM -0700, "Holden Karau" wrote: You really shouldn't mix different versions of Spark

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
How are you calling dirs()? What would be x? Is dat a SparkDataFrame? With SparkR, i in dat[i, 4] should be a logical expression for the row, e.g. df[df$age %in% c(19, 30), 1:2] On Sat, Sep 10, 2016 at 11:02 AM -0700, "Bene"

Re: Assign values to existing column in SparkR

2016-09-10 Thread Felix Cheung
If you want to set a column to 0 (essentially removing and replacing the existing one) you would need to put a column on the right-hand side:
> df <- as.DataFrame(iris)
> head(df)
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2
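To complete the idea, a minimal sketch (the column choice here is arbitrary) - the right-hand side must be a Column expression, not a plain R scalar:

df$Petal_Width <- df$Petal_Width * 0   # Column on the RHS: works
# df$Petal_Width <- 0                  # plain R scalar: not supported
head(df)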

Re: questions about using dapply

2016-09-10 Thread Felix Cheung
You might need MARGIN capitalized; this example works though:
c <- as.DataFrame(cars)
# rename the columns to c1, c2
c <- selectExpr(c, "speed as c1", "dist as c2")
cols_in <- dapplyCollect(c,
  function(x) { apply(x[, paste("c", 1:2, sep = "")], MARGIN = 2,
                      FUN = function(y) { y %in% c(61, 99) }) })
#

Re: SparkR error: reference is ambiguous.

2016-09-10 Thread Felix Cheung
Could you provide more information on how df in your example is created? Also please include the output from printSchema(df). This example works:
> c <- createDataFrame(cars)
> c
SparkDataFrame[speed:double, dist:double]
> c$speed <- c$dist*0
> c
SparkDataFrame[speed:double, dist:double]
>

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
Could you include code snippets you are running? On Sat, Sep 10, 2016 at 1:44 AM -0700, "Bene" wrote: Hi, I am having a problem with the SparkR API. I need to subset distributed data so I can extract single values from

Re: Is Spark 2.0 master node compatible with Spark 1.5 work node?

2016-09-18 Thread Felix Cheung
ink a 2.0 uber jar will play nicely on a 1.5 standalone cluster. On Saturday, September 10, 2016, Felix Cheung <felixcheun...@hotmail.com> wrote: You should be able to get it to work with 2.0 as uber jar. What type cluster you are running on? YARN? An

Re: No SparkR on Mesos?

2016-09-07 Thread Felix Cheung
This is correct - SparkR is not quite working completely on Mesos. JIRAs and contributions welcome! On Wed, Sep 7, 2016 at 10:21 AM -0700, "Michael Gummelt" wrote: Quite possibly. I've never used it. I know Python was "unsupported"

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-25 Thread Felix Cheung
parkR: Error in writeBin(batch, con, endian = "big") To: <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com> I tested both in local and cluster mode and the '<<-' seemed to work

Re: Disable logger in SparkR

2016-08-22 Thread Felix Cheung
You should be able to do that with log4j.properties http://spark.apache.org/docs/latest/configuration.html#configuring-logging Or programmatically https://spark.apache.org/docs/2.0.0/api/R/setLogLevel.html _ From: Yogesh Vyas
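A minimal sketch of the programmatic route (Spark 2.0 SparkR API, per the linked doc):

sparkR.session()
setLogLevel("ERROR")   # e.g. suppress INFO/WARN output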

Re: spark.lapply in SparkR: Error in writeBin(batch, con, endian = "big")

2016-08-22 Thread Felix Cheung
How big is the output from score()? Also could you elaborate on what you want to broadcast? On Mon, Aug 22, 2016 at 11:58 AM -0700, "Cinquegrana, Piero" wrote: Hello, I am using the new R API in SparkR spark.lapply

Re: Issue Running sparkR on YARN

2016-11-09 Thread Felix Cheung
It may be that the Spark executor is running as a different user and it can't see where Rscript is. You might want to try putting the Rscript path in PATH. Also please see this for the config property to set for the R command to use: https://spark.apache.org/docs/latest/configuration.html#sparkr

Re: Strongly Connected Components

2016-11-10 Thread Felix Cheung
It is possible it is dead. Could you check the Spark UI to see if there is any progress? _ From: Shreya Agarwal Sent: Thursday, November 10, 2016 12:45 AM Subject: RE: Strongly Connected Components To:

Re: Substitute Certain Rows a data Frame using SparkR

2016-10-19 Thread Felix Cheung
It's a bit less concise but this works:
> a <- as.DataFrame(cars)
> head(a)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> b <- withColumn(a, "speed", ifelse(a$speed > 15, a$speed, 3))
> head(b)
  speed dist
1     3    2
2     3   10
3     3    4
4     3   22
5     3   16
6     3   10
I think your example could be something

Re: How to propagate R_LIBS to sparkr executors

2016-11-17 Thread Felix Cheung
Have you tried spark.executorEnv.R_LIBS? spark.apache.org/docs/latest/configuration.html#runtime-environment _ From: Rodrick Brown Sent: Wednesday, November 16, 2016 1:01 PM Subject: How to propagate R_LIBS to
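A minimal sketch of setting that property at session start (Spark 2.x API; the library path is hypothetical):

sparkR.session(sparkConfig = list(
  spark.executorEnv.R_LIBS = "/opt/r/site-library"))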

Re: Spark Dataframe: Save to hdfs is taking long time

2016-12-15 Thread Felix Cheung
What is the format? From: KhajaAsmath Mohammed Sent: Thursday, December 15, 2016 7:54:27 PM To: user @spark Subject: Spark Dataframe: Save to hdfs is taking long time Hi, I am facing an issue while saving the dataframe back to HDFS. It's

Re: How to load edge with properties file useing GraphX

2016-12-15 Thread Felix Cheung
Have you checked out https://github.com/graphframes/graphframes? It might be easier to work with DataFrame. From: zjp_j...@163.com Sent: Thursday, December 15, 2016 7:23:57 PM To: user Subject: How to load edge with properties file useing

Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Felix Cheung
That's correct - currently GraphFrame does not compute PageRank with weighted edges. _ From: Weiwei Zhang Sent: Thursday, December 1, 2016 2:41 PM Subject: [GraphFrame, Pyspark] Weighted Edge in PageRank To:

Re: PySpark to remote cluster

2016-11-30 Thread Felix Cheung
Spark 2.0.1 is running with a different py4j library than Spark 1.6. You will probably run into other problems mixing versions though - is there a reason you can't run Spark 1.6 on the client? _ From: Klaus Schaefers

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
uary 5, 2017 10:05:03 AM To: Felix Cheung Cc: user@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents Yes it works to read the vertices and edges data from S3 location and is also able to write the checkpoint files to S3. It only fails when deleting the data and that is because it

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
. From: Ankur Srivastava <ankur.srivast...@gmail.com> Sent: Thursday, January 5, 2017 3:45:59 PM To: Felix Cheung; d...@spark.apache.org Cc: user@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents Adding DEV mailing list to see if this is a defect with ConnectedCom

Re: Spark GraphFrame ConnectedComponents

2017-01-04 Thread Felix Cheung
Do you have more of the exception stack? From: Ankur Srivastava Sent: Wednesday, January 4, 2017 4:40:02 PM To: user@spark.apache.org Subject: Spark GraphFrame ConnectedComponents Hi, I am trying to use the ConnectedComponent

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
nkur.srivast...@gmail.com> Sent: Wednesday, January 4, 2017 9:23 PM Subject: Re: Spark GraphFrame ConnectedComponents To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> This is the exact trace

Re: Issue with SparkR setup on RStudio

2016-12-29 Thread Felix Cheung
Any reason you are setting HADOOP_HOME? From the error it seems you are running into an issue with Hive config, likely with trying to load hive-site.xml. Could you try not setting HADOOP_HOME From: Md. Rezaul Karim Sent:

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Felix Cheung
Have you tried the spark-csv package? https://spark-packages.org/package/databricks/spark-csv From: Raymond Xie Sent: Friday, December 30, 2016 6:46:11 PM To: user@spark.apache.org Subject: How to load a big csv to dataframe in Spark 1.6

Re: Spark Graphx with Database

2016-12-30 Thread Felix Cheung
You might want to check out GraphFrames - to load database data (as a Spark DataFrame) and build graphs with them: https://github.com/graphframes/graphframes _ From: balaji9058 Sent: Monday, December 26, 2016 9:27 PM

Re: Difference in R and Spark Output

2016-12-30 Thread Felix Cheung
Could you elaborate more on the huge difference you are seeing? From: Saroj C Sent: Friday, December 30, 2016 5:12:04 AM To: User Subject: Difference in R and Spark Output Dear All, For the attached input file, there is a huge difference

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Felix Cheung
ect: Re: How to load a big csv to dataframe in Spark 1.6 To: Felix Cheung <felixcheun...@hotmail.com> Cc: <user@spark.apache.org> Hello Felix, I followed the instruction and ran the command: >

Re: Issue with SparkR setup on RStudio

2017-01-02 Thread Felix Cheung
is not set in the Windows tests. _ From: Md. Rezaul Karim <rezaul.ka...@insight-centre.org> Sent: Monday, January 2, 2017 7:58 AM Subject: Re: Issue with SparkR setup on RStudio To: Felix Cheung <felixcheun...@hotm

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Or this is a better link: http://graphframes.github.io/quick-start.html _ From: Felix Cheung <felixcheun...@hotmail.com> Sent: Sunday, December 18, 2016 8:46 PM Subject: Re: GraphFrame not init vertices when load edge

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
Can you clarify? Vertices should be another DataFrame as you can see in the example here: https://github.com/graphframes/graphframes/blob/master/docs/quick-start.md From: zjp_j...@163.com Sent: Sunday, December 18, 2016 6:25:50 PM To: user

Re: GraphFrame not init vertices when load edges

2016-12-18 Thread Felix Cheung
There is not a GraphLoader for GraphFrames but you could load and convert from GraphX: http://graphframes.github.io/user-guide.html#graphx-to-graphframe From: zjp_j...@163.com Sent: Sunday, December 18, 2016 9:39:49 PM To: Felix Cheung

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Felix Cheung
Interesting! From: Robert Yokota Sent: Sunday, April 2, 2017 9:40:07 AM To: user@spark.apache.org Subject: Graph Analytics on HBase with HGraphDB and Spark GraphFrames Hi, In case anyone is interested in analyzing graphs in HBase with Apache

Re: Spark SQL - Global Temporary View is not behaving as expected

2017-04-22 Thread Felix Cheung
Cross-session in this context means multiple Spark sessions from the same Spark context. Since you are running two shells, you have different Spark contexts. Do you have to use a temp view? Could you create a table? _ From: Hemanth Gudela

Re: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ?

2017-04-21 Thread Felix Cheung
Not currently - how are you planning to use the output from word2vec? From: Radhwane Chebaane Sent: Thursday, April 20, 2017 4:30:14 AM To: user@spark.apache.org Subject: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ? Hi,

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Felix Cheung
Awesome! Congrats!! From: holden.ka...@gmail.com on behalf of Holden Karau Sent: Wednesday, July 12, 2017 12:26:00 PM To: user@spark.apache.org Subject: With 2.2.0 PySpark is now available for pip install from PyPI

Re: how to create List in pyspark

2017-04-28 Thread Felix Cheung
Why not use the SQL functions explode and split? They would perform better and be more stable than a UDF. From: Yanbo Liang Sent: Thursday, April 27, 2017 7:34:54 AM To: Selvam Raman Cc: user Subject: Re: how to create List in pyspark You can try with UDF, like

Re: "java.lang.IllegalStateException: There is no space for new record" in GraphFrames

2017-04-28 Thread Felix Cheung
Can you allocate more memory to the executor? Also please open an issue with GraphFrames on its GitHub. From: rok Sent: Friday, April 28, 2017 1:42:33 AM To: user@spark.apache.org Subject: "java.lang.IllegalStateException: There is no space for new

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Felix Cheung
And perhaps the error message can be improved here? From: Tathagata Das Sent: Monday, June 19, 2017 8:24:01 PM To: kaniska Mandal Cc: Burak Yavuz; user Subject: Re: How save streaming aggregations on 'Structured Streams' in parquet

Re: problem initiating spark context with pyspark

2017-06-10 Thread Felix Cheung
Curtis, assuming you are running a somewhat recent windows version you would not have access to c:\tmp, in your command example winutils.exe ls -F C:\tmp\hive Try changing the path to under your user directory. Running Spark on Windows should work :) From:

Re: graphframes on cluster

2017-09-20 Thread Felix Cheung
Could you include the code where it fails? Generally the best way to use GraphFrames is the --packages option with the spark-submit command. From: Imran Rajjad Sent: Wednesday, September 20, 2017 5:47:27 AM To: user @spark Subject: graphframes on

Re: How to convert Row to JSON in Java?

2017-09-09 Thread Felix Cheung
toJSON on Dataset/DataFrame? From: kant kodali Sent: Saturday, September 9, 2017 4:15:49 PM To: user @spark Subject: How to convert Row to JSON in Java? Hi All, How to convert Row to JSON in Java? It would be nice to have .toJson() method

Re: Queries with streaming sources must be executed with writeStream.start()

2017-09-09 Thread Felix Cheung
What is newDS? If it is a Streaming Dataset/DataFrame (since you have writeStream there) then there seems to be an issue preventing toJSON to work. From: kant kodali Sent: Saturday, September 9, 2017 4:04:33 PM To: user @spark Subject:

Re: using R with Spark

2017-09-24 Thread Felix Cheung
Both are free to use; you can use sparklyr from the R shell without RStudio (but you probably want an IDE) From: Adaryl Wakefield Sent: Sunday, September 24, 2017 11:19:24 AM To: user@spark.apache.org Subject: using R with Spark

Re: using R with Spark

2017-09-24 Thread Felix Cheung
If you google it you will find posts or info on how to connect it to different cloud and hadoop/spark vendors. From: Georg Heiler <georg.kf.hei...@gmail.com> Sent: Sunday, September 24, 2017 1:39:09 PM To: Felix Cheung; Adaryl Wakefield; user@spark.apac

Re: using R with Spark

2017-09-24 Thread Felix Cheung
et.net/> www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Georg Heiler [mailto:georg.kf.hei...@gmail.com] Sent: Sunday, September 24, 2017 3:39 PM To: Felix Cheung <felixcheun...@hot

Re: sparkR 3rd library

2017-09-04 Thread Felix Cheung
Can you include the code where you call spark.lapply? From: patcharee Sent: Sunday, September 3, 2017 11:46:40 PM To: user@spark.apache.org Subject: sparkR 3rd library Hi, I am using spark.lapply to execute an existing R script in
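For reference, a minimal spark.lapply sketch (Spark 2.x API; the package name is hypothetical and the package must be installed on every executor node):

res <- spark.lapply(seq_len(4), function(i) {
  library(somePkg)  # hypothetical 3rd-party package
  i * 2
})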

Re: [Spark R]: dapply only works for very small datasets

2017-11-27 Thread Felix Cheung
What's the number of executors and/or number of partitions you are working with? I'm afraid most of the problem is the serialization/deserialization overhead between the JVM and R... From: Kunft, Andreas Sent: Monday, November 27,

Re: [Spark R]: dapply only works for very small datasets

2017-11-28 Thread Felix Cheung
; Sent: Tuesday, November 28, 2017 3:11 AM Subject: AW: [Spark R]: dapply only works for very small datasets To: Felix Cheung <felixcheun...@hotmail.com>, <user@spark.apache.org> Thanks for the fast reply. I tried it locally, with 1 - 8 slots on a 8 core machine w/ 25GB memory as w

Re: all calculations finished, but "VCores Used" value remains at its max

2018-05-01 Thread Felix Cheung
Zeppelin keeps the Spark job alive. This is likely a better question for the Zeppelin project. From: Valery Khamenya Sent: Tuesday, May 1, 2018 4:30:24 AM To: user@spark.apache.org Subject: all calculations finished, but "VCores Used" value

Re: Passing an array of more than 22 elements in a UDF

2017-12-24 Thread Felix Cheung
Or use it with Scala 2.11? From: ayan guha Sent: Friday, December 22, 2017 3:15:14 AM To: Aakash Basu Cc: user Subject: Re: Passing an array of more than 22 elements in a UDF Hi I think you are in correct track. You can stuff all your param

Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread Felix Cheung
And Hadoop-3.x is not part of the release and sign off for 2.2.1. Maybe we could update the website to avoid any confusion with "later". From: Josh Rosen Sent: Monday, January 8, 2018 10:17:14 AM To: akshay naidu Cc: Saisai Shao; Raj
