Re: py4j.protocol.Py4JJavaError: An error occurred while calling o794.parquet

2018-01-10 Thread Felix Cheung
java.nio.BufferUnderflowException. Can you try reading the same data in Scala? From: Liana Napalkova Sent: Wednesday, January 10, 2018 12:04:00 PM To: Timur Shenkao Cc: user@spark.apache.org Subject: Re: py4j.protocol.Py4JJavaError:

Re: SparkR test script issue: unable to run run-tests.h on spark 2.2

2018-02-14 Thread Felix Cheung
Yes, it is an issue with the newer release of testthat. To work around it, could you install an earlier version with devtools? Will follow up for a fix. _ From: Hyukjin Kwon Sent: Wednesday, February 14, 2018 6:49 PM Subject: Re: SparkR test script

Re: [graphframes]how Graphframes Deal With Bidirectional Relationships

2018-02-19 Thread Felix Cheung
Generally that would be the approach. But since you effectively double the number of edges, this will likely affect the scale at which your job will run. From: xiaobo Sent: Monday, February 19, 2018 3:22:02 AM To: user@spark.apache.org Subject:

Re: [graphframes]how Graphframes Deal With BidirectionalRelationships

2018-02-20 Thread Felix Cheung
No, it does not support bidirectional edges as of now. _ From: xiaobo <guxiaobo1...@qq.com> Sent: Tuesday, February 20, 2018 4:35 AM Subject: Re: [graphframes]how Graphframes Deal With BidirectionalRelationships To: Felix Cheung <felixcheun...@hotmail.co

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Felix Cheung
Hi - I’m maintaining it. As of now there is an issue with 2.2 that breaks personalized PageRank, and that’s largely the reason there isn’t a release for 2.2 support. There are attempts to address this issue - if you are interested, we would love your help.

Re: Passing an array of more than 22 elements in a UDF

2017-12-26 Thread Felix Cheung
7 9:13 PM Subject: Re: Passing an array of more than 22 elements in a UDF To: Felix Cheung <felixcheun...@hotmail.com> Cc: ayan guha <guha.a...@gmail.com>, user <user@spark.apache.org> What's the privilege of using that specific version for this? Please throw some light onto i

Re: Spark 2.2.1 worker invocation

2017-12-26 Thread Felix Cheung
I think you are looking for spark.executor.extraJavaOptions https://spark.apache.org/docs/latest/configuration.html#runtime-environment From: Christopher Piggott Sent: Tuesday, December 26, 2017 8:00:56 AM To: user@spark.apache.org Subject:
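The option pointed to above is passed per executor; a minimal PySpark sketch of setting it when building the session (the JVM flag shown is only an illustrative example, and this requires a Spark installation to run):

```python
from pyspark.sql import SparkSession

# Pass extra JVM options to every executor JVM. The -XX flag here is just
# an example value; any executor-side JVM option can be supplied this way.
spark = (
    SparkSession.builder
    .appName("extra-java-options-demo")
    .config("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails")
    .getOrCreate()
)
```

The same setting can equivalently be passed on the command line with `spark-submit --conf spark.executor.extraJavaOptions=...`.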

Re: Spark 2.3.1 not working on Java 10

2018-06-21 Thread Felix Cheung
I'm not sure we have completed support for Java 10 From: Rahul Agrawal Sent: Thursday, June 21, 2018 7:22:42 AM To: user@spark.apache.org Subject: Spark 2.3.1 not working on Java 10 Dear Team, I have installed Java 10, Scala 2.12.6 and spark 2.3.1 in my

Re: How to start practicing Python Spark Streaming in Linux?

2018-03-14 Thread Felix Cheung
It’s best to start with Structured Streaming https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#tab_python_0 https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#tab_python_0 _ From: Aakash Basu

Re: [Structured Streaming Query] Calculate Running Avg from Kafka feed using SQL query

2018-04-06 Thread Felix Cheung
Instead of writing to console, you need to write to memory for it to be queryable: .format("memory") .queryName("tableName") https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks From: Aakash Basu
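A minimal sketch of the memory-sink pattern described above, using the built-in `rate` source as a stand-in for the Kafka feed (the query name `runningAvg` is a placeholder; this requires a running Spark session):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-sink-demo").getOrCreate()

# Stand-in streaming source; in the original question this would be a
# Kafka source (options elided).
stream_df = spark.readStream.format("rate").load()

query = (
    stream_df.groupBy().avg("value")   # running average over the stream
    .writeStream
    .outputMode("complete")
    .format("memory")                  # in-memory sink instead of console
    .queryName("runningAvg")           # table name the sink registers
    .start()
)

# While the stream runs, the running result is queryable with SQL:
spark.sql("SELECT * FROM runningAvg").show()
```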

Re: Problem running Kubernetes example v2.2.0-kubernetes-0.5.0

2018-04-22 Thread Felix Cheung
You might want to check with the spark-on-k8s project. Or try using Kubernetes support from the official Spark 2.3.0 release. (Yes, we don't have an official Docker image, but you can build one with the script.) From: Rico Bergmann Sent: Wednesday, April

Re: [Spark R]: Linear Mixed-Effects Models in Spark R

2018-03-26 Thread Felix Cheung
If your data can be split into groups and you can call into your favorite R package on each group of data (in parallel): https://spark.apache.org/docs/latest/sparkr.html#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect
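The SparkR API for this is `gapply` (per the linked docs). The same split-apply pattern has a PySpark analogue in `applyInPandas`, sketched here under the assumption that any single-machine, per-group model fit goes inside the function (demeaning is just a placeholder; requires a Spark runtime):

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("grouped-apply-demo").getOrCreate()

df = spark.createDataFrame(
    [("a", 1.0), ("a", 2.0), ("b", 3.0)], ["group", "value"]
)

def demean(pdf: pd.DataFrame) -> pd.DataFrame:
    # Each group arrives as a plain pandas DataFrame; a per-group call
    # into a favorite single-node library would go here.
    pdf["value"] = pdf["value"] - pdf["value"].mean()
    return pdf

# Groups are processed in parallel across the cluster.
result = df.groupBy("group").applyInPandas(demean, schema=df.schema)
result.show()
```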

Re: Custom metrics sink

2018-03-16 Thread Felix Cheung
There is a proposal to expose them. See SPARK-14151 From: Christopher Piggott Sent: Friday, March 16, 2018 1:09:38 PM To: user@spark.apache.org Subject: Custom metrics sink Just for fun, i want to make a stupid program that makes different

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
That's in the plan. We should be sharing a bit more about the roadmap in future releases shortly. In the meantime this is in the official documentation on what is coming: https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work This support started as a fork of the Apache

Re: Question on Spark-kubernetes integration

2018-03-02 Thread Felix Cheung
For pyspark specifically, IMO it should be very high on the list to port back... As for the roadmap - we should be sharing more soon. From: lucas.g...@gmail.com <lucas.g...@gmail.com> Sent: Friday, March 2, 2018 9:41:46 PM To: user@spark.apache.org Cc: Felix Cheung S

Re: Spark on K8s - using files fetched by init-container?

2018-02-27 Thread Felix Cheung
Yes you were pointing to HDFS on a loopback address... From: Jenna Hoole Sent: Monday, February 26, 2018 1:11:35 PM To: Yinan Li; user@spark.apache.org Subject: Re: Spark on K8s - using files fetched by init-container? Oh, duh. I

Re: SparkR issue

2018-10-14 Thread Felix Cheung
1. Seems like it's spending a lot of time in R (slicing the data, I guess?) and not with Spark. 2. Could you write it into a CSV file locally and then read it from Spark? From: ayan guha Sent: Monday, October 8, 2018 11:21 PM To: user Subject: SparkR issue Hi We

Re: can Spark 2.4 work on JDK 11?

2018-09-29 Thread Felix Cheung
Not officially. We have seen problems with JDK 10 as well. It would be great if you or someone would like to contribute to get it to work. From: kant kodali Sent: Tuesday, September 25, 2018 2:31 PM To: user @spark Subject: can Spark 2.4 work on JDK 11? Hi All,

Re: spark.lapply

2018-09-26 Thread Felix Cheung
It looks like the native R process is terminated by a buffer overflow. Do you know how much data is involved? From: Junior Alvarez Sent: Wednesday, September 26, 2018 7:33 AM To: user@spark.apache.org Subject: spark.lapply Hi! I’m using spark.lapply() in

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall... From: Serega Sheypak Sent: Friday, January 18, 2019 3:21 PM To: user Subject: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? Hi, is there any possibility to tell Scheduler to blacklist specific

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions. From: Shivam Sharma <28shivamsha...@gmail.com> Sent: Saturday, January 19, 2019 7:43 AM To: user@spark.apache.org Subject: Persist Dataframe to HDFS considering HDFS Block Size. Hi All, I wanted to persist dataframe
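The usual arithmetic behind the coalesce advice above: pick a partition count so each output file lands near one HDFS block. A small sketch, assuming the goal is roughly one block per file and the default 128 MiB block size (`target_partitions` is a hypothetical helper name):

```python
import math

def target_partitions(total_size_bytes: int,
                      block_size_bytes: int = 128 * 1024 * 1024) -> int:
    """Partition count so each output file is roughly one HDFS block."""
    return max(1, math.ceil(total_size_bytes / block_size_bytes))

# A 1 GiB dataset with 128 MiB blocks -> 8 partitions, so something like
# df.coalesce(8).write.parquet(path) yields approximately block-sized files.
print(target_partitions(1 * 1024**3))  # 8
```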

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
From: Li Gao Sent: Saturday, January 19, 2019 8:43 AM To: Felix Cheung Cc: Serega Sheypak; user Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job? on yarn it is impossible afaik. on kubernetes you can use taints

Re: spark2.4 arrow enabled true,error log not returned

2019-01-12 Thread Felix Cheung
Do you mean you run the same code on yarn and standalone? Can you check if they are running the same python versions? From: Bryan Cutler Sent: Thursday, January 10, 2019 5:29 PM To: libinsong1...@gmail.com Cc: zlist Spark Subject: Re: spark2.4 arrow enabled

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Felix Cheung
I don’t think we should remove any API even in a major release without deprecating it first... From: Mark Hamstra Sent: Sunday, September 16, 2018 12:26 PM To: Erik Erlandson Cc: user@spark.apache.org; dev Subject: Re: Should python-2 be supported in Spark 3.0?

Re: I have trained a ML model, now what?

2019-01-22 Thread Felix Cheung
About deployment/serving SPIP https://issues.apache.org/jira/browse/SPARK-26247 From: Riccardo Ferrari Sent: Tuesday, January 22, 2019 8:07 AM To: User Subject: I have trained a ML model, now what? Hi list! I am writing here to here about your experience on

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Felix Cheung
If anyone wants to improve docs please create a PR. lol But seriously you might want to explore other projects that manage job submission on top of spark instead of rolling your own with spark-submit. From: Pat Ferrel Sent: Tuesday, March 26, 2019 2:38 PM

Re: Spark - Hadoop custom filesystem service loading

2019-03-23 Thread Felix Cheung
Hmm thanks. Do you have a proposed solution? From: Jhon Anderson Cardenas Diaz Sent: Monday, March 18, 2019 1:24 PM To: user Subject: Spark - Hadoop custom filesystem service loading Hi everyone, On spark 2.2.0, if you wanted to create a custom file system

Re: Spark-hive integration on HDInsight

2019-02-21 Thread Felix Cheung
You should check with HDInsight support From: Jay Singh Sent: Wednesday, February 20, 2019 11:43:23 PM To: User Subject: Spark-hive integration on HDInsight I am trying to integrate spark with hive on HDInsight spark cluster . I copied hive-site.xml in

Re: SparkR + binary type + how to get value

2019-02-17 Thread Felix Cheung
: Thijs Haarhuis Sent: Thursday, February 14, 2019 4:01 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Sure.. I have the following code: printSchema(results) cat("\n\n\n") firstRow <- first(results

Re: SparkR + binary type + how to get value

2019-02-13 Thread Felix Cheung
Please share your code From: Thijs Haarhuis Sent: Wednesday, February 13, 2019 6:09 AM To: user@spark.apache.org Subject: SparkR + binary type + how to get value Hi all, Does anybody have any experience in accessing the data from a column which has a binary

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-10 Thread Felix Cheung
And it might not work completely. Spark only officially supports JDK 8. I’m not sure JDK 9+ support is complete. From: Jungtaek Lim Sent: Thursday, February 7, 2019 5:22 AM To: Gabor Somogyi Cc: Hande, Ranjit Dilip (Ranjit); user@spark.apache.org

Re: SparkR + binary type + how to get value

2019-02-19 Thread Felix Cheung
there: From: Thijs Haarhuis Sent: Tuesday, February 19, 2019 5:28 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Thanks. I got it working now by using the unlist function. I have another question, maybe you can help me with, since I did

Re: I have trained a ML model, now what?

2019-01-23 Thread Felix Cheung
Please comment in the JIRA/SPIP if you are interested! We can see the community support for a proposal like this. From: Pola Yao Sent: Wednesday, January 23, 2019 8:01 AM To: Riccardo Ferrari Cc: Felix Cheung; User Subject: Re: I have trained a ML model, now

Re: ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-14 Thread Felix Cheung
And a plug for the Graph Processing track - A discussion of comparison talk between the various Spark options (GraphX, GraphFrames, CAPS), or the ongoing work with SPARK-25994 Property Graphs, Cypher Queries, and Algorithms Would be great! From: Felix Cheung

ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-13 Thread Felix Cheung
Hi Spark community! As you know, ApacheCon NA 2019 is coming this Sept and its CFP is now open! This is an important milestone as we celebrate 20 years of ASF. We have tracks like Big Data and Machine Learning among many others. Please submit your talks/thoughts/challenges/learnings here:

Re: sparksql in sparkR?

2019-06-07 Thread Felix Cheung
This seems to be more a question about the spark-sql shell? May I suggest you change the email title to get more attention. From: ya Sent: Wednesday, June 5, 2019 11:48:17 PM To: user@spark.apache.org Subject: sparksql in sparkR? Dear list, I am trying to use sparksql

Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on the website > Spark website and state that Python 2 is deprecated in Spark 3.0 I suspect people will then ask when Spark 3.0 is coming out. Might need to provide some clarity on that. From: Reynold Xin Sent:

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
From: shane knapp Sent: Friday, May 31, 2019 7:38:10 PM To: Denny Lee Cc: Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark Hamstra; Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fen; Xiangrui Meng; dev; user Subject: Re: Should python-2 be supported in Spark 3.0? +1000

Re: Spark SQL in R?

2019-06-08 Thread Felix Cheung
I don’t think you should get a hive-site.xml from the internet. It should have connection information about a running hive metastore - if you don’t have a hive metastore service because you are running locally (from a laptop?), then you don’t really need it. You can get Spark to work with its own.

Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could df.filter(col(“c”) == “c1”).write().partitionBy(“c”).save. It could hit some data skew problems but might work for you From: Burak Yavuz Sent: Tuesday, May 7, 2019 9:35:10 AM To: Shubham Chaurasia Cc: dev; user@spark.apache.org Subject: Re: Static
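The filter-then-partitionBy idea above, written out as a PySpark sketch (the column names, values, and output path are placeholders; requires a Spark runtime):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("static-partition-demo").getOrCreate()

df = spark.createDataFrame(
    [("c1", 1), ("c1", 2), ("c2", 3)], ["c", "v"]
)

# Write only the c = 'c1' slice, while keeping the c=.../ directory
# layout that partitionBy produces for the partition column.
(
    df.filter(col("c") == "c1")
    .write.mode("overwrite")
    .partitionBy("c")
    .parquet("/tmp/static_partition_demo")
)
```

Note the skew caveat from the reply: because only one partition value survives the filter, all output rows may land in few tasks.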

Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-16 Thread Felix Cheung
Not currently in Spark. However, there are systems out there that can share DataFrame between languages on top of Spark - it’s not calling the python UDF directly but you can pass the DataFrame to python and then .map(UDF) that way. From: Fiske, Danny Sent:

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Felix Cheung
That’s great! From: ☼ R Nair Sent: Saturday, August 24, 2019 10:57:31 AM To: Dongjoon Hyun Cc: d...@spark.apache.org ; user @spark/'user @spark'/spark users/user@spark Subject: Re: JDK11 Support in Apache Spark Finally!!! Congrats On Sat, Aug 24, 2019, 11:11

Re: SparkR integration with Hive 3 spark-r

2019-11-24 Thread Felix Cheung
I think you will get more answers if you ask without SparkR. Your question is independent of SparkR. Spark support for Hive 3.x (3.1.2) was added here https://github.com/apache/spark/commit/1b404b9b9928144e9f527ac7b1caa15f932c2649 You should be able to connect Spark to the Hive metastore.
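Connecting Spark to an existing Hive 3.x metastore is a session-configuration matter; a hedged sketch (the thrift URI, metastore version, and jar resolution mode are placeholders for your deployment, and this needs a reachable metastore to actually work):

```python
from pyspark.sql import SparkSession

# Point Spark at an existing Hive metastore service.
spark = (
    SparkSession.builder
    .appName("hive-metastore-demo")
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    .config("spark.sql.hive.metastore.version", "3.1.2")
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)

# If the connection works, metastore databases are visible:
spark.sql("SHOW DATABASES").show()
```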

Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
Maybe it’s the reverse - the package is built to run on the latest R but is not compatible with slightly older versions (3.5.2 was Dec 2018) From: Jeff Zhang Sent: Thursday, December 26, 2019 5:36:50 PM To: Felix Cheung Cc: user.spark Subject: Re: Fail to use SparkR of 3.0

Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Felix Cheung
It looks like a change in the method signature in R base packages. Which version of R are you running on? From: Jeff Zhang Sent: Thursday, December 26, 2019 12:46:12 AM To: user.spark Subject: Fail to use SparkR of 3.0 preview 2 I tried SparkR of spark 3.0

Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there are a few things that I want to highlight for you, the members. Yes, the CFP

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Felix Cheung
Congrats From: Jungtaek Lim Sent: Thursday, June 18, 2020 8:18:54 PM To: Hyukjin Kwon Cc: Mridul Muralidharan ; Reynold Xin ; dev ; user Subject: Re: [ANNOUNCE] Apache Spark 3.0.0 Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19,

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-05 Thread Felix Cheung
Congrats and thanks! From: Hyukjin Kwon Sent: Wednesday, March 3, 2021 4:09:23 PM To: Dongjoon Hyun Cc: Gabor Somogyi ; Jungtaek Lim ; angers zhu ; Wenchen Fan ; Kent Yao ; Takeshi Yamamuro ; dev ; user @spark Subject: Re: [ANNOUNCE] Announcing Apache Spark
