Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Jules Damji
Here’s the tweet from the horse’s mouth: https://twitter.com/gvanrossum/status/1133496146700058626?s=21 Cheers Jules — Sent from my iPhone Pardon the dumb thumb typos :) > On May 29, 2019, at 10:12 PM, Sean Owen wrote: > > Deprecated -- certainly and sooner than later. > I don't have a

Re: [EXT] handling skewness issues

2019-04-29 Thread Jules Damji
Yes, indeed! A few talks in the developer and deep-dive tracks address the data skew issue and how to handle it. I shall let the group know when the talk sessions are available. Cheers Jules > On Apr 29, 2019, at 2:13 PM, Michael Mansour >

Re: Standardized Join Types for DataFrames

2019-02-22 Thread Jules Damji
Also, Holden Karau conducts PR reviews and shows how you can contribute to this communal project. Attend one of her live PR sessions. Cheers Jules > On Feb 22, 2019, at 7:16 AM, Pooja Agrawal wrote: > > Hi, > > I am new to spark

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Jules Damji
Indeed! > On Nov 8, 2018, at 11:31 AM, Dongjoon Hyun wrote: > > Finally, thank you all. Especially, thanks to the release manager, Wenchen! > > Bests, > Dongjoon. > > >> On Thu, Nov 8, 2018 at 11:24 AM Wenchen Fan wrote: >> + user list >>

Re: Create an Empty dataframe

2018-06-30 Thread Jules Damji
Here is one quick and dirty way to do it: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/321083416305398/3601578643761083/latest.html Cheers Jules > On Jun 30, 2018, at 7:46 AM,

Re: PySpark API on top of Apache Arrow

2018-05-26 Thread Jules Damji
Actually, we do mention that Pandas UDFs are built upon Apache Arrow. :-) And we point to the blog by the contributors from Two Sigma. :-) “On the other hand, Pandas UDF built atop Apache Arrow accords high-performance to Python developers, whether you use Pandas UDFs on a single-node machine or

Re: Calling Pyspark functions in parallel

2018-03-19 Thread Jules Damji
What’s your PySpark function? Is it a UDF? If so, consider using a pandas UDF, introduced in Spark 2.3. More info here: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html > On Mar 18, 2018, at 10:54 PM,

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2018-01-06 Thread Jules Damji
Here are a couple of tutorials that show how to extract structured, nested data: https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

Spark + AI Summit CfP Open

2017-12-09 Thread Jules Damji
Fellow Sparkers! The CfP for the renamed and expanded summit is open now. If you have an idea, and an implementation of that idea, that you want to share with the Apache Spark and AI community, please do so now: https://databricks.com/blog/2017/12/06/spark-summit-is-becoming-the-spark-ai-summit.html

Re: Spark 2.2 Structured Streaming + Kinesis

2017-11-13 Thread Jules Damji
You can use Databricks to connect to Kinesis: https://databricks.com/blog/2017/08/09/apache-sparks-structured-streaming-with-amazon-kinesis-on-databricks.html Cheers Jules > On Nov 13, 2017, at 3:15 PM, Benjamin Kim

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-07 Thread Jules Damji
You might find these blogs helpful for parsing and extracting data from complex structures: https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

Re: using R with Spark

2017-09-24 Thread Jules Damji
You can also use sparklyr on Databricks. https://databricks.com/blog/2017/05/25/using-sparklyr-databricks.html Cheers Jules > On Sep 24, 2017, at 1:24 PM, Felix Cheung wrote: > > Both are free to use; you can use

Re: How does spark work?

2017-09-12 Thread Jules Damji
Alternatively, watch the Spark Summit talks on memory management to get insight from a developer's perspective. https://spark-summit.org/2016/events/deep-dive-apache-spark-memory-management/ https://spark-summit.org/2017/events/a-developers-view-into-sparks-memory-model/ Cheers Jules

[Upvote] for Apache Spark for 2017 Innovation Award

2017-08-29 Thread Jules Damji
Fellow Spark users, If you think, and believe, deep in your hearts that Apache Spark deserves an innovation award, cast your vote here: https://jaxlondon.com/jax-awards Cheers, Jules

Re: Does Spark SQL uses Calcite?

2017-08-19 Thread Jules Damji
l.com> wrote: >>>>> the thrift server is a jdbc server, Kanth >>>>> >>>>>> On Fri, Aug 11, 2017 at 2:51 PM, <kanth...@gmail.com> wrote: >>>>>> I also wonder why there isn't a jdbc connector for spark sql? >>>>>> >>>

Re: SQL specific documentation for recent Spark releases

2017-08-10 Thread Jules Damji
I refer to docs.databricks.com/Spark/latest/Spark-sql/index.html. Cheers Jules > On Aug 10, 2017, at 1:46 PM, Stephen Boesch wrote: > > > While the DataFrame/DataSets are useful in many circumstances they are >

Re: Does Spark SQL uses Calcite?

2017-08-10 Thread Jules Damji
Yes, it's used more in Hive than in Spark. > On Aug 10, 2017, at 2:24 PM, Sathish Kumaran Vairavelu > wrote: > > I think it is for hive dependency. >> On Thu, Aug 10, 2017 at 4:14 PM kant kodali

Re: Do we anything for Deep Learning in Spark?

2017-06-20 Thread Jules Damji
And we will be having a webinar on July 27 that goes into more detail. Stay tuned. Cheers Jules > On Jun 20, 2017, at 7:00 AM, Michael Mior wrote: > > It's still in the early stages, but check out Deep Learning

Re: How to convert Dataset to Dataset in Spark Structured Streaming?

2017-05-31 Thread Jules Damji
Hello Kant, See if the examples in this blog explain how to deal with your particular case: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html Cheers Jules > On May 30, 2017, at

10th Spark Summit 2017 at Moscone Center

2017-04-26 Thread Jules Damji
Fellow Spark users, The Spark Summit Program Committee requested that I share with this Spark user group a few sessions and events they have added this year: a Hackathon; 1-day and 2-day training courses; 3 new tracks (Technical Deep Dive, Streaming, and Machine Learning); and more… If you are planning to

Spark Summit CfP Closes Sunday

2016-09-28 Thread Jules Damji
Fellow Sparkers, The Spark Summit East 2017 CfP closes Sunday. If you have an abstract, don’t miss the deadline https://spark-summit.org/east-2017/ Thank you & see you in Boston! cheers Jules -- Simplicity precludes neither profundity nor pow

The 8th and the Largest Spark Summit is less than 8 weeks away!

2016-09-11 Thread Jules Damji
Fellow Sparkers! With every Spark Summit, an Apache Spark Community event, increasing numbers of users and developers attend. This is the eighth Summit, held in one of my favorite cosmopolitan cities in the European Union, Brussels. We are offering a special promo code* for all Apache Spark users and

Friendly Reminder: Spark Summit EU CfP Deadline July 1, 2016

2016-06-29 Thread Jules Damji
Hello All, If you haven't submitted a CfP for Spark Summit EU, the deadline is this Friday, July 1st. Submit at https://spark-summit.org/eu-2016/ Cheers! Jules Spark Community Evangelist Databricks, Inc.

Databricks' 2016 Survey on Apache Spark

2016-06-23 Thread Jules Damji
Hi All, We at Databricks are running a short survey to understand users’ needs and usage of Apache Spark. Because we value community feedback, this survey will help us both to understand usage of Spark and to direct our future contributions to it. If you have a moment, please take some time

CfP for Spark Summit Brussels, 2016

2016-06-18 Thread Jules Damji
Hello All, Just in case you missed it, Spark Summit is returning to Europe, October 25-27, 2016, and the Call for Presentations is open. Submit your CfP before July 1: https://spark-summit.org/eu-2016/ Cheers, Jules Community Evangelist Databricks, Inc

Re: GraphX Java API

2016-05-29 Thread Jules Damji
Also, this blog talks about the GraphFrames implementation of some GraphX algorithms, accessible from Java, Scala, and Python: https://databricks.com/blog/2016/03/03/introducing-graphframes.html Cheers Jules > On May 29, 2016, at 12:24 AM,

Re: Spark 2.0 forthcoming features

2016-04-21 Thread Jules Damji
Thanks Michael, we're doing a Spark 2.0 webinar. Register, and if you can't make it, you can always watch the recording. Cheers Jules > On Apr 20, 2016, at 10:15 AM, Michael Malak > wrote: > >

Re: shuffle in spark

2016-03-14 Thread Jules Damji
Hello Ashok, I found three sources on how shuffle works (and which transformations trigger it) instructive and illuminating. After learning from them, you should be able to extrapolate how your particular, practical use case would work.

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-01 Thread Jules Damji
Hello Joshua, comments are inline... > On Mar 1, 2016, at 5:03 AM, Joshua Sorrell wrote: > > I haven't used Spark in the last year and a half. I am about to start a > project with a new team, and we need to decide whether to use pyspark or > Scala. Indeed, good questions,

Re: a basic question on first use of PySpark shell and example, which is failing

2016-02-28 Thread Jules Damji
Hello Ronald, Since you have placed the file under HDFS, you might change the path name to: val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java") > On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C
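
One thing worth checking with a path like the one above is how the URI is split into parts. The following is a minimal sketch using only Python's standard `urllib.parse`, not Spark or Hadoop itself; the helper name `describe_hdfs_uri` is hypothetical, and it is an assumption here that Hadoop's path handling follows the same generic URI rules. Under those rules, the first segment after `hdfs://` is read as the authority (host), so a file that lives at `/user/taylor/...` on the default filesystem may need three slashes, or an explicit namenode host, instead.

```python
from urllib.parse import urlparse


def describe_hdfs_uri(uri):
    """Split a URI into scheme, authority (host), and path, as a generic URI parser sees it."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path}


# Two slashes: "user" is parsed as the authority (host), not as a directory.
two_slash = describe_hdfs_uri("hdfs://user/taylor/Spark/Warehouse.java")

# Three slashes: the authority is empty, so the whole remainder is the file path.
three_slash = describe_hdfs_uri("hdfs:///user/taylor/Spark/Warehouse.java")

print(two_slash)
print(three_slash)
```

If the printed authority for the two-slash form is `user`, that form would point at a host named `user` rather than at the `/user` directory, which is a plausible cause of a "file not found" or "unknown host" style failure.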

Re: Recommendation for a good book on Spark, beginner to moderate knowledge

2016-02-28 Thread Jules Damji
fire up the shell. As for “pyspark” and “spark-shell”, they both come with the Spark installation and are in the $spark_install_dir/bin directory. Have a go at them. It is the best way to learn the language. Cheers Jules -- “Language is the palette from which we draw all colors of our life.” Jules Da

Re: Recommendation for a good book on Spark, beginner to moderate knowledge

2016-02-28 Thread Jules Damji
Hello Ashoka, "Learning Spark," from O'Reilly, is certainly a good start, and all basic video tutorials from Spark Summit Training, "Spark Essentials", are excellent supplementary materials. And the best (and most effective) way to teach yourself is really firing up the spark-shell or pyspark