Re: Profiling PySpark Pandas UDF

2022-08-25 Thread Takuya UESHIN
> …ion, it runs faster, but when executed in the Spark environment the processing time is more than expected. We have one column where the value is large (BinaryType, ~600 KB); wondering whether this could make the Arrow computation slower?
>
> Is there any profiling or best way to debug the cost incurred using a pandas UDF?
>
> Thanks,
> Subash

> -- Thanks, Russell Jurney

-- Takuya UESHIN
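A low-tech way to isolate the Python-side cost described above is to run the UDF body on a sample batch under `cProfile`, outside Spark entirely. A minimal sketch, where `udf_body` is a hypothetical stand-in for the real pandas UDF logic, exercised with ~600 KB binary payloads like the column described:

```python
import cProfile
import io
import pstats

def udf_body(values):
    # Stand-in for the pandas UDF logic applied to one Arrow batch.
    return [len(v) for v in values]

# Simulate a batch of large binary values (~600 KB each), like the
# BinaryType column described in the question.
sample = [b"x" * 600_000 for _ in range(100)]

profiler = cProfile.Profile()
profiler.enable()
result = udf_body(sample)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Comparing this standalone timing with the in-Spark timing helps separate the UDF's own cost from the Arrow transfer of the large column.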

Re: [DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-07 Thread Takuya UESHIN
> [1] …ache/spark/pull/20280
> [2] https://www.python.org/dev/peps/pep-0468/
> [3] https://issues.apache.org/jira/browse/SPARK-29748

-- Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
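The change discussed here leans on PEP 468 [2]: since Python 3.6, `**kwargs` preserves the order in which keyword arguments were passed, so a `Row(name=..., age=...)` constructor no longer needs alphabetical sorting to get a deterministic field order. A minimal illustration:

```python
# PEP 468: keyword argument order is preserved in **kwargs (Python 3.6+),
# so a Row-like constructor can keep fields in the order the caller wrote
# them instead of sorting field names alphabetically.
def field_order(**kwargs):
    return list(kwargs)

print(field_order(zebra=1, apple=2, mango=3))  # ['zebra', 'apple', 'mango']
```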

Re: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread Takuya UESHIN
…expr, Literal(value)) } }

> It does pattern matching to detect whether `value` is of type `Column`. If so, it uses the column's `.expr`; otherwise it works as it used to.
>
> Any suggestion or opinion on the proposition?
>
> Kind regards,
> Chongguang LIU
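The proposed overload can be sketched in Python with an `isinstance` check playing the role of the Scala pattern match. All class names here are simplified stand-ins for illustration, not Spark's actual internals:

```python
# Hypothetical, simplified stand-ins for Catalyst's Expression/Column/Literal.
class Expression:
    def __init__(self, desc):
        self.desc = desc

class Column:
    def __init__(self, expr):
        self.expr = expr

class Literal(Expression):
    def __init__(self, value):
        super().__init__(f"lit({value!r})")

def array_contains(col, value):
    # The dispatch the proposal describes: if `value` is already a Column,
    # reuse its underlying expression; otherwise wrap it as a literal.
    value_expr = value.expr if isinstance(value, Column) else Literal(value)
    return Column(Expression(f"array_contains({col.expr.desc}, {value_expr.desc})"))

items = Column(Expression("items"))
print(array_contains(items, 1).expr.desc)  # array_contains(items, lit(1))
```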

Re: How to insert complex types like map&lt;string,map&lt;string,int&gt;&gt; in Spark SQL

2014-11-26 Thread Takuya UESHIN

Re: SparkSQL - Language Integrated query - OR clause and IN clause

2014-07-10 Thread Takuya UESHIN
Thread archive: …nabble.com/SparkSQL-Language-Integrated-query-OR-clause-and-IN-clause-tp9298.html

Re: Spark SQL user defined functions

2014-07-04 Thread Takuya UESHIN
…correct me if I'm wrong). I would love to be able to do something like the following:

    val casRdd = sparkCtx.cassandraTable(ks, cf)
    // registerAsTable etc.
    val res = sql("SELECT id, xmlGetTag(xmlfield, 'sometag') FROM cf")

-- Best regards,
Martin Gammelsæter

Re: Spark SQL user defined functions

2014-07-04 Thread Takuya UESHIN
Martin Gammelsæter (martingammelsae...@gmail.com) wrote:

> Takuya, thanks for your reply :) I am already doing that, and it is working well. My question is: can I define arbitrary functions to be used in these queries?
>
> On Fri, Jul 4, 2014 at 11:12 AM, Takuya UESHIN ues...@happy-camper.st wrote:
>> Hi …
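The `xmlGetTag` function in the example query is just ordinary code once registered as a UDF. A minimal Python sketch of what its body could look like — the name and behavior are assumed from the example query, not an actual built-in Spark function:

```python
import xml.etree.ElementTree as ET

def xml_get_tag(xml_string, tag):
    """Return the text of the first `tag` element, or None if absent."""
    root = ET.fromstring(xml_string)
    if root.tag == tag:
        return root.text
    node = root.find(f".//{tag}")
    return node.text if node is not None else None

print(xml_get_tag("<row><sometag>hello</sometag></row>", "sometag"))  # hello
```

A plain function like this could then be registered for use in SQL queries (in PySpark 1.x via something like `sqlContext.registerFunction("xmlGetTag", xml_get_tag)`; newer versions use `spark.udf.register`).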

Re: Spark SQL - groupby

2014-07-03 Thread Takuya UESHIN
…error is thrown:

    val queryResult = sql("select * from Table")
    queryResult.groupBy('colA)('colA, Sum('colB) as 'totB).aggregate(Sum('totB)).collect().foreach(println)

Thanks,
Subacini