+1 for this work! But I still don't know how to distinguish common from uncommon functions. It seems we have to decide case by case, and that will cause some confusion.
At 2021-01-29 04:23:08, "MrPowers" <matthewkevinpow...@gmail.com> wrote:
>Thank you all for your amazing work on this project. Spark has a great
>public interface and the source code is clean. The core team has done a
>great job building and maintaining this project. My emails / GitHub
>comments focus on the 1% that we might be able to improve.
>
>Pull requests / suggestions for improvements can come across as negative,
>but I'm nothing but happy & positive about this project. The source code
>is delightful to read and the internal abstractions are beautiful.
>
>*API consistency*
>
>The SQL, Scala, and Python APIs are generally consistent. They all have a
>reverse function, for example.
>
>Some of the new PRs argue against rolling functions out consistently
>across the APIs. This seems like a break from the traditional Spark
>development process, in which functions were implemented in all APIs
>(except for functions that only make sense in certain APIs, like
>createDataset and toDS).
>
>The default has shifted from consistent application of functions across
>APIs to "case by case determination".
>
>*Examples*
>
>* The regexp_extract_all function was recently added to the SQL API. It
>was then added to the Scala API, but later removed from the Scala API
><https://github.com/apache/spark/pull/31346>.
>
>* There is an ongoing discussion on whether CalendarType will be added to
>the Python API <https://github.com/apache/spark/pull/29935>.
>
>*Arguments against adding functions like regexp_extract_all to the Scala
>API:*
>
>* Some of these functions are SQL-specific and don't make sense for the
>other languages.
>
>* Scala users can access the SQL functions via expr.
>
>*Argument rebuttal*
>
>I don't understand the "some of these functions are SQL-specific"
>argument. regexp_extract_all fills a gap in the API. Users have been
>forced to use UDF workarounds for this in the past. Users of all APIs
>need this solution.
>
>Using expr isn't developer friendly. Scala / Python users don't want to
>manipulate SQL strings. Nesting functions in SQL strings is complicated.
>The quoting and escaping are all different. Figuring out how to invoke
>regexp_replace(col("word1"), "//", "\\,") via expr would be a real pain -
>you would need to work out SQL quoting, SQL escaping, and how to refer to
>columns by name instead of through a Column object.
>
>Any of the org.apache.spark.sql.functions can be invoked via expr. The
>core reason the Scala / Python APIs exist is so that developers don't
>need to manipulate strings for expr.
>
>regexp_extract_all should be added to the Scala API for the same reasons
>that regexp_extract was added to the Scala API.
>
>*Next steps*
>
>* I'd like to better understand why we've moved from the traditional
>Spark development process of consistently implementing functions across
>all APIs to selectively implementing functions in certain APIs.
>
>* Hopefully shift the burden of proof to those in favor of inconsistent
>application. Consistent application should be the default.
>
>Thank you all for your excellent work on this project.
>
>- Matthew Powers (GitHub: MrPowers)
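To make the expr friction concrete, here is a minimal Scala sketch (my own illustration, not from the original mail), assuming Spark 3.1+ where regexp_extract_all exists in the SQL API, and a DataFrame with a string column named word1 as in the quoted example:

    import org.apache.spark.sql.functions.{col, expr, regexp_extract}

    // Native Scala API: the pattern is an ordinary Scala string literal
    // and the column is a typed Column object.
    val first = regexp_extract(col("word1"), "(\\d+)", 1)

    // The same call routed through expr: the whole invocation becomes a
    // SQL fragment, so the pattern needs a second layer of escaping (a
    // SQL string literal on top of the Scala literal) and the column can
    // only be referenced by name inside the string.
    val firstViaExpr = expr("regexp_extract(word1, '(\\\\d+)', 1)")

    // regexp_extract_all is SQL-only at the time of this thread, so expr
    // (or a UDF) is the only way to reach it from Scala.
    val allMatches = expr("regexp_extract_all(word1, '(\\\\d+)', 1)")

The doubled layer of backslashes in the expr versions is exactly the quoting/escaping burden the mail describes; the native call avoids it because the pattern never passes through the SQL parser.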