+1 for this work! But I still don't know how to distinguish common from uncommon functions. It seems we have to decide case by case, and that will cause some confusion.
At 2021-01-29 04:23:08, "MrPowers" <matthewkevinpow...@gmail.com> wrote:
>Thank you all for your amazing work on this project. Spark has a great
>public interface and the source code is clean. The core team has done a
>great job building and maintaining this project. My emails / GitHub
>comments focus on the 1% that we might be able to improve.
>
>Pull requests / suggestions for improvements can come across as negative,
>but I'm nothing but happy & positive about this project. The source code
>is delightful to read and the internal abstractions are beautiful.
>
>*API consistency*
>
>The SQL, Scala, and Python APIs are generally consistent. They all have a
>reverse function, for example.
>
>Some of the new PRs argue against rolling functions out consistently
>across the APIs. This seems like a break from the traditional Spark
>development process, in which functions were implemented in all APIs
>(except for functions that only make sense in certain APIs, like
>createDataset and toDS).
>
>The default has shifted from consistent application of functions across
>APIs to "case by case determination".
>
>*Examples*
>
>* The regexp_extract_all function was recently added to the SQL API. It
>was then added to the Scala API, but later removed from the Scala API
><https://github.com/apache/spark/pull/31346>.
>
>* There is an ongoing discussion on whether CalendarType will be added to
>the Python API <https://github.com/apache/spark/pull/29935>.
>
>*Arguments against adding functions like regexp_extract_all to the Scala
>API:*
>
>* Some of these functions are SQL-specific and don't make sense for the
>other languages.
>
>* Scala users can access the SQL functions via expr.
>
>*Argument rebuttal*
>
>I don't understand the "some of these functions are SQL-specific"
>argument. regexp_extract_all fills a gap in the API. Users have been
>forced to use UDF workarounds for this in the past. Users of all APIs
>need this solution.
>
>Using expr isn't developer friendly. Scala / Python users don't want to
>manipulate SQL strings. Nesting functions in SQL strings is complicated.
>The quoting and escaping are all different. Figuring out how to invoke
>regexp_replace(col("word1"), "//", "\\,") via expr would be a real pain -
>you would need to work out SQL quoting, SQL escaping, and how to refer to
>columns by name instead of through a Column object.
>
>Any of the org.apache.spark.sql.functions can be invoked via expr. The
>core reason the Scala / Python APIs exist is so that developers don't
>need to manipulate strings for expr.
>
>regexp_extract_all should be added to the Scala API for the same reasons
>that regexp_extract was added to the Scala API.
>
>*Next steps*
>
>* I'd like to better understand why we've moved from the traditional
>Spark development process of consistently implementing functions across
>all APIs to selectively implementing functions in certain APIs.
>
>* Hopefully shift the burden of proof to those in favor of inconsistent
>application. Consistent application should be the default.
>
>Thank you all for your excellent work on this project.
>
>- Matthew Powers (GitHub: MrPowers)
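To make the expr friction concrete, here is a minimal Scala sketch (my own illustration, not from the original mail), assuming Spark 3.1+ where regexp_extract_all exists in the SQL API, and a DataFrame with a string column named word1 as in the quoted example:

    import org.apache.spark.sql.functions.{col, expr, regexp_extract}

    // Native Scala API: the pattern is an ordinary Scala string literal
    // and the column is a typed Column object.
    val first = regexp_extract(col("word1"), "(\\d+)", 1)

    // The same call routed through expr: the whole invocation becomes a
    // SQL fragment, so the pattern needs a second layer of escaping (a
    // SQL string literal on top of the Scala literal) and the column can
    // only be referenced by name inside the string.
    val firstViaExpr = expr("regexp_extract(word1, '(\\\\d+)', 1)")

    // regexp_extract_all is SQL-only at the time of this thread, so expr
    // (or a UDF) is the only way to reach it from Scala.
    val allMatches = expr("regexp_extract_all(word1, '(\\\\d+)', 1)")

The doubled layer of backslashes in the expr versions is exactly the quoting/escaping burden the mail describes; the native call avoids it because the pattern never passes through the SQL parser.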