To add a bit more context, here are some pros/cons I can think of:

Option 1: Very easy for users to find a function, since they are all in
org.apache.spark.sql.functions. However, there will be quite a large number
of them in a single namespace.

Option 2: I can't tell why we would want this one over Option 3, since it
has all the problems of Option 3 without as clean a hierarchy.

Option 3: The opposite of Option 1. Each "package" or static class has a
small number of functions that are related to each other, but for some
functions it is unclear where they should go (e.g. should "min" go into
basic or math?). A rough sketch of what call sites would look like under
each option is below.
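
To make the trade-off concrete, here is roughly what user code could look
like under Option 1 vs. Option 3. This is only a sketch: basicFuncs /
mathFunc are the placeholder names from the proposal below, and I'm
assuming min ends up in basicFuncs and sqrt in mathFunc.

// assuming a DataFrame df with "age" and "salary" columns

// Option 1: one flat namespace, a single import pulls in everything
import org.apache.spark.sql.functions._

df.select(min(df("age")), sqrt(df("salary")))

// Option 3: grouped objects under the functions package, one import per group
import org.apache.spark.sql.functions.basicFuncs._
import org.apache.spark.sql.functions.mathFunc._

df.select(min(df("age")), sqrt(df("salary")))

The call sites end up identical; the difference is mostly how many imports
users need and how discoverable the names are in the scaladoc.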




On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:

> Before we make DataFrame non-alpha, it would be great to decide how we
> want to namespace all the functions. There are 3 alternatives:
>
> 1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
> since SQL doesn't have namespaces. I estimate eventually we will have ~ 200
> functions.
>
> 2. Have explicit namespaces, which is what master branch currently looks
> like:
>
> - org.apache.spark.sql.functions
> - org.apache.spark.sql.mathfunctions
> - ...
>
> 3. Have explicit namespaces, but restructure them slightly so everything
> is under functions.
>
> package object functions {
>
>   // all the old functions here -- but deprecated so we keep source
>   // compatibility
>   def ...
> }
>
> package org.apache.spark.sql.functions
>
> object mathFunc {
>   ...
> }
>
> object basicFuncs {
>   ...
> }
>
>
>
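
For what it's worth, below is a slightly more filled-in sketch of the
source-compatibility piece of Option 3, mostly to check my own understanding.
None of this is real Spark code: the grouped-object names come from the
sketch above, the deprecation message/version is made up, and the ??? bodies
stand in for the actual expression-based implementations.

package org.apache.spark.sql

package object functions {

  // existing flat entry points stay here as deprecated forwarders, so code
  // written against the old API keeps compiling
  @deprecated("use functions.mathFunc.sqrt instead", "1.4.0")
  def sqrt(col: Column): Column = functions.mathFunc.sqrt(col)
}

package functions {

  object mathFunc {
    // the real implementation would move (or be added) here
    def sqrt(col: Column): Column = ???
  }

  object basicFuncs {
    def min(col: Column): Column = ???
  }
}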
