To add a little more context, here are some pros and cons I can think of:

Option 1: Very easy for users to find a function, since they are all in org.apache.spark.sql.functions. However, there will be quite a large number of them.
Option 2: I can't see why we would want this over Option 3, since it has all the problems of Option 3 without as nice a hierarchy.

Option 3: The opposite of Option 1. Each "package" or static class holds a small number of related functions, but for some functions it is unclear where they should go (e.g. should "min" go into basic or math?).

On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
> Before we make DataFrame non-alpha, it would be great to decide how we
> want to namespace all the functions. There are 3 alternatives:
>
> 1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
> since SQL doesn't have namespaces. I estimate eventually we will have ~ 200
> functions.
>
> 2. Have explicit namespaces, which is what master branch currently looks
> like:
>
> - org.apache.spark.sql.functions
> - org.apache.spark.sql.mathfunctions
> - ...
>
> 3. Have explicit namespaces, but restructure them slightly so everything
> is under functions.
>
> package object functions {
>
>   // all the old functions here -- but deprecated so we keep source compatibility
>   def ...
> }
>
> package org.apache.spark.sql.functions
>
> object mathFunc {
>   ...
> }
>
> object basicFuncs {
>   ...
> }
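For concreteness, here is a minimal sketch of what the Option 3 layout could look like, written as plain Scala with no Spark dependency. The names `functions`, `basicFuncs`, and `mathFunc` follow the quoted proposal; the function bodies, the `Seq[Double]` signatures, the `Demo` object, and the deprecation messages are hypothetical stand-ins (real Spark functions take and return `Column`). Nested objects stand in for the package object plus package so the sketch fits in one file.

```scala
// Hypothetical stand-in for org.apache.spark.sql.functions (no Spark dependency).
object functions {
  // Grouped namespaces, as proposed in Option 3.
  // Where "min" lives is an arbitrary choice: basic or math?
  object basicFuncs {
    def min(xs: Seq[Double]): Double = xs.min
    def max(xs: Seq[Double]): Double = xs.max
  }

  object mathFunc {
    def sqrt(x: Double): Double = scala.math.sqrt(x)
  }

  // Old flat entry points, kept as deprecated forwarders so existing
  // call sites still compile (source compatibility).
  @deprecated("Use functions.basicFuncs.min", "hypothetical")
  def min(xs: Seq[Double]): Double = basicFuncs.min(xs)

  @deprecated("Use functions.mathFunc.sqrt", "hypothetical")
  def sqrt(x: Double): Double = mathFunc.sqrt(x)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val xs = Seq(3.0, 1.0, 2.0)
    println(functions.basicFuncs.min(xs)) // prints 1.0
    println(functions.min(xs))            // still compiles, with a deprecation warning
  }
}
```

Old code that calls `functions.min` keeps compiling and only gets a warning, which is the source-compatibility property Option 3 relies on. The sketch also makes the grouping problem visible: `min` ends up in `basicFuncs` by fiat, and users have to know that to find it.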