My feeling is that we should have a handful of namespaces (say 4 or 5). With too many packages it becomes cumbersome to import and remember the package names, while having everything in one package makes the scaladoc hard to read, etc.
Thanks
Shivaram

On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <r...@databricks.com> wrote:
> To add a little bit more context, some pros/cons I can think of are:
>
> Option 1: Very easy for users to find a function, since they are all in
> org.apache.spark.sql.functions. However, there will be quite a large
> number of them.
>
> Option 2: I can't tell why we would want this one over Option 3, since it
> has all the problems of Option 3 and not as nice a hierarchy.
>
> Option 3: Opposite of Option 1. Each "package" or static class has a
> small number of functions that are relevant to each other, but for some
> functions it is unclear where they should go (e.g. should "min" go into
> basic or math?).
>
> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
>
> > Before we make DataFrame non-alpha, it would be great to decide how we
> > want to namespace all the functions. There are 3 alternatives:
> >
> > 1. Put them all in org.apache.spark.sql.functions. This is how SQL does
> > it, since SQL doesn't have namespaces. I estimate we will eventually
> > have ~200 functions.
> >
> > 2. Have explicit namespaces, which is what the master branch currently
> > looks like:
> >
> >   - org.apache.spark.sql.functions
> >   - org.apache.spark.sql.mathfunctions
> >   - ...
> >
> > 3. Have explicit namespaces, but restructure them slightly so that
> > everything is under functions:
> >
> >   package object functions {
> >     // All the old functions live here, but deprecated, so we keep
> >     // source compatibility.
> >     def ...
> >   }
> >
> >   package org.apache.spark.sql.functions
> >
> >   object mathFuncs {
> >     ...
> >   }
> >
> >   object basicFuncs {
> >     ...
> >   }
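
Since Option 3 is only sketched above, here is a minimal, self-contained
version of what that layout could look like in a single Scala file. This is
an illustrative sketch, not actual Spark code: the Column class below is a
stand-in for the real org.apache.spark.sql.Column, and the function names
(abs, coalesce) and the "1.4.0" deprecation version are assumptions.

    package org.apache.spark.sql {

      // Stand-in for the real org.apache.spark.sql.Column, just so the
      // sketch compiles on its own.
      class Column(val expr: String)

      package object functions {
        // Old flat entry point, kept for source compatibility but
        // deprecated in favor of the namespaced object below.
        @deprecated("use functions.mathFuncs.abs instead", "1.4.0")
        def abs(c: Column): Column =
          org.apache.spark.sql.functions.mathFuncs.abs(c)
      }
    }

    package org.apache.spark.sql.functions {
      import org.apache.spark.sql.Column

      // Small groups of related functions, each in its own object.
      object mathFuncs {
        def abs(c: Column): Column = new Column(s"ABS(${c.expr})")
      }

      object basicFuncs {
        def coalesce(cs: Column*): Column =
          new Column(cs.map(_.expr).mkString("COALESCE(", ", ", ")"))
      }
    }

The nice property of this layout is that a single wildcard import brings in
both the deprecated flat functions (members of the package object) and the
new namespaced objects (members of the package), so old call sites keep
compiling with only a deprecation warning:

    import org.apache.spark.sql.functions._

    val a = abs(new Column("x"))           // old style: compiles, but warns
    val b = mathFuncs.abs(new Column("x")) // new namespaced style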