After talking with people on this thread and offline, I've decided to go with option 1, i.e. putting everything in a single "functions" object.
On Thu, Apr 30, 2015 at 10:04 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> IMHO I would go with choice #1
>
> Cheers
>
> On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> We definitely still have the name collision problem in SQL.
>>
>> On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal
>> <punya.bis...@gmail.com> wrote:
>>
>>> Do we still have to keep the names of the functions distinct to avoid
>>> collisions in SQL? Or is there a plan to allow "importing" a namespace
>>> into SQL somehow?
>>>
>>> I ask because if we have to keep worrying about name collisions, then I'm
>>> not sure what the added complexity of #2 and #3 buys us.
>>>
>>> Punya
>>>
>>> On Wed, Apr 29, 2015 at 3:52 PM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Scaladoc isn't much of a problem because scaladocs are grouped.
>>>> Java/Python is the main problem ...
>>>>
>>>> See
>>>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>>
>>>> On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman
>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>
>>>>> My feeling is that we should have a handful of namespaces (say 4 or 5).
>>>>> It becomes too cumbersome to import / remember more package names, and
>>>>> having everything in one package makes it hard to read scaladoc etc.
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>
>>>>>> To add a little bit more context, some pros/cons I can think of are:
>>>>>>
>>>>>> Option 1: Very easy for users to find the functions, since they are
>>>>>> all in org.apache.spark.sql.functions. However, there will be quite a
>>>>>> large number of them.
>>>>>>
>>>>>> Option 2: I can't tell why we would want this one over Option 3, since
>>>>>> it has all the problems of Option 3, and not as nice of a hierarchy.
>>>>>>
>>>>>> Option 3: Opposite of Option 1. Each "package" or static class has a
>>>>>> small number of functions that are relevant to each other, but for
>>>>>> some functions it is unclear where they should go (e.g. should "min"
>>>>>> go into basic or math?)
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>>
>>>>>>> Before we make DataFrame non-alpha, it would be great to decide how
>>>>>>> we want to namespace all the functions. There are 3 alternatives:
>>>>>>>
>>>>>>> 1. Put all in org.apache.spark.sql.functions. This is how SQL does
>>>>>>> it, since SQL doesn't have namespaces. I estimate eventually we will
>>>>>>> have ~200 functions.
>>>>>>>
>>>>>>> 2. Have explicit namespaces, which is what the master branch
>>>>>>> currently looks like:
>>>>>>>
>>>>>>> - org.apache.spark.sql.functions
>>>>>>> - org.apache.spark.sql.mathfunctions
>>>>>>> - ...
>>>>>>>
>>>>>>> 3. Have explicit namespaces, but restructure them slightly so
>>>>>>> everything is under functions:
>>>>>>>
>>>>>>> package object functions {
>>>>>>>   // all the old functions here -- but deprecated, so we keep
>>>>>>>   // source compatibility
>>>>>>>   def ...
>>>>>>> }
>>>>>>>
>>>>>>> package org.apache.spark.sql.functions
>>>>>>>
>>>>>>> object mathFunc {
>>>>>>>   ...
>>>>>>> }
>>>>>>>
>>>>>>> object basicFuncs {
>>>>>>>   ...
>>>>>>> }
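For readers following along, here is a minimal sketch of what the decided option 1 looks like from user code: one flat wildcard import, with math and "basic" functions coming from the same object. This is illustrative only; it uses the later `SparkSession` entry point (not the `SQLContext` of the 1.x era this thread dates from), and the column names are made up.

```scala
// Option 1 as decided: every built-in function lives in the single
// object org.apache.spark.sql.functions, pulled in with one import.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._ // one flat namespace

object FunctionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    // Hypothetical toy data for illustration.
    val df = Seq((1, -2.0), (2, 4.5)).toDF("id", "x")

    // abs and pow resolve from the same object -- no need to remember
    // which sub-namespace holds each one (the concern raised about
    // option 3, e.g. whether "min" is "basic" or "math").
    df.select(abs(col("x")), pow(col("x"), 2)).show()

    spark.stop()
  }
}
```

The trade-off accepted here is the one Reynold lists above: discoverability is maximal, at the cost of a single very large scaladoc/javadoc page.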