Great, thanks for the additional information, Frank.

On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <jvawd...@pivotal.io> wrote:

> IMO
>
> 1) Option to define resulting column names. Please see pdltools
> implementation - the ability to pass in a function is especially useful
> (http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> 2) Option to dummy code only the top n most frequently occurring values in
> any column
> 3) Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2
> ...) instead of values in column names + secondary mapping table
> 4) Option to exclude original column from results table
>
> (1) & (2) are much higher priority than (3) & (4).
>
> Agreed that these could also be applied to Pivoting (especially 1).
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal
> (650) 315-8905
> https://pivotal.io/
>
> On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > Thanks for those suggestions, Jarrod. They all sound pretty useful -
> > would you mind taking a crack at numbering them 1,2,3... etc, in the
> > order of priority as you see it?
> >
> > Also it seems like some of these could be applied to the Pivot function
> > as well, e.g., UDF for column naming.
> >
> > Frank
> >
> > On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
> > wrote:
> >
> >> Hey Frank,
> >>
> >> How are special character values handled today? It is often not ideal to
> >> end up with column names that require double quotes to call due to
> >> downstream scripts.
> >>
> >> A couple of features that would be useful
> >>
> >> * Option to define resulting column names. Please see pdltools
> >> implementation - the ability to pass in a function is especially useful
> >> (http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> >> * Option to dummy code only the top n most frequently occurring values
> >> in any column
> >> * Option to exclude original column from results table
> >> * Option to create numeric column names (E.g. pivotcol_val1,
> >> pivotcol_val2 ...) instead of values in column names + secondary mapping
> >> table
> >>
> >> Thank you
> >>
> >> Jarrod Vawdrey
> >> Sr. Data Scientist
> >> Data Science & Engineering | Pivotal
> >> (650) 315-8905
> >> https://pivotal.io/
> >>
> >> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <fmcquil...@pivotal.io>
> >> wrote:
> >>
> >>> For the module encoding categorical variables
> >>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
> >>> does anyone have any suggestions on improvements that we could make?
> >>>
> >>> Here is a video on how encoding categorical variables works for those
> >>> not familiar with it
> >>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
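
To make options (1)-(4) from the thread above a bit more concrete, here is a rough Python sketch of how they might fit together. It is only an illustration under my own assumptions: `encode_top_n`, `name_fn`, and `drop_original` are hypothetical names, not part of MADlib or PDLTools, and the real module would do this in SQL.

```python
from collections import Counter


def encode_top_n(rows, column, top_n, name_fn=None, drop_original=True):
    """Dummy-code only the top_n most frequent values of `column`.

    Hypothetical helper for illustration only; not a MADlib or PDLTools API.
    Returns (encoded_rows, mapping), where mapping goes from each generated
    column name back to the original categorical value.
    """
    if name_fn is None:
        # Default: numeric column names such as color_val1, color_val2, ...
        name_fn = lambda col, i, value: f"{col}_val{i}"

    counts = Counter(row[column] for row in rows)
    # Sort by descending frequency, then by value, so ties are deterministic.
    top_values = sorted(counts, key=lambda v: (-counts[v], str(v)))[:top_n]
    # Secondary mapping table: generated column name -> original value.
    mapping = {name_fn(column, i, v): v for i, v in enumerate(top_values, start=1)}

    encoded = []
    for row in rows:
        # Optionally exclude the original column from the result.
        out = {k: v for k, v in row.items() if not (drop_original and k == column)}
        for name, value in mapping.items():
            out[name] = 1 if row[column] == value else 0
        encoded.append(out)
    return encoded, mapping


rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
    {"id": 4, "color": "light green"},
    {"id": 5, "color": "blue"},
    {"id": 6, "color": "red"},
]
encoded, mapping = encode_top_n(rows, "color", top_n=2)
```

Because the generated names never contain the raw values, a value like "light green" cannot produce a column name that needs double quotes downstream; passing a custom `name_fn` covers the case where someone wants value-based names anyway.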