Hey Frank, How are special character values handled today? It is often not ideal to end up with column names that require double quotes to call due to downstream scripts.
A couple of features that would be useful * Option to define resulting column names. Please see pdltools implementation - the ability to pass in a function is especially useful ( http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html) * Option to dummy code only the top n most frequently occurring values in any column * Option to exclude original column from results table * Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2 ...) instead of values in column names + secondary mapping table Thank you Jarrod Vawdrey Sr. Data Scientist Data Science & Engineering | Pivotal (650) 315-8905 https://pivotal.io/ On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <[email protected]> wrote: > For the module encoding categorical variables > http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html > does anyone have any suggestions on improvements that we could make? > > Here is a video on how encoding categorical variables works for those not > familiar with it > https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx- > Qf6EXu5FDxUgXW23BHOtcQ >
