[ 
https://issues.apache.org/jira/browse/MADLIB-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365814#comment-16365814
 ] 

ASF GitHub Bot commented on MADLIB-1202:
----------------------------------------

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/234#discussion_r168525141
  
    --- Diff: src/ports/postgres/modules/utilities/encode_categorical.py_in ---
    @@ -317,7 +317,19 @@ class CategoricalEncoder(object):
     
                 if self.output_type not in ('array', 'svec'):
                     if not self._output_dictionary:
    -                    value_names = {None: 'NULL',
    +                    ## MADLIB-1202
    +                    ## In postgres, boolean variables are always saved
    +                    ## as 'True', 'False' with the first letter as capital,
    +                    ## which will cause the generated column name as
    +                    ## <boolean column name>_True/False that needs double
    +                    ## quoting to query. To make it more convenient to user,
    +                    ## we cast them to lower case true/false so that the
    +                    ## generated column name is <boolean column name>_true/false
    +                    ## The same logic applied to _null and _misc strs
    +                    if v in ('True', 'False'):
    --- End diff --
    
    Wouldn't this be better off as `if isinstance(v, bool)`? 
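
The two checks are not interchangeable, which a small sketch can make concrete. The helper names below (`suffix_by_string`, `suffix_by_type`) are hypothetical and are not part of `encode_categorical.py_in`; the diff also does not show whether `v` reaches this code as a Python `bool` or as the text `'True'`/`'False'`, so which check is correct depends on that.

```python
def suffix_by_string(v):
    """Check used in the diff: matches only the strings 'True' and 'False'."""
    if v in ('True', 'False'):
        return v.lower()
    return str(v)


def suffix_by_type(v):
    """Check suggested in the review: matches only Python bool values."""
    if isinstance(v, bool):
        return str(v).lower()
    return str(v)


# Compare both checks on a bool, on the string 'True', and on the integer 1.
for value in (True, 'True', 1):
    print(repr(value), suffix_by_string(value), suffix_by_type(value))
```

The string check lower-cases a text value that happens to read `'True'` but leaves a real `bool` untouched, while the `isinstance` check does the reverse. Neither treats the integer `1` as a boolean.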


> encode_categorical_variables() creates all lower case column names for 
> boolean columns
> --------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1202
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1202
>             Project: Apache MADlib
>          Issue Type: Improvement
>            Reporter: Jarrod Vawdrey
>            Assignee: Jingyi Mei
>            Priority: Minor
>             Fix For: v1.14
>
>
>  
> It would be handy if encode_categorical_variables() created lower case column 
> names for boolean columns vs upper case that require double quoting to query. 
> Current implementation generates "<boolean column name>_True" and "<boolean 
> column name>_False".
> Improvement to generate <boolean column name>_true and <boolean column 
> name>_false.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
