[ https://issues.apache.org/jira/browse/MADLIB-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363275#comment-16363275 ]

ASF GitHub Bot commented on MADLIB-1202:
----------------------------------------

GitHub user jingyimei opened a pull request:

    https://github.com/apache/madlib/pull/234

    Create lower case column name in encode_categorical_variables()

    JIRA:MADLIB-1202
    Previously, madlib.encode_categorical_variables() generated column
    names containing capital letters in the following cases:
    1. When top_values is specified, a column name with the suffix
       __MISC__ is created.
    2. When encode_nulls is set to True, a column name with the suffix
       __NULL is created.
    3. When the original column is of boolean type, column names with
       the suffixes _True and _False are created.
    
    In these cases, users have to double-quote the column names in
    queries, which is inconvenient.
    
    This commit addresses the problem: all three scenarios now generate
    lowercase column names.
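
The quoting problem described above comes from PostgreSQL folding unquoted identifiers to lower case, so a column created as "flag_True" can only be referenced with double quotes. The following is a minimal Python sketch of that rule and of the lowercasing idea. It is not MADlib's actual implementation: the helper names needs_double_quotes and lowercase_suffix are hypothetical, the check is restricted to ASCII identifiers, and reserved keywords are ignored.

```python
import re

# Hypothetical helpers for illustration only; they are not part of MADlib.
# Simplified rule: an identifier can be used unquoted and still match the
# stored name only if it is all lower case, starts with a letter or an
# underscore, and continues with letters, digits, underscores or '$'.
# (ASCII only; reserved keywords are not considered here.)
_UNQUOTED_IDENT = re.compile(r"^[a-z_][a-z0-9_$]*$")


def needs_double_quotes(column_name: str) -> bool:
    """Return True if the name must be double-quoted to be referenced as-is."""
    return _UNQUOTED_IDENT.match(column_name) is None


def lowercase_suffix(column_name: str, suffix: str) -> str:
    """Append a suffix such as '__MISC__', '__NULL' or '_True' in lower case."""
    return column_name + suffix.lower()


if __name__ == "__main__":
    for suffix in ("__MISC__", "__NULL", "_True", "_False"):
        old_name = "flag" + suffix
        new_name = lowercase_suffix("flag", suffix)
        print(old_name, needs_double_quotes(old_name),
              new_name, needs_double_quotes(new_name))
```

Under these assumptions, each old-style name is reported as needing quotes, while its lowercased counterpart is not.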

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jingyimei/madlib encode_categorial_column_name_change

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #234
    
----
commit 4d78f9425ffb76089d30bbe85cb3e07f1050268a
Author: Jingyi Mei <jmei@...>
Date:   2018-02-14T00:11:39Z

    Create lower case column name in encode_categorical_variables()
    
    JIRA:MADLIB-1202
    Previously, madlib.encode_categorical_variables() generated column
    names containing capital letters in the following cases:
    1. When top_values is specified, a column name with the suffix
       __MISC__ is created.
    2. When encode_nulls is set to True, a column name with the suffix
       __NULL is created.
    3. When the original column is of boolean type, column names with
       the suffixes _True and _False are created.
    
    In these cases, users have to double-quote the column names in
    queries, which is inconvenient.
    
    This commit addresses the problem: all three scenarios now generate
    lowercase column names.

----


> encode_categorical_variables() creates all lower case column names for 
> boolean columns
> --------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1202
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1202
>             Project: Apache MADlib
>          Issue Type: Improvement
>            Reporter: Jarrod Vawdrey
>            Assignee: Jingyi Mei
>            Priority: Minor
>             Fix For: v1.14
>
>
>  
> It would be handy if encode_categorical_variables() created lowercase column
> names for boolean columns, instead of mixed-case names that require double
> quoting to query.
> The current implementation generates "<boolean column name>_True" and
> "<boolean column name>_False".
> The improvement would generate <boolean column name>_true and
> <boolean column name>_false.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
