[ https://issues.apache.org/jira/browse/MADLIB-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140107#comment-15140107 ]

ASF GitHub Bot commented on MADLIB-933:
---------------------------------------

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/15

    Term Freq: Allow custom col names, avoid temp vocab

    JIRA: MADLIB-933
    
    - Fixed a minor bug that forced users to use "doc_id" as a column name.
    - Fixed an incorrect temp table output for the vocabulary.
    
    @mktal: Please review the PR and push to the apache remote after approval. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/term_freq_fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/15.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15
    
----
commit 8c770f0535692cb3685b876e8f4e06d8c5844548
Author: Rahul Iyer
Date:   2015-12-07T22:01:04Z

    Term Freq: Allow custom col names, avoid temp vocab
    
    JIRA: MADLIB-933
    
    - Fixed a minor bug that forced users to use "doc_id" as a column name.
    - Fixed an incorrect temp table output for the vocabulary.

----


> MADlib LDA term_frequency function bugs
> ---------------------------------------
>
>                 Key: MADLIB-933
>                 URL: https://issues.apache.org/jira/browse/MADLIB-933
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Srivatsan
>            Assignee: Rahul Iyer
>             Fix For: v1.9
>
>
> 1. The madlib.term_frequency() function
> (http://doc.madlib.net/latest/group__grp__text__utilities.html) takes the
> docid and words column names as inputs, which suggests we can name our
> columns whatever we want. However, it complains if the columns are not
> actually named "docid" and "words"!
> 2. Secondly, it takes an output table name as input (e.g. documents_tf),
> but it creates the vocabulary as a temp table, so I can't place it in a
> schema the way I can with vatsan.documents_tf. This is annoying for two reasons:
> a. The user can't immediately tell what the vocabulary table is, or why it
> is a temp table while the documents_tf table itself is not.
> b. With a real-world dataset, my LDA models are going to run for quite some
> time. I may even terminate one session and run the LDA model in another
> session, which means the vocabulary temp table won't be available in the
> other session (it would have been dropped).
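
As a rough illustration of the behavior the report asks for, the Python sketch below counts term frequencies using whatever column names the caller passes, rather than hard-coded "docid"/"words", and returns the vocabulary as an ordinary result instead of a session-scoped temp object. This is not MADlib's implementation: the function name, the list-of-dicts input standing in for table rows, and the returned vocabulary dict are all hypothetical stand-ins for the SQL tables.

```python
from collections import Counter


def term_frequency_sketch(rows, doc_id_col, word_col):
    """Hypothetical stand-in for a term frequency step with configurable column names.

    rows: iterable of dicts, each standing in for one table row whose word
          column holds a list of words.
    Returns (tf, vocab):
      tf    maps (doc_id, word_id) -> count
      vocab maps word -> word_id (returned to the caller, i.e. not "temporary")
    """
    counts = Counter()
    for row in rows:
        # Look columns up by the caller-supplied names, not fixed "docid"/"words".
        doc_id = row[doc_id_col]
        for word in row[word_col]:
            counts[(doc_id, word)] += 1

    # Assign word ids in sorted order so the vocabulary is deterministic.
    vocab = {w: i for i, w in enumerate(sorted({w for _, w in counts}))}
    tf = {(d, vocab[w]): c for (d, w), c in counts.items()}
    return tf, vocab
```

For example, rows shaped like `{"id": 1, "tokens": ["a", "b", "a"]}` can be processed with `term_frequency_sketch(rows, "id", "tokens")`; a row missing the named column raises `KeyError` rather than silently requiring a fixed name.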



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
