[ https://issues.apache.org/jira/browse/MADLIB-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357677#comment-16357677 ]
ASF GitHub Bot commented on MADLIB-1160:
----------------------------------------

Github user fmcquillan99 commented on the issue:

    https://github.com/apache/madlib/pull/232

    Functional tests of these 4 commits seem fine to me. I added comments and examples in:

    MADLIB-1160
    MADLIB-1201

    Will create a PR for associated user doc changes shortly.

> Usability changes for LDA
> -------------------------
>
>                 Key: MADLIB-1160
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1160
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>            Priority: Minor
>             Fix For: v1.14
>
>
> Context
> Please see this thread from the user mailing list:
> [http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201709.mbox/%3CCA%2B9JwyW78-aoe-NCQZc_iMuqW6SpKXs0H4JeTMfo3b-G4cxm0w%40mail.gmail.com%3E]
>
> Tasks
> 1) Term frequency
> [http://madlib.apache.org/docs/latest/group__grp__text__utilities.html]
> and LDA
> [http://madlib.apache.org/docs/latest/group__grp__lda.html]
> should both create indexes that start at 1, to make them consistent with
> other MADlib modules. One or both of these currently create indexes starting
> at 0.
> 2) In the output_data_table, *topic_assignment* is a dense vector but *words*
> is a sparse vector (svec).
> We should change *topic_assignment* to be a sparse vector to be consistent.
> Note: the reason sparse vectors were used in the first place (I think) is to
> keep the model state as small as possible, so the sparse format is preferred
> to the dense format in this case, although svecs are a bit harder to work
> with. We have hit the Postgres 1GB field size limit in some use cases.
> 3) The user docs could also use some cleanup at the same time. E.g., helper
> functions are used in the examples but not described above.
> 4) The helper function `madlib.lda_get_topic_desc` should return the top k
> words (and ties). It seems to be returning the top k-1 words (and ties) now.
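
The behavior asked for in tasks 2 and 4 can be illustrated with a small Python sketch. This is not MADlib code: the function names `top_k_with_ties`, `run_length_encode`, and `run_length_decode` are hypothetical, the word probabilities are made-up sample values, and the (counts, values) pair only mimics the general run-length idea behind a sparse vector rather than the actual svec type.

```python
from itertools import groupby


def top_k_with_ties(word_probs, k):
    """Return (word, prob) pairs for the k most probable words,
    plus any further words tied with the k-th one.

    Hypothetical helper for illustration; not part of MADlib.
    """
    if k <= 0 or not word_probs:
        return []
    # Sort by descending probability, then by word for a stable order.
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    if k >= len(ranked):
        return ranked
    # The k-th ranked word sits at index k - 1; using index k - 2 here
    # would produce the "top k-1" behavior described in task 4.
    cutoff = ranked[k - 1][1]
    return [(word, prob) for word, prob in ranked if prob >= cutoff]


def run_length_encode(dense):
    """Compress a dense list into (counts, values) runs."""
    counts, values = [], []
    for value, run in groupby(dense):
        counts.append(len(list(run)))
        values.append(value)
    return counts, values


def run_length_decode(counts, values):
    """Expand (counts, values) runs back into a dense list."""
    dense = []
    for count, value in zip(counts, values):
        dense.extend([value] * count)
    return dense


if __name__ == "__main__":
    # Made-up sample data: "b" and "c" tie for second place.
    probs = {"a": 0.4, "b": 0.3, "c": 0.3, "d": 0.1}
    print(top_k_with_ties(probs, 2))

    # A dense topic assignment with long runs compresses well.
    assignment = [2, 2, 2, 0, 0, 1]
    counts, values = run_length_encode(assignment)
    print(counts, values)
    print(run_length_decode(counts, values) == assignment)
```

With k = 2 and a tie for second place, three words come back, which is the "top k words (and ties)" semantics. The run-length pair shows why a vector with long repeated runs takes less space in sparse form, at the cost of needing a decode step before element-wise access.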