[ https://issues.apache.org/jira/browse/MADLIB-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321521#comment-16321521 ]
Jingyi Mei commented on MADLIB-1160:
------------------------------------

[~fmcquillan] For the LDA user docs: currently, we only introduce madlib.lda_train, madlib.lda_predict and madlib.lda_get_perplexity at the top. Other functions that users may need to call, such as madlib.lda_get_topic_desc and madlib.lda_get_word_topic_count, are only used directly in the examples. I would suggest mentioning them somewhere at the top so that users know all the tools they can use with LDA and can follow the examples more easily.

Also, it seems we don't have a helper function yet for LDA and term frequency (tf)?

> Usability changes for LDA
> -------------------------
>
>                 Key: MADLIB-1160
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1160
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.14
>
>
> Context
> Please see this thread from the user mailing list
> http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201709.mbox/%3CCA%2B9JwyW78-aoe-NCQZc_iMuqW6SpKXs0H4JeTMfo3b-G4cxm0w%40mail.gmail.com%3E
> Tasks
> 1) Term frequency
> http://madlib.apache.org/docs/latest/group__grp__text__utilities.html
> and LDA
> http://madlib.apache.org/docs/latest/group__grp__lda.html
> should both create indexes that start at 1, to make them consistent with
> other MADlib modules. One or both of these currently create indexes starting
> at 0.
> 2) In the output_data_table, *topic_assignment* is a dense vector but
> *words* is a sparse vector (svec).
> We should change *topic_assignment* to be a sparse vector to be consistent.
> Note: the reason sparse vectors were used in the first place (I think) is to
> keep the model state as small as possible, so the sparse format is preferred
> to the dense format in this case, although svecs are a bit harder to work
> with. We have hit the Postgres 1GB field size limit in some use cases.
> 3) The user docs could also use some cleanup at the same time. E.g., helper
> functions are used in the examples but not described above.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)