[ https://issues.apache.org/jira/browse/MADLIB-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089831#comment-15089831 ]
Frank McQuillan edited comment on MADLIB-933 at 1/8/16 7:53 PM:
----------------------------------------------------------------
At the same time, could you please fix the docs at
http://doc.madlib.net/latest/group__grp__text__utilities.html
Under the examples, #2 reads:
{code:sql}
ALTER TABLE documents DROP COLUMN words;
ALTER TABLE documents ADD COLUMN words TEXT[];
UPDATE documents SET words = regexp_split_to_array(lower(doc_contents),
E'[\s+\.]');
{code}
but it should read:
{code:sql}
ALTER TABLE documents ADD COLUMN words TEXT[];
UPDATE documents SET words = regexp_split_to_array(lower(doc_contents),
E'[\\s+\\.]');
{code}
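For context, here is a minimal sketch of why the doubled backslashes matter, assuming standard PostgreSQL escape-string behavior: inside E'...', a backslash in front of a character that is not a recognized escape is simply dropped, so E'[\s+\.]' reaches the regex engine as [s+.] and splits on the letter "s" rather than on whitespace. The sample sentence is made up for illustration and does not come from the MADlib docs.

```sql
-- Hypothetical sample text, used only to compare the two patterns.

-- Single backslashes: the escape string collapses to the pattern [s+.],
-- so the text is split on the letter "s", on "+", and on ".".
SELECT regexp_split_to_array(lower('This is a test.'), E'[\s+\.]');

-- Doubled backslashes: the regex engine sees [\s+\.],
-- so the text is split on whitespace, on "+", and on ".".
SELECT regexp_split_to_array(lower('This is a test.'), E'[\\s+\\.]');
```

Running both statements side by side in psql should make the difference in the resulting arrays easy to see.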
> MADlib LDA term_frequency function bugs
> ---------------------------------------
>
> Key: MADLIB-933
> URL: https://issues.apache.org/jira/browse/MADLIB-933
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Parallel Latent Dirichlet Allocation
> Reporter: Srivatsan
> Assignee: Rahul Iyer
> Fix For: v1.9
>
>
> 1. The madlib.term_frequency() function
> (http://doc.madlib.net/latest/group__grp__text__utilities.html) takes the
> names of the docid column and the words column as inputs, which suggests
> that the columns can be named anything, but it complains if they are not
> actually named "docid" and "words".
> 2. It also takes an output table name as an input (e.g. documents_tf),
> but it creates a temp table for the vocabulary (therefore I can't specify a
> schema name like vatsan.documents_tf). This is annoying for two reasons:
> a. The user can't immediately tell what the vocabulary table is for, or why
> it is a temp table while the documents_tf table itself is not.
> b. With a real-world dataset for LDA, my models are going to run for
> quite some time. I may even terminate one session and run the LDA model in
> another session, which would mean the vocabulary temp table won't be
> available in the other session (or will already have been dropped).