GitHub user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/66
#### Requirements
```sql
drop temporary function if exists tokenize;
create temporary function tokenize as 'hivemall.tools.text.TokenizeUDF';
drop temporary function if exists is_stopword;
create temporary function is_stopword as 'hivemall.tools.text.StopwordUDF';
drop temporary function if exists feature;
create temporary function feature as 'hivemall.ftvec.FeatureUDF';
drop temporary function if exists lda;
create temporary function lda as 'hivemall.lda.OnlineLDAUDTF';
```
#### Sample query
```sql
with features as (
  select
    docid,
    feature(word, count(word)) as f
  from (
    select 1 as docid, "Fruits and vegetables are healthy." as doc
    union all
    select 2 as docid, "I like apples, oranges, and avocados. I do not like the flu or colds." as doc
  ) t1 LATERAL VIEW explode(tokenize(doc, true)) t2 as word
  where
    not is_stopword(word)
  group by
    docid, word
),
t as (
  select docid, collect_set(f) as words
  from features
  group by docid
)
select lda(words, "-topic 2 -iter 20") from t
;
```
#### Result
|topic | word | score|
|:---:|:---:|:---:|
|0 | fruits | 0.33372128|
|0 | vegetables | 0.33272517|
|0 | healthy | 0.33246377|
|0 | flu | 2.3617347E-4|
|0 | apples | 2.1898883E-4|
|0 | oranges | 1.8161473E-4|
|0 | like | 1.7666373E-4|
|0 | avocados | 1.726186E-4|
|0 | colds | 1.037139E-4|
|1 | colds | 0.16622013|
|1 | avocados | 0.16618845|
|1 | oranges | 0.1661859|
|1 | like | 0.16618414|
|1 | apples | 0.16616651|
|1 | flu | 0.16615893|
|1 | healthy | 0.0012059759|
|1 | vegetables | 0.0010818697|
|1 | fruits | 6.080827E-4|
Clearly, topic 0 corresponds to doc 1 and topic 1 to doc 2: each topic gives nearly all of its score to the words of a single document.
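
The topic-to-document correspondence can also be checked mechanically. The following is only a minimal sketch in Python, run outside Hive: the rows are copied by hand from the result table, and the `doc_words` mapping is an assumption read off the two sample sentences (it is not produced by any Hivemall function).

```python
# (topic, word, score) rows copied from the result table above.
rows = [
    (0, "fruits", 0.33372128), (0, "vegetables", 0.33272517),
    (0, "healthy", 0.33246377), (0, "flu", 2.3617347e-4),
    (0, "apples", 2.1898883e-4), (0, "oranges", 1.8161473e-4),
    (0, "like", 1.7666373e-4), (0, "avocados", 1.726186e-4),
    (0, "colds", 1.037139e-4),
    (1, "colds", 0.16622013), (1, "avocados", 0.16618845),
    (1, "oranges", 0.1661859), (1, "like", 0.16618414),
    (1, "apples", 0.16616651), (1, "flu", 0.16615893),
    (1, "healthy", 0.0012059759), (1, "vegetables", 0.0010818697),
    (1, "fruits", 6.080827e-4),
]

# Assumption: which result words come from which sample document,
# determined by reading the two input sentences.
doc_words = {
    1: {"fruits", "vegetables", "healthy"},
    2: {"like", "apples", "oranges", "avocados", "flu", "colds"},
}

# For every word, keep the topic in which it scores highest.
best = {}
for topic, word, score in rows:
    if word not in best or score > best[word][1]:
        best[word] = (topic, score)

# Group words by their highest-scoring (dominant) topic.
dominant = {}
for word, (topic, _) in best.items():
    dominant.setdefault(topic, set()).add(word)

# Total score each topic assigns to each document's words.
mass = {
    (t, d): sum(s for tt, w, s in rows if tt == t and w in words)
    for t in (0, 1)
    for d, words in doc_words.items()
}

for topic in sorted(dominant):
    print(topic, sorted(dominant[topic]))
for key in sorted(mass):
    print(key, round(mass[key], 4))
```

With these rows, the words whose dominant topic is 0 are exactly the doc 1 words, the rest belong to topic 1, and each topic puts more than 99% of its total score on one document's words.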
@myui Could you review whether this interface is sufficient?
From now on, I will carefully check whether the algorithm is implemented correctly
in `OnlineLDAModel`.