[ https://issues.apache.org/jira/browse/MAHOUT-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049954#comment-13049954 ]
Hector Yee edited comment on MAHOUT-732 at 6/15/11 6:51 PM:
------------------------------------------------------------
Example command lines:
Train a ranking autoencoder on the Reuters TF-IDF vectors, saving state every
10,000 iterations (-si), with 10 iterations (-it), 20 topics (-t), a 34,262-word
vocabulary (-w), a learning rate of 0.1 (-lr) and regularization of 0.001 (-r):
bin/mahout -core org.apache.mahout.clustering.autoencoder.Autoencoder \
  -i reuters-vectors/tfidf-vectors/ -o reuters-autoencoder \
  -si 10000 -it 10 -t 20 -w 34262 -lr 0.1 -r 0.001
Print the topics stored in saved state state-23, using the Reuters dictionary:
bin/mahout -core org.apache.mahout.clustering.autoencoder.AutoencoderPrintTopics \
  -i reuters-autoencoder/state-23 -d reuters-vectors/dictionary.file-0 \
  -dt sequencefile -o foo23
cat foo23/topic_19
plus [p(plus|topic_19) = 1.64054447372968
cents [p(cents|topic_19) = 1.59648803935064
close [p(close|topic_19) = 1.585970140983822
day [p(day|topic_19) = 1.436174292297341
continued [p(continued|topic_19) = 1.2311056717687845
production [p(production|topic_19) = 1.0971327889182472
debt [p(debt|topic_19) = 0.824801495480841
profit [p(profit|topic_19) = 0.7740839025630852
program [p(program|topic_19) = 0.6448215826923319
products [p(products|topic_19) = 0.59396394073737
deal [p(deal|topic_19) = 0.5417773636589548
combination [p(combination|topic_19) = 0.525162688536503
change [p(change|topic_19) = 0.48809557650241214
profits [p(profits|topic_19) = 0.4644487242090055
profitable [p(profitable|topic_19) = 0.42288746966211166
coast [p(coast|topic_19) = 0.409891931409948
certain [p(certain|topic_19) = 0.3292666140569414
deadline [p(deadline|topic_19) = 0.3013666321744536
programme [p(programme|topic_19) = 0.23413646427510545
coastal [p(coastal|topic_19) = 0.2194741737164908
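To skim every topic rather than one file at a time, a small shell sketch can turn each output file into word/score pairs and keep the highest-scoring words. It assumes, from this single example only, that AutoencoderPrintTopics writes one file per topic named topic_N under the -o directory, with lines of the form `word [p(word|topic_N) = score`. The helper names `topic_to_tsv` and `top_words` are hypothetical, not part of Mahout.

```shell
# Hypothetical helper: convert one topic file into "word<TAB>score" lines.
# Assumes each non-empty line looks like: word [p(word|topic_N) = score
topic_to_tsv() {
  awk 'NF > 0 { print $1 "\t" $NF }' "$1"
}

# Hypothetical helper: print the top N words (default 5) of every
# topic_* file in an output directory such as foo23.
top_words() {
  local dir=$1
  local n=${2:-5}
  local tab
  tab=$(printf '\t')
  for f in "$dir"/topic_*; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    printf '== %s\n' "$(basename "$f")"
    # Sort explicitly by score (general numeric, descending) instead of
    # relying on the order in which the words were written.
    topic_to_tsv "$f" | sort -t "$tab" -k2,2gr | head -n "$n"
  done
}
```

For example, `top_words foo23 5` prints the five highest-scoring words of each topic. The `-g` flag lets `sort` also handle scores written in exponent notation; it is supported by GNU and BSD `sort`.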
> Implement ranking autoencoder on top of gradient machine
> --------------------------------------------------------
>
> Key: MAHOUT-732
> URL: https://issues.apache.org/jira/browse/MAHOUT-732
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Hector Yee
> Priority: Minor
> Fix For: 0.6
>
> Attachments: MAHOUT-732.gitpatch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Implement a ranking autoencoder clusterer on top of the gradient machine.
> See https://docs.google.com/present/edit?id=0AQC247eq7Jp5ZGZ6NXpyOWhfMjlmM2pzdjRkZw&authkey=CNj2h98P&hl=en_US for details.