[ https://issues.apache.org/jira/browse/MAHOUT-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo updated MAHOUT-1502:
-----------------------------------
Attachment: MAHOUT-1502_draft.patch
Here's a patch for a draft of the reworked Naive Bayes page. I was hoping to
get some feedback on whether, style- and content-wise, it's what you're
looking for.
I've basically taken Table 4 from Rennie and made a few minor changes to
steps 1 and 2 ("preprocessing") to reflect the TF-IDF transformations actually
made by the Lucene DefaultSimilarity class called from seq2sparse, and then
given a brief overview of the corresponding NB CLI commands. I've made no
mention of any Java code except for its location.
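For reference, here is a rough sketch of the TF-IDF weighting as I understand
it from Lucene's DefaultSimilarity: tf is the square root of the raw term
frequency, and idf is 1 + ln(numDocs / (docFreq + 1)). The function names and
the sample counts below are made up for illustration; this is not the Mahout
or Lucene code itself, and it ignores any normalization seq2sparse may apply
afterwards.

```python
import math


def lucene_tf(term_freq):
    """Term-frequency factor: square root of the raw count."""
    return math.sqrt(term_freq)


def lucene_idf(doc_freq, num_docs):
    """Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1))."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def tfidf(term_freq, doc_freq, num_docs):
    """TF-IDF weight of one term in one document (no length normalization)."""
    return lucene_tf(term_freq) * lucene_idf(doc_freq, num_docs)


# Hypothetical counts: the term occurs 4 times in the document and appears
# in 9 of the 1000 documents in the corpus.
weight = tfidf(term_freq=4, doc_freq=9, num_docs=1000)
print(weight)
```

This differs from the log-based TF transform in steps 1 and 2 of Rennie's
table, which is why the page describes those steps slightly differently.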
I probably need to rewrite the implementation section completely.
A few questions I had:
1. Do we want to stick with "Bayes" and "CBayes"? I've written it this way,
but I think those names could be a little confusing.
2. Should I provide a more thorough end-to-end explanation of building a model
from the command line? I am thinking no, since the 20 Newsgroups page has that.
(I think that page also needs some work. I'm not sure if there is a JIRA open
for that.)
3. Should there be a Java section on building a NB model?
Also, I'm not sure whether what I've called "preprocessing" (steps 1-3) belongs
on this page. I've left those steps in because the Rennie paper references them
pretty heavily, but they could confuse things, as they are more an issue for
seq2sparse (which I'm increasingly thinking deserves its own page).
Let me know of any changes that need to be made.
Thanks.
> Update Naive Bayes Webpage to Current Implementation
> -----------------------------------------------------
>
> Key: MAHOUT-1502
> URL: https://issues.apache.org/jira/browse/MAHOUT-1502
> Project: Mahout
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 0.9
> Reporter: Andrew Palumbo
> Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1502_draft.patch
>
>
> The current Naive Bayes page is for the pre-0.7 NB implementation:
> https://mahout.apache.org/users/classification/bayesian.html
> Post-0.7, TF-IDF calculations are performed outside of NB.