Re: Categorization stuff

Grant Ingersoll Wed, 15 Jul 2009 15:11:05 -0700

Also, it seems like we could make the Wikipedia example a bit more generic by not restricting it to just countries, right? The code appears to just check whether a Category contains some entry in a list (in the current case, countries) and then labels the doc that way, but there really isn't anything special about that. For instance, I could have categories by subject, i.e. History, Math, etc., right?
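To make the idea concrete, here is a rough sketch in Python of what I mean (not the actual Mahout code). The function name label_for, the case-insensitive substring match, and the sample category strings are all assumptions for illustration only:

```python
def label_for(categories, labels):
    """Return the first entry of `labels` found inside any category string.

    Matching is a case-insensitive substring test (an assumption made for
    this sketch). Returns None when no label matches.
    """
    for category in categories:
        lowered = category.lower()
        for label in labels:
            if label.lower() in lowered:
                return label
    return None


# Hypothetical label lists: nothing here is specific to countries.
countries = ["France", "India"]
subjects = ["History", "Math"]

# Hypothetical categories attached to one Wikipedia document.
doc_categories = ["History of France", "Medieval warfare"]

print(label_for(doc_categories, countries))  # France
print(label_for(doc_categories, subjects))   # History
```

The same document gets a country label or a subject label depending only on which list is passed in.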


-Grant

On Jul 15, 2009, at 5:01 PM, Robin Anil wrote:

Hi Grant,
For Bayes, the input is a tab-separated flat file with one document per line: the label is the first word, followed by a tab, followed by the flattened document. I will be travelling for the next 3 days, as I am relocating to my job location, so I hope I will be able to give you the documentation for this by Monday morning.
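As a rough illustration of that line format (a sketch in Python, not the Mahout code itself), the helper names to_line and from_line are made up here, and flattening by collapsing all whitespace into single spaces is an assumption:

```python
def to_line(label, text):
    """Build one input line: label, a tab, then the flattened document.

    Flattening collapses newlines, tabs and repeated spaces into single
    spaces so the document fits on one line (an assumption for this sketch).
    The label is assumed to be a single word with no whitespace.
    """
    flat = " ".join(text.split())
    return f"{label}\t{flat}"


def from_line(line):
    """Split one input line back into (label, document) at the first tab."""
    label, _, document = line.rstrip("\n").partition("\t")
    return label, document


line = to_line("History", "The French\nRevolution\tbegan in 1789.")
print(repr(line))       # 'History\tThe French Revolution began in 1789.'
print(from_line(line))  # ('History', 'The French Revolution began in 1789.')
```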
Robin
On Thu, Jul 16, 2009 at 1:02 AM, Grant Ingersoll<[email protected]> wrote:
Hi Robin,
I have been looking at the classification stuff a bit more and am wondering if we should switch to using Vectors now, since the name could be the label and the values can contain weights, similar to what we do for clustering.
Also, I was wondering if you could document the format used for the input files now and the steps taken by the algorithms. I'm trying to better understand the Wikipedia examples and also the HBase.
-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
