On Mar 19, 2008, at 9:56 PM, Karl Wettin wrote:

Grant Ingersoll wrote:
Now that we have some code in place for clustering, I think it would be cool to put together some examples/demos of real-world problems. Things like clustering text (perhaps we can use the Wikipedia download or the Reuters download that Lucene's contrib/benchmark uses) or clustering other pieces of data. We could set up a demo area of code and use Lucene's analysis code to create document vectors.
Ideas and/or thoughts or volunteers?

Should a demo be clear enough that people who have never heard of machine learning understand what's going on? Or should it mainly show how to use the API? Or is it something built just to show that the code works, or that it handles a large data set?


I think it is more about working with the APIs, at least for now. In the longer run, an intro to ML would be cool, but there is lots available on that. I don't think it should be that large, as I don't think we can really show scale. Just something that shows how to get the source, set it up to run against a test set of data, and somehow see the results, even if it is trivial command-line stuff.
