On Mar 19, 2008, at 9:56 PM, Karl Wettin wrote:
Grant Ingersoll wrote:
Now that we have some code in place for clustering, I think it
would be cool to put together some examples/demos of real world
problems. Things like clustering text (perhaps we can use the
Wikipedia download or the Reuters download that Lucene
contrib/benchmark uses) or clustering other pieces of data.
We could set up a demo area of code and use Lucene's analysis code
to create document vectors.
Ideas and/or thoughts or volunteers?
Should a demo be clear enough that people who have never heard of
machine learning can understand what's going on? Or should it
mainly show how to use the API? Or is it something built just to
show it working on a large data set?
I think it is more about working with the APIs, at least for now. In
the longer run, an intro to ML would be cool, but there is lots available
on that. I don't think it should be that large, as I don't think we
can really show scale. Just something that shows how to get the
source, set it up to run against a test set of data, and somehow see
the results, even if it is trivial command-line stuff.