Re: [math] JSR 247: Data Mining 2.0

Mark Diggory Mon, 02 Jan 2006 19:20:45 -0800

Phil Steitz wrote:

On 1/2/06, Mark Diggory <[EMAIL PROTECTED]> wrote:

Phil,


This is a great idea as a specification and standard. We currently have
a service in our project which does something similar, but it's mostly
implemented in Perl and R.


What project would that be?

My primary employment at the moment, at Harvard: the Virtual Data Center project [http://www.thedata.org] [http://www.sourceforge.net/projects/thedata]

I wonder, though, how much of it would be implemented at the database
level vs. in the application. For instance, in doing a transform that
returned a subset of a dataset from a db, it would be much more efficient
to do it at the db level (in the query) than in the application itself.


The spec being developed is focussed on the analytical / statistical
side rather than OLAP and also aims to be implementation-independent
(i.e., what is really being standardized is the API for vendors to
implement and client apps to use).  That said, your point is valid -
it may be difficult to optimize implementation of some functions when
the db engine can / should do much of the work natively.

But I also like the idea of a standalone Java-based implementation
(maybe on HSQLDB), or perhaps there's a direction that could be taken
with Hibernate as well.

As noted above, the functional areas being considered are more
analytical - regression, clustering, classification, feature
extraction, etc.  The overlap with [math] is in the statistical stuff.

Phil

Very true, we can explore implementations of the algorithms; I'm sure they would be useful to the stat library. I point out HSQLDB because it has the capability to call Java functions directly and use them in stored procedures, etc. See:


http://hsqldb.org/doc/guide/ch09.html#stored-section

I could see the placement of Commons Math libraries within this situation being very effective if done right. Though in HSQLDB I'm still learning whether the same can be done with updating aggregate functions the way one can with static methods.
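
To make the distinction concrete, here is a rough sketch, not anything taken from HSQLDB or Commons Math. StatFunctions, zScore and RunningMean are hypothetical names made up for illustration. A static method is stateless and maps one row's values to one result, which is why it fits the "call a Java function from SQL" model. An aggregate has to carry state from row to row, so it needs some kind of accumulator object instead. The CREATE ALIAS line in the comment is my assumption about the syntax; check it against the guide linked above.

```java
// Hypothetical helper class, for illustration only.
// Declared package-private to keep the sketch in one file; to be callable
// from the database it would have to be a public class on the database's
// classpath.
final class StatFunctions {
    private StatFunctions() {}

    // Stateless, one row in and one value out. Assumed registration
    // (unverified syntax):
    //   CREATE ALIAS ZSCORE FOR "StatFunctions.zScore"
    public static double zScore(double x, double mean, double stdDev) {
        if (stdDev <= 0.0) {
            throw new IllegalArgumentException("stdDev must be positive");
        }
        return (x - mean) / stdDev;
    }
}

// Hypothetical accumulator showing what an updating aggregate needs:
// state that survives between rows.
final class RunningMean {
    private long n;
    private double mean;

    // Incremental update of the mean; called once per row.
    void add(double x) {
        n++;
        mean += (x - mean) / n;
    }

    // NaN when no rows have been added.
    double result() {
        return n == 0 ? Double.NaN : mean;
    }
}
```

The static method can be called per row with no setup, while RunningMean has to be created once per group and fed every row in it. That per-group lifecycle is the part I don't yet know how to express in HSQLDB.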


-Mark
