Phil Steitz wrote:
On 1/2/06, Mark Diggory <[EMAIL PROTECTED]> wrote:
Phil,
This is a great idea as a specification and standard. We currently have
a service in our project which does something similar, but it's mostly
implemented in Perl and R.
What project would that be?
My primary employment at the moment, at Harvard: the Virtual Data Center
project
http://www.thedata.org
http://www.sourceforge.net/projects/thedata
I wonder, though, how much of it would be implemented at the database
level vs. in the application. For instance, in doing a transform that
returned a subset of a dataset from a db, it would be much more efficient
to do it at the db level (in the query) than in the application itself.
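A minimal sketch of the point above, assuming a hypothetical table "observations" with a numeric column "income" (neither name comes from this thread). The first method lets the query do the subsetting, so only matching rows leave the database; the second filters rows that have already been fetched in full.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: the table "observations" and column "income" are hypothetical. */
final class SubsetSketch {

    /** Subset at the db level: the WHERE clause runs inside the database. */
    static List<Double> subsetInDb(Connection conn, double min) throws SQLException {
        String sql = "SELECT income FROM observations WHERE income >= ?";
        List<Double> out = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, min);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.add(rs.getDouble(1));
                }
            }
        }
        return out;
    }

    /** Subset in the application: every row was transferred before filtering. */
    static List<Double> subsetInApp(List<Double> allRows, double min) {
        List<Double> out = new ArrayList<>();
        for (double v : allRows) {
            if (v >= min) {
                out.add(v);
            }
        }
        return out;
    }
}
```

Both methods return the same values; the difference is how many rows cross the connection before the filter is applied.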
The spec being developed is focussed on the analytical / statistical
side rather than OLAP and also aims to be implementation-independent
(i.e., what is really being standardized is the API for vendors to
implement and client apps to use). That said, your point is valid -
it may be difficult to optimize implementation of some functions when
the db engine can / should do much of the work natively.
But I also like the idea of a standalone Java-based implementation
(maybe on HSQLDB), or perhaps there's a direction that could be taken
with Hibernate as well.
As noted above, the functional areas being considered are more
analytical - regression, clustering, classification, feature
extraction, etc. The overlap with [math] is in the statistical stuff.
Phil
Very true. We can explore implementations of the algorithms; I'm sure
they would be useful to the stat library. I point out HSQLDB because it has
the capability to call Java functions directly and use them in stored
procedures etc. See:
http://hsqldb.org/doc/guide/ch09.html#stored-section
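As a rough illustration of what such a function could look like on the Java side: a public static method in a public class. The class and method names here are hypothetical and are not part of Commons Math. The SQL in the comment reflects my understanding of the HSQLDB 1.8-era alias syntax and should be treated as an assumption to check against the guide linked above.

```java
/**
 * Sketch only: StatFunctions and zScore are hypothetical names, not an
 * existing Commons Math or HSQLDB API.
 *
 * Assumed registration in HSQLDB 1.8-style SQL (verify against the guide):
 *   CREATE ALIAS ZSCORE FOR "StatFunctions.zScore"
 *   SELECT ZSCORE(income, 50000, 12000) FROM observations
 * A real deployment would put the class in a package and use the fully
 * qualified name in the alias.
 */
public final class StatFunctions {

    private StatFunctions() {
        // static helpers only
    }

    /** Standardizes x against a given mean and standard deviation. */
    public static double zScore(double x, double mean, double sd) {
        if (sd <= 0.0) {
            throw new IllegalArgumentException("sd must be positive");
        }
        return (x - mean) / sd;
    }
}
```

The method is stateless, which is what makes it straightforward to expose as a scalar SQL function: each call depends only on its arguments.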
I could see Commons Math libraries being very effective in this
setting if done right. Though in HSQLDB I'm still learning whether
the same can be done for updating aggregate functions the way it can
for static methods.
-Mark