How do you think these various libraries fit into Hadoop? Does it make sense to just build what we need using HBase? I see http://wiki.apache.org/hadoop/Matrix does some matrix operations, but it has a Groovy overlay, so I don't think it's quite what we want.

Perhaps we should think about, and push up to Hadoop if we can, our own set of Hadoop-based matrix libraries. To start, we need a decent way to create and populate a matrix, plus the basic operations like addition, multiplication, etc. Then we can add other things as we need them. For instance, I am interested in TextRank (search for Mihalcea and TextRank), which essentially comes down to running an iterative algorithm over a matrix. I was thinking I might use it as a sample, useful algorithm, as a way to get deeper into the latest Hadoop. It's not specifically ML, but it has interesting results and is fairly easy to implement.
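To make the "iterative algorithm over a matrix" concrete, here is a rough in-memory sketch of the TextRank-style score update (following Mihalcea's formula: S(Vi) = (1-d) + d * sum over incoming j of w[j][i]/outdeg(j) * S(Vj)). The class and constant names are mine, not from any existing library, and a Hadoop version would of course shard the matrix rows across map tasks rather than hold a dense array:

```java
// Sketch only: dense, single-machine TextRank iteration.
// TextRankSketch and DAMPING are hypothetical names, not an existing API.
public class TextRankSketch {
    static final double DAMPING = 0.85;  // damping factor, as in PageRank/TextRank

    // One iteration: next[i] = (1 - d) + d * sum_j w[j][i]/outWeight(j) * score[j],
    // where w[j][i] > 0 means an edge from vertex j to vertex i.
    static double[] iterate(double[][] w, double[] score) {
        int n = score.length;
        double[] out = new double[n];           // total outgoing edge weight per vertex
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                out[j] += w[j][i];
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0;
            for (int j = 0; j < n; j++)
                if (w[j][i] > 0 && out[j] > 0)
                    sum += w[j][i] / out[j] * score[j];
            next[i] = (1 - DAMPING) + DAMPING * sum;
        }
        return next;
    }

    public static void main(String[] args) {
        // Tiny symmetric 3-vertex graph: 0 -- 1 -- 2
        // (co-occurrence edges in TextRank are undirected).
        double[][] w = { {0, 1, 0}, {1, 0, 1}, {0, 1, 0} };
        double[] score = {1, 1, 1};
        for (int it = 0; it < 100; it++)        // iterate until (near) convergence
            score = iterate(w, score);
        for (double s : score)
            System.out.printf("%.4f%n", s);     // prints 0.7703, 1.4595, 0.7703
    }
}
```

The interesting part for us is that each iteration is just a (sparse) matrix-vector multiply plus a scalar shift, which is exactly the shape of computation a Hadoop matrix library would need to support well.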

Should we just lay out a page on the wiki where we can start thinking about matrix needs? Using other libraries is definitely an option, but I am not sure whether they will be optimal in the Hadoop environment.

-Grant

On Feb 6, 2008, at 12:18 PM, Ted Dunning wrote:


There are, unfortunately, many choices for linear algebra on the JVM, none
particularly satisfactory.

Colt is the one I use.  It has a very odd syntax, but gives good performance. The structure is such that it is very hard to extend to, say, sparse matrices. The licensing on Colt isn't particularly easy, either, and I have been unable to contact the author about liberalizing it.

Jama is now essentially defunct; it had a very simple API but not very high performance. Extending it to additional matrix types is also not feasible, because the design exposes the matrix's internal structure as a doubly indexed array.  The licensing on Jama is very open.

MTJ is reputedly high performance and has a less strange API than Colt, but I haven't used it, so I can't say much about it first-hand. I get the impression it would be difficult to extend, but I could well be wrong about that.

Commons Math uses an extension of Jama, I think. I haven't used it. The last time I looked seriously at Commons Math, the committers had some very odd agendas going on, so I dropped it from consideration. It looks like it has come quite a way since then, but I haven't dug into it deeply since my first evaluation.


On 2/6/08 12:45 AM, "Paul Elschot" <[EMAIL PROTECTED]> wrote:

On Wednesday 06 February 2008 05:23:31, Markus Weimer wrote:
Hi,
One of my contributions to Elefant is an adapter to the Java version of UIMA which allows you to pipe Python strings through a UIMA annotation engine and get feature vectors back to work with. This was done using JPype <http://jpype.sourceforge.net/>, a tool which links the JVM to the CPython VM.

I chose this non-obvious approach because we use native-code Python extensions for the matrix operations, an area where Java regrettably lags far behind native code. Jython was therefore out of the question, as I don't know any way to access a CPython extension from Jython. I found JPype to do the job and to do it well (the overhead per cross-VM call was around 1 ms on my laptop). So for those craving a state-of-the-art Python with decent extensions and access to Java code, this might be an option.

Well, one of my favourite Java libraries made it into the email address of this list, and I must say, I was hoping to get some good solutions here to the problem of linear algebra in a JVM. Has this problem been discussed before?

I have only used linear algebra packages well before there was Java,
so I wonder how to go about it now.

Regards,
Paul Elschot



--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



