Actually, most of my Hadoop applications are built for numerical analysis. I therefore tried to make a generalized matrix input/output format (https://issues.apache.org/jira/browse/HADOOP-2515) as a Map<row, Map<column, cell>> structure, after reviewing the code and discussing it with Gary Bradski.
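As a rough illustration of the Map<row, Map<column, cell>> idea, here is a minimal in-memory sketch in Java. It is not the HADOOP-2515 patch and uses no Hadoop or HBase API. The class name SparseMatrixSketch and its methods are hypothetical, and the integer indices and double cells are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not the HADOOP-2515 code and not an HBase API.
class SparseMatrixSketch {
    // row index -> (column index -> cell value); cells that are absent count as 0.0
    private final Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();

    // Store a cell, creating the row map on first use.
    void set(int row, int col, double value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
    }

    // Read a cell; missing rows or columns read as 0.0.
    double get(int row, int col) {
        Map<Integer, Double> cols = rows.get(row);
        return cols == null ? 0.0 : cols.getOrDefault(col, 0.0);
    }

    // Element-wise sum: accumulate the stored cells of both operands.
    SparseMatrixSketch add(SparseMatrixSketch other) {
        SparseMatrixSketch sum = new SparseMatrixSketch();
        for (SparseMatrixSketch m : new SparseMatrixSketch[] {this, other}) {
            for (Map.Entry<Integer, Map<Integer, Double>> row : m.rows.entrySet()) {
                for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                    int r = row.getKey();
                    int c = cell.getKey();
                    sum.set(r, c, sum.get(r, c) + cell.getValue());
                }
            }
        }
        return sum;
    }

    // Matrix product: for each stored a(i,k), walk row k of the other matrix.
    SparseMatrixSketch multiply(SparseMatrixSketch other) {
        SparseMatrixSketch product = new SparseMatrixSketch();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            int i = row.getKey();
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                Map<Integer, Double> otherRow = other.rows.get(cell.getKey());
                if (otherRow == null) {
                    continue; // row k of the other matrix holds only zeros
                }
                for (Map.Entry<Integer, Double> otherCell : otherRow.entrySet()) {
                    int j = otherCell.getKey();
                    product.set(i, j,
                            product.get(i, j) + cell.getValue() * otherCell.getValue());
                }
            }
        }
        return product;
    }
}
```

Only non-zero cells are stored, so the same row -> column -> cell nesting could in principle be mapped onto a table keyed by row with one entry per column, which is the shape the rest of this message is about.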
However, if I build a new matrix file structure on HDFS, I think it would end up resembling what HBase already does. So I think Hadoop + HBase is a good fit for matrix management and operations.

"It (BigTable) presents the abstraction of a 2-dimensional table of data cells, with different versions over time making up a third dimension."
-- Failure Trends in a Large Disk Drive Population, 2007

This means that BigTable is used for analytical processing over arbitrary sets of elements selected by query, not for relational data processing.

> I see http://wiki.apache.org/hadoop/Matrix

Thanks for your review. I hope we can talk soon.

On 2/7/08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> How do you think these various libraries fit into Hadoop? Does it
> make sense to just build what we need using HBase? I see
> http://wiki.apache.org/hadoop/Matrix
> does some matrix things, but then it has a Groovy overlay, so it
> isn't quite what we want, I don't think.
>
> Perhaps, we should just think about, and push up to Hadoop if we can,
> our own set of Hadoop based matrix libraries. Starting off, we need a
> decent way to create a matrix and populate it, then also basic matrix
> things like addition, multiplication, etc. Then we can add other
> things as we need them? For instance, I am interested in TextRank
> (search for Mihalcea and TextRank) and it essentially comes down to
> doing an iterative algorithm over a matrix. I was thinking I might,
> as a way to get deeper into the latest Hadoop, use it as a sample,
> useful algorithm. It's not specifically ML, but it does have
> interesting results and it is fairly easy to implement.
>
> Should we just lay out a page on the Wiki where we can start thinking
> about matrix needs? Using other libraries is definitely an option,
> but I am not sure if they will be optimal in the Hadoop environment.
>
> -Grant
>
> On Feb 6, 2008, at 12:18 PM, Ted Dunning wrote:
>
> >
> > There are unfortunately many choices for linear algebra in JVM's, none
> > particularly satisfactory.
> >
> > Colt is the one I use. It has a very odd syntax, but gives good
> > performance. The structure is such that it is very hard to extend to, say,
> > sparse matrices. The licensing on Colt isn't particularly easy, either and
> > I have been unable to contact the author to see about liberalizing it.
> >
> > Jama is now essentially defunct, but it had a very simple API and not very
> > high performance. Extending to additional matrix types is also not feasible
> > due to the design exposing matrix internal structure as a double indexed
> > matrix. The licensing on Jama is very open.
> >
> > MTJ is high performance and has a less strange API than Colt, but I haven't
> > used it so I can't say much about performance. I get the impression it
> > would be difficult to extend, but I could well be wrong about that.
> >
> > Commons math uses an extension of Jama, I think. I haven't used it. The
> > last time I looked seriously at commons math, the committers had some very
> > odd agendas going on so I dropped it from consideration. It looks like it
> > has come quite a ways since then, but I haven't dug into it deeply since my
> > first evaluation.
> >
> >
> > On 2/6/08 12:45 AM, "Paul Elschot" <[EMAIL PROTECTED]> wrote:
> >
> >> On Wednesday 06 February 2008 05:23:31 Markus Weimer wrote:
> >>> Hi,
> >>> One of my contributions to Elefant is an adapter to the Java Version of UIMA
> >>> which allows you to pipe Python strings through a UIMA annotation engine and
> >>> get feature vectors to work with back. This was done using JPype:
> >>> <http://jpype.sourceforge.net/>, a tool which links the JVM to the CPython
> >>> VM.
> >>>
> >>> I choose this non-obvious approach because we use native code Python
> >>> extensions for the matrix operations, an area where Java regrettably lacks
> >>> behind big time compared to native code. So, Jython was out of the question
> >>> as I don't know any way to access a CPython extension from Jython. I found
> >>> JPype to do the job and to do it well (the overhead per Cross-VM call was
> >>> around 1ms on my laptop). So for those craving for a state-of-the-art Python
> >>> with decent extensions and access to Java code, this might be an option.
> >>
> >> Well, one of my favourite Java libraries made it into the email address of
> >> this list, and I must say, I was hoping to get some good solutions to the
> >> problem of linear algebra in a JVM here. Has this problem been discussed
> >> beforehand?
> >>
> >> I have only used linear algebra packages well before there was Java,
> >> so I wonder how to go about it now.
> >>
> >> Regards,
> >> Paul Elschot
> >>
> >
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

--
B. Regards,
Edward yoon @ NHN, corp.
