> It mean that BigTable is used for analysis processing with arbitrary
> set of elements by query, not a relational data processing.

Sorry, I think that could easily be misunderstood. It was only the tail
end of my train of thought. Let me think about it some more. :)

On 2/7/08, edward yoon <[EMAIL PROTECTED]> wrote:
> Actually, My most hadoop applications are made for numeric analysis.
> Therefore, I was tried to make a generalized matrix in/out format.
> https://issues.apache.org/jira/browse/HADOOP-2515
> as a Map<row, Map<column, cell>> structure after review the code and
> discuss with gary bradski.
>
> But, If i make a new matrix file structure on Hadoop HDFS, i think it
> could be some resemblancing going on Hbase. So, I think Hadoop + Hbase
> is good fit with matrix management & operation.
>
> "It (BigTable) presents the abstraction of a 2-dimensional
> table of data cells, with different versions over time making
> up a third dimension." -- Failure Trends in a Large Disk Drive Population,
> 2007
>
> It mean that BigTable is used for analysis processing with arbitrary
> set of elements by query, not a relational data processing.
>
> > I see http://wiki.apache.org/hadoop/Matrix
>
> Thanks for your review.
> I hope we talk together soon.
>
> On 2/7/08, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > How do you think these various libraries fit into Hadoop? Does it
> > make sense to just build what we need using HBase? I see
> > http://wiki.apache.org/hadoop/Matrix
> > does some matrix things, but then it has a Groovy overlay, so it
> > isn't quite what we want, I don't think.
> >
> > Perhaps, we should just think about, and push up to Hadoop if we can,
> > our own set of Hadoop based matrix libraries. Starting off, we need a
> > decent way to create a matrix and populate it, then also basic matrix
> > things like addition, multiplication, etc. Then we can add other
> > things as we need them? For instance, I am interested in TextRank
> > (search for Mihalcea and TextRank) and it essentially comes down to
> > doing an iterative algorithm over a matrix.
> > I was thinking I might,
> > as a way to get deeper into the latest Hadoop, use it as a sample,
> > useful algorithm. It's not specifically ML, but it does have
> > interesting results and it is fairly easy to implement.
> >
> > Should we just lay out a page on the Wiki where we can start thinking
> > about matrix needs? Using other libraries is definitely an option,
> > but I am not sure if they will be optimal in the Hadoop environment.
> >
> > -Grant
> >
> > On Feb 6, 2008, at 12:18 PM, Ted Dunning wrote:
> >
> > >
> > > There are unfortunately many choices for linear algebra in JVM's,
> > > none particularly satisfactory.
> > >
> > > Colt is the one I use. It has a very odd syntax, but gives good
> > > performance. The structure is such that it is very hard to extend
> > > to, say, sparse matrices. The licensing on Colt isn't particularly
> > > easy, either and I have been unable to contact the author to see
> > > about liberalizing it.
> > >
> > > Jama is now essentially defunct, but it had a very simple API and
> > > not very high performance. Extending to additional matrix types is
> > > also not feasible due to the design exposing matrix internal
> > > structure as a double indexed matrix. The licensing on Jama is very
> > > open.
> > >
> > > MTJ is high performance and has a less strange API than Colt, but I
> > > haven't used it so I can't say much about performance. I get the
> > > impression it would be difficult to extend, but I could well be
> > > wrong about that.
> > >
> > > Commons math uses an extension of Jama, I think. I haven't used it.
> > > The last time I looked seriously at commons math, the committers
> > > had some very odd agendas going on so I dropped it from
> > > consideration. It looks like it has come quite a ways since then,
> > > but I haven't dug into it deeply since my first evaluation.
> > >
> > > On 2/6/08 12:45 AM, "Paul Elschot" <[EMAIL PROTECTED]> wrote:
> > >
> > >> On Wednesday 06 February 2008 05:23:31, Markus Weimer wrote:
> > >>> Hi,
> > >>> One of my contributions to Elefant is an adapter to the Java
> > >>> Version of UIMA which allows you to pipe Python strings through
> > >>> a UIMA annotation engine and get feature vectors to work with
> > >>> back. This was done using JPype:
> > >>> <http://jpype.sourceforge.net/>, a tool which links the JVM to
> > >>> the CPython VM.
> > >>>
> > >>> I choose this non-obvious approach because we use native code
> > >>> Python extensions for the matrix operations, an area where Java
> > >>> regrettably lacks behind big time compared to native code. So,
> > >>> Jython was out of the question as I don't know any way to access
> > >>> a CPython extension from Jython. I found JPype to do the job and
> > >>> to do it well (the overhead per Cross-VM call was around 1ms on
> > >>> my laptop). So for those craving for a state-of-the-art Python
> > >>> with decent extensions and access to Java code, this might be an
> > >>> option.
> > >>
> > >> Well, one of my favourite Java libraries made it into the email
> > >> address of this list, and I must say, I was hoping to get some
> > >> good solutions to the problem of linear algebra in a JVM here.
> > >> Has this problem been discussed beforehand?
> > >>
> > >> I have only used linear algebra packages well before there was
> > >> Java, so I wonder how to go about it now.
> > >>
> > >> Regards,
> > >> Paul Elschot
> > >>
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> > http://www.lucenebootcamp.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> --
> B. Regards,
> Edward yoon @ NHN, corp.
>
--
B. Regards,
Edward yoon @ NHN, corp.
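
For illustration only, here is a rough in-memory sketch of the
Map<row, Map<column, cell>> layout mentioned in this thread, together with
the basic operations (addition, multiplication) that were asked for. It
assumes Java 8 or later. The class name SparseMatrixSketch and all of its
methods are hypothetical, made up for this example; they are not taken from
HADOOP-2515, HBase, or any of the libraries named above, and the sketch says
nothing about how such a matrix would be stored on HDFS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sparse matrix held in memory as Map<row, Map<column, cell>>.
// Cells that are absent from the maps are treated as 0.0.
final class SparseMatrixSketch {

    // Store a value at (row, col), creating the row map on first use.
    static void set(Map<Integer, Map<Integer, Double>> m, int row, int col, double value) {
        m.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    // Read the value at (row, col); missing cells read as 0.0.
    static double get(Map<Integer, Map<Integer, Double>> m, int row, int col) {
        Map<Integer, Double> cells = m.get(row);
        return cells == null ? 0.0 : cells.getOrDefault(col, 0.0);
    }

    // Element-wise sum: accumulate every cell of a, then every cell of b.
    static Map<Integer, Map<Integer, Double>> add(
            Map<Integer, Map<Integer, Double>> a,
            Map<Integer, Map<Integer, Double>> b) {
        Map<Integer, Map<Integer, Double>> sum = new HashMap<>();
        accumulate(sum, a);
        accumulate(sum, b);
        return sum;
    }

    // Add each cell of source into the matching cell of target.
    private static void accumulate(
            Map<Integer, Map<Integer, Double>> target,
            Map<Integer, Map<Integer, Double>> source) {
        for (Map.Entry<Integer, Map<Integer, Double>> row : source.entrySet()) {
            Map<Integer, Double> out =
                    target.computeIfAbsent(row.getKey(), r -> new HashMap<>());
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                out.merge(cell.getKey(), cell.getValue(), Double::sum);
            }
        }
    }

    // Product: for each a[i][k] and each b[k][j], add a[i][k] * b[k][j] into c[i][j].
    static Map<Integer, Map<Integer, Double>> multiply(
            Map<Integer, Map<Integer, Double>> a,
            Map<Integer, Map<Integer, Double>> b) {
        Map<Integer, Map<Integer, Double>> product = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> aRow : a.entrySet()) {
            for (Map.Entry<Integer, Double> aCell : aRow.getValue().entrySet()) {
                Map<Integer, Double> bRow = b.get(aCell.getKey());
                if (bRow == null) {
                    continue; // row k of b is all zeros, nothing to add
                }
                Map<Integer, Double> out =
                        product.computeIfAbsent(aRow.getKey(), r -> new HashMap<>());
                for (Map.Entry<Integer, Double> bCell : bRow.entrySet()) {
                    out.merge(bCell.getKey(), aCell.getValue() * bCell.getValue(), Double::sum);
                }
            }
        }
        return product;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> a = new HashMap<>();
        set(a, 0, 0, 1.0);
        set(a, 0, 1, 2.0);
        set(a, 1, 1, 3.0);
        Map<Integer, Map<Integer, Double>> b = new HashMap<>();
        set(b, 0, 0, 4.0);
        set(b, 1, 0, 5.0);
        set(b, 1, 1, 6.0);
        System.out.println("sum:     " + add(a, b));
        System.out.println("product: " + multiply(a, b));
    }
}
```

Only non-zero cells are stored, so memory use follows the number of
populated cells rather than the matrix dimensions. That is the main reason
a nested-map layout is a natural fit for sparse data; whether it performs
acceptably at Hadoop scale is a separate question that this sketch does not
address.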
