BTW, there's a JDBCDataModel in Taste. I think it would be convenient for users if we provided an HBaseDataModel that uses HBase as the data store.

Jeff Zhang

On Mon, Nov 16, 2009 at 4:54 PM, Jeff Zhang <[email protected]> wrote:

> Hi all,
>
> I started learning HBase these days, and I found that we can use HBase for
> machine learning.
> In the field of machine learning, we always need to handle matrices and
> vectors, which are a good fit for storage in HBase.
>
> E.g. we always have to compute the doc-term matrix in text classification.
> If we use HBase, we can store each document as a row in HBase, with
> the document id as the row id and the tf (term frequency) values as columns.
> E.g. we have one document A titled "love", and the content is:
> I love this game.
>
> Then we can store it as one HBase row:
> A: {title:love=>1,
> content:I=>1,content:love=>1,content:this=>1,content:game=>1}
>
> Using HBase, it will be very easy for us to compute the similarity between
> documents.
> Another advantage of HBase compared to raw text data is that it's
> semi-structured, and I think programming will be easier if we use HBase
> rather than the raw data.
>
> This is what I currently think; maybe something in it is not correct.
> Hope to hear ideas from experts.
>
> Thank you.
>
> Jeff Zhang
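
To make the row layout in the quoted message concrete, here is a rough sketch in plain Python. It does not talk to HBase at all: a dict keyed by "family:qualifier" stands in for one row, and the names `to_row` and `cosine_similarity` are made up for illustration. Cosine similarity is just one possible choice of similarity measure; the original message does not say which one to use.

```python
import math
import re
from collections import Counter


def to_row(title, content):
    """Build a dict shaped like one HBase row: 'family:qualifier' -> term count.

    Assumption: terms are split on non-word characters and kept case-sensitive,
    which reproduces the example row from the message.
    """
    row = Counter()
    for family, text in (("title", title), ("content", content)):
        for term in re.findall(r"\w+", text):
            row[f"{family}:{term}"] += 1
    return dict(row)


def cosine_similarity(row_a, row_b):
    """Cosine similarity between two sparse rows (missing columns count as 0)."""
    shared = set(row_a) & set(row_b)
    dot = sum(row_a[col] * row_b[col] for col in shared)
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Document A from the message, plus a second, made-up document for comparison.
doc_a = to_row("love", "I love this game.")
doc_b = to_row("game", "I love this team.")
similarity = cosine_similarity(doc_a, doc_b)
print(doc_a)
print(round(similarity, 3))
```

Because each row only holds the terms that actually occur in the document, the similarity computation touches only the columns the two rows share, which is the sparse layout the message is arguing for.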
