BTW, there's a JDBCDataModel in Taste. I think it would be convenient for
users if we provided an HBaseDataModel that uses HBase as the data
store.


Jeff Zhang



On Mon, Nov 16, 2009 at 4:54 PM, Jeff Zhang <[email protected]> wrote:

> Hi all,
>
> I started learning HBase recently, and I found that we can use it for
> machine learning.
> In machine learning, we often need to handle matrices and
> vectors, which are a good fit for storage in HBase.
>
> E.g., we often have to compute the doc-term matrix in text classification.
> If we use HBase, we can store each document as a row, with the document id
> as the row key and the tf (term frequency) of each term as a column value.
> E.g., suppose we have one document A titled "love", whose content is:
> I love this game.
>
> Then we can store it as one HBase row:
> A: {title:love=>1,
> content:I=>1,content:love=>1,content:this=>1,content:game=>1}
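A minimal sketch of how such a row could be built, in plain Python with no HBase client involved. The function name document_to_row, the "family:qualifier" string keys, and the tokenization rule (runs of word characters, case preserved) are assumptions made for illustration; they are not part of the original proposal or of any HBase API.

```python
import re
from collections import Counter


def document_to_row(title, content):
    """Map a document to {"family:qualifier": term_frequency} cells.

    Hypothetical helper: "title" and "content" play the role of column
    families, and each distinct term becomes one column qualifier.
    """
    row = {}
    for family, text in (("title", title), ("content", content)):
        # Tokenize on runs of word characters; case is kept as-is (assumption).
        counts = Counter(re.findall(r"\w+", text))
        for term, tf in counts.items():
            row[f"{family}:{term}"] = tf
    return row


row_key = "A"  # the document id serves as the row key
row = document_to_row("love", "I love this game.")
print(row_key, row)
```

With a real store, each entry of the dictionary would be written as one cell under the row key; how that write is done depends on the client library in use.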
>
>
> Using HBase, it would be easy to compute the similarity between
> documents, since each row already holds a document's term frequencies.
> Another advantage of HBase compared to raw text data is that it is
> semi-structured, so I think programming against HBase would be easier
> than against the raw data.
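As a rough illustration of the similarity computation, the sketch below compares two rows that have already been read into dictionaries mapping column name to term frequency. Cosine similarity is an assumed choice here, since the message does not name a measure, and the second document ("I love this song.") is made up for the example.

```python
import math


def cosine_similarity(row_a, row_b):
    """Cosine similarity between two sparse rows of {column: term_frequency}."""
    # Only columns present in both rows contribute to the dot product.
    shared = row_a.keys() & row_b.keys()
    dot = sum(row_a[col] * row_b[col] for col in shared)
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty row is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Content columns of document A, and of a hypothetical second document.
doc_a = {"content:I": 1, "content:love": 1, "content:this": 1, "content:game": 1}
doc_b = {"content:I": 1, "content:love": 1, "content:this": 1, "content:song": 1}
print(cosine_similarity(doc_a, doc_b))  # 3 shared terms, both norms 2 -> 0.75
```

Because only the columns that actually exist in a row are stored, the computation touches just the terms each document contains rather than the full vocabulary.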
>
> This is what I have in mind so far; some of it may not be correct.
> I hope to hear ideas from the experts.
>
>
> Thank you.
>
> Jeff Zhang
>
>
>
>
