Re: Hbase for question answer modeling

Jonathan Gray Tue, 18 Aug 2009 10:54:05 -0700

I'm having a little difficulty totally understanding your requirements, but let me take a stab.

You basically want a mapping from 1 to N QUESTIONS to a single ANSWER? When a new question comes in, you run an MR job that scans all existing questions and does some kind of similarity metric against them to try to find existing matches, and if one is found, add the new question to the list of questions for that answer, and return the answer.

The first big question I have is, are you expecting this question-matching query to be done in real-time? Or is this an offline, batch process? Remember, MapReduce is not for real-time queries. At the low end, for simple jobs, you will always run for several seconds if not tens of seconds (for VERY simple jobs).

But it seems like you would need to scan the entire table, and run something like a cosine similarity against every single question in it. That's going to be a much longer running job, depending on how many questions already exist, and certainly not real-time.
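To make the cost of that full scan concrete, here is a minimal sketch in plain Python, assuming a simple bag-of-words cosine similarity. The tokenizer, the function names, and the 0.8 threshold are illustrative assumptions, not anything from the original question, and nothing here uses HBase or MapReduce.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens are a good enough word model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Build term-frequency vectors and compute dot(a, b) / (|a| * |b|).
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(new_question, existing_questions, threshold=0.8):
    # Brute force: one comparison per stored question, so the work grows
    # linearly with the number of questions already in the table.
    best_score, best_question = 0.0, None
    for question in existing_questions:
        score = cosine_similarity(new_question, question)
        if score > best_score:
            best_score, best_question = score, question
    if best_score >= threshold:
        return best_question, best_score
    return None, best_score
```

Every new question touches every stored question, which is why this fits a batch job much better than an online query.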

As for actually storing the questions, you should create two column families "questions" and "answer". For each question, you insert a column into the "questions" family. The "answer" family would always have a single column (only a single answer right?). Then you can very easily query for all questions, and they will be grouped by row (I'm not sure what your row key will be).
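The row layout described above can be modeled with plain Python dictionaries, as a rough sketch only. This is not the HBase client API; the qualifier names ("q0", "q1", ... and "text") and the helper functions are hypothetical, chosen just to show one row holding many question columns and a single answer column.

```python
def _get_row(table, row_key):
    # Each row has the two column families: "questions" and "answer".
    return table.setdefault(row_key, {"questions": {}, "answer": {}})


def add_question(table, row_key, question_text):
    # Each similar question becomes a new column (qualifier) in the
    # "questions" family of the same row. Qualifier naming is an assumption.
    row = _get_row(table, row_key)
    qualifier = "q%d" % len(row["questions"])
    row["questions"][qualifier] = question_text
    return qualifier


def set_answer(table, row_key, answer_text):
    # The "answer" family always holds exactly one column.
    _get_row(table, row_key)["answer"]["text"] = answer_text


def questions_for(table, row_key):
    # All questions for an answer come back together because they share a row.
    return list(table[row_key]["questions"].values())
```

For example, calling `add_question` twice with the same row key and then `set_answer` once leaves one row with two question columns and one answer column.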

You didn't talk much about how you plan on doing dupe-detection of questions, but there are some interesting ways to generate signatures which could turn into your row keys, then you could actually do some kind of online duplicate detecting of already answered questions. That's beyond the scope of this mailing list, however.
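As one illustration of the signature idea, and only as an assumption about what such a signature might look like, the sketch below hashes the sorted set of lowercase tokens in a question. The function name and the choice of SHA-1 are hypothetical. It only catches questions that differ in case, punctuation, word order, or repeated words; fuzzier matching would need a different technique.

```python
import hashlib
import re


def question_signature(text):
    # Normalize: lowercase, keep alphanumeric tokens, drop duplicates,
    # and sort so that word order does not affect the signature.
    tokens = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    normalized = " ".join(tokens)
    # The hex digest is a fixed-length string that could serve as a row key.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()
```

With a signature like this as the row key, checking whether a question was already answered becomes a single lookup by key instead of a table scan.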


Hope that helps.  If you need more help, please provide more detail.

JG

Puri, Aseem wrote:

Hello

I am working on a model in which I have to manage question and their
answers.

I create two columns, one in which question is to be store and other its
answer.

Now people will ask question, so when a new question come I want to
execute map reduce job which find is same kind of question is already
exist or not.

If same question is asked then with map reduce I will find similar
question that exist and provide answer to him that is already there with
it. Also I want to append it with the similar question that is already
their in my table.

If question is different then I will store it in different row and its
answer will be given by some expert and be stored.

I know Hadoop HBase have property write once read many times. So I can't
append it.

I have two other options.

1.      Manage new similar question with help of timestamp.

2.      As a new similar question come I make new column qualifier and
store it in same row.

Please suggest that which approach should I follow and also that help in
my map reduce operation where I have to analyze similarity of new
question with every question that already exist. Also if some other
approach can help me please suggest me.

Regards

Aseem Puri
