Hi,
I think that works for me. What I meant below was that the output collector of
TableReduce only accepts ImmutableBytesWritable keys and BatchUpdate values, so
I was asking how I can use other datatypes when writing back to HBase tables
through TableReduce. It seems either my mapper or my reducer has to convert my
datatypes into those two types using the method you suggested, and then pass
them on to the output collector of TableReduce. Let me know if I am missing
something.
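
Just to check my understanding, here is a rough sketch of the reducer I now
have in mind (the class name, generics, and toString-based conversions are only
my guesses, assuming the 0.19-era TableReduce interface):

// Rough sketch only -- names and conversions are my guesses.
import java.io.IOException;
import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableReduce;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyTableReducer extends MapReduceBase
    implements TableReduce<IntWritable, MapWritable> {

  public void reduce(IntWritable key, Iterator<MapWritable> values,
      OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
      Reporter reporter) throws IOException {
    // Convert the IntWritable key to a byte [] row key as you suggested.
    byte [] row = Bytes.toBytes(key.get());
    BatchUpdate bu = new BatchUpdate(row);
    while (values.hasNext()) {
      MapWritable mw = values.next();
      for (Entry<Writable, Writable> e : mw.entrySet()) {
        // Column name and cell value both become byte arrays; toString is
        // just a stand-in for whatever conversion fits my actual types.
        bu.put(Bytes.toBytes(e.getKey().toString()),
            Bytes.toBytes(e.getValue().toString()));
      }
    }
    output.collect(new ImmutableBytesWritable(row), bu);
  }
}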

Also, how can I store multiple values for the same column in HBase, for example
a movie id that has 5 genres, all under the column 'genre'? My mapper extracts
a comma-separated list of genres for each movie id from a text file and splits
it into (id, genre) pairs. I then pass these to the reducer to add to the
table, but BatchUpdate seems to overwrite the previous entries with the last
one. Can I store all the values in the same column?
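
For reference, here is roughly what happens inside my reduce right now (the
"genre:" column name and the sample values are just illustrative):

// Roughly what my reducer does today; movieId and genres stand in for
// whatever came out of my mapper.
int movieId = 42;
String [] genres = new String [] {"Action", "Comedy", "Drama"};
BatchUpdate bu = new BatchUpdate(Bytes.toBytes(movieId));
for (String genre : genres) {
  // Every put targets the same "genre:" column, which I suspect is why only
  // the last genre survives in the table.
  bu.put(Bytes.toBytes("genre:"), Bytes.toBytes(genre));
}
// ...then collect(new ImmutableBytesWritable(Bytes.toBytes(movieId)), bu) as above.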

Thanks

On Mon, Nov 24, 2008 at 1:50 PM, stack <[EMAIL PROTECTED]> wrote:

> Nishant Khurana wrote:
>
>> Hi,
>> I was writing a MapReduce class to read from a text file and write the
>> entries to a table. My map function reads each line and outputs a key and a
>> MapWritable as value. While writing the reduce using TableReduce, I was
>> wondering how to convert the key (IntWritable) to ImmutableBytesWritable and
>> the MapWritable object to BatchUpdate so that my output collector doesn't
>> complain in the reduce function. It seems to enforce a signature where it
>> collects only those two datatypes.
>>
>>
>
>
> For the key, would something like the below work for you:
>
> // Let 'key' be the IntWritable passed to the reduce. key.get() returns an int.
> // Bytes has a bunch of overrides for different types returning byte [].
> ImmutableBytesWritable ibw = new ImmutableBytesWritable(Bytes.toBytes(key.get()));
>
> For the MapWritable to BatchUpdate, how about:
>
>       // Again, let 'key' be the passed IntWritable key.  To make a byte
>       // array of it, use Bytes.toBytes.
>       BatchUpdate bu = new BatchUpdate(Bytes.toBytes(key.get()));
>       // Let 'v' be the Iterator over the values passed to this reduce.
>       while (v.hasNext()) {
>         HbaseMapWritable<SomeWritable, SomeWritable> hmw = v.next();
>         for (Entry<SomeWritable, SomeWritable> e: hmw.entrySet()) {
>           // Entry key becomes the column name, entry value the cell value;
>           // pick the Bytes.toBytes override that matches your actual types.
>           bu.put(Bytes.toBytes(e.getKey().toString()),
>               Bytes.toBytes(e.getValue().toString()));
>         }
>       }
>
> For 0.19.0 hbase, there is an example that does something similar to what you
> are up to under src/examples/mapred, though I think it might depend on a recent
> fix to HbaseMapWritable that allowed it to take a byte array as value, not just
> Writables.
>
>> Also, I believe I can only use the above two datatypes while using
>> TableReduce, but I couldn't understand them very well. How can I convert any
>> datatype to the above two to write them to the tables?
>>
>>
>>
> Please say more.  I don't think I follow exactly (and would like to fix this
> for 0.19.0 if it's what I think you are saying).
>
> St.Ack
>



-- 
Nishant Khurana
Candidate for Masters in Engineering (Dec 2009)
Computer and Information Science
School of Engineering and Applied Science
University of Pennsylvania
