Dru,

It is not supposed to process the same rows multiple times. Can I see the logs
you're talking about? Also, how many regions do you have in your table?
(info available in the web UI).
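
If you'd rather check it from code than from the web UI, something like this
should print the region count (untested sketch, from memory of the 0.2 client
API; "mytable" is a placeholder):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class CountRegions {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        // getStartKeys() returns one start key per region, so the array
        // length is the number of regions the table currently has.
        System.out.println("regions: " + table.getStartKeys().length);
      }
    }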

thx

J-D

On Wed, Jul 30, 2008 at 5:04 PM, Dru Jensen <[EMAIL PROTECTED]> wrote:

> J-D,
>
> thanks for your quick response.  I have 4 mapping processes running on 3
> systems.
>
> Are the same rows being processed by each of the 4 mappers?  According to
> the logs they are.
>
> When I run a map/reduce against a file, each row only gets logged by one
> mapper.  Why would this be different for HBase tables?
>
> I would think only one mapping process would handle a given row, so it
> would show up once in only one log.  Preferably it would be the same
> system that hosts the region.
>
> I only want each row to be processed once.  Is there any way to change this
> behavior without running only 1 mapper?
>
> thanks,
> Dru
>
>
> On Jul 30, 2008, at 1:44 PM, Jean-Daniel Cryans wrote:
>
>  Dru,
>>
>> The regions will split when they reach a certain threshold, so if you want
>> your computation to be distributed, you will need more data.
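>>
>> (If I remember right, the split threshold is the hbase.hregion.max.filesize
>> setting, 256MB by default, and TableInputFormat makes at most one split per
>> region, so a small one-region table gets effectively one map task.)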
>>
>> Regards,
>>
>> J-D
>>
>> On Wed, Jul 30, 2008 at 4:36 PM, Dru Jensen <[EMAIL PROTECTED]> wrote:
>>
>>  Hello,
>>>
>>> I created a map/reduce process by extending the TableMap and TableReduce
>>> APIs, but for some reason, when I run multiple mappers, the logs show
>>> that the same rows are being processed by each mapper.
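>>>
>>> For reference, the mapper is wired up roughly like this (trimmed down to
>>> a sketch; "mytable", the "contents:" column and the class names are
>>> placeholders):
>>>
>>>   import java.io.IOException;
>>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>>   import org.apache.hadoop.hbase.io.RowResult;
>>>   import org.apache.hadoop.hbase.mapred.TableMap;
>>>   import org.apache.hadoop.mapred.OutputCollector;
>>>   import org.apache.hadoop.mapred.Reporter;
>>>
>>>   public class MyMap extends TableMap<ImmutableBytesWritable, RowResult> {
>>>     public void map(ImmutableBytesWritable key, RowResult value,
>>>         OutputCollector<ImmutableBytesWritable, RowResult> output,
>>>         Reporter reporter) throws IOException {
>>>       // Pass each row straight through; with correct splits a given
>>>       // row should reach exactly one map task.
>>>       output.collect(key, value);
>>>     }
>>>   }
>>>
>>>   // Job setup (inside the driver), as I understand the 0.2 API:
>>>   //   TableMap.initJob("mytable", "contents:", MyMap.class,
>>>   //       ImmutableBytesWritable.class, RowResult.class, job);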
>>>
>>> When I say logs, I mean the Hadoop task tracker (localhost:50030),
>>> drilling down into each task's logs.
>>>
>>> Do I need to manually perform a TableSplit or is this supposed to be done
>>> automatically?
>>>
>>> If it's something I need to do manually, can someone point me to some
>>> sample code?
>>>
>>> If it's supposed to be automatic and each mapper was supposed to get its
>>> own set of rows, should I write up a bug for this?  I'm using HBase trunk
>>> 0.2.0 on Hadoop trunk 0.17.2.
>>>
>>> thanks,
>>> Dru
>>>
>>>
>
