Dru, it is not supposed to process the same rows multiple times. Can I see the log you're talking about? Also, how many regions do you have in your table? (This info is available in the web UI.)
thx

J-D

On Wed, Jul 30, 2008 at 5:04 PM, Dru Jensen <[EMAIL PROTECTED]> wrote:
> J-D,
>
> thanks for your quick response. I have 4 mapping processes running on 3
> systems.
>
> Are the same rows being processed 4 times by each mapping processor?
> According to the logs they are.
>
> When I run a map/reduce against a file, only one row gets logged per
> mapper. Why would this be different for hbase tables?
>
> I would think only one mapping process would process that one row, and it
> would only show up once in only one log, preferably on the same system
> that hosts the region.
>
> I only want each row to be processed once. Is there any way to change this
> behavior without running only 1 mapper?
>
> thanks,
> Dru
>
>
> On Jul 30, 2008, at 1:44 PM, Jean-Daniel Cryans wrote:
>
>> Dru,
>>
>> The regions will split when reaching a certain threshold, so if you want
>> your computation to be distributed, you will have to have more data.
>>
>> Regards,
>>
>> J-D
>>
>> On Wed, Jul 30, 2008 at 4:36 PM, Dru Jensen <[EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> I created a map/reduce process by extending the TableMap and TableReduce
>>> API, but for some reason, when I run multiple mappers, the logs show
>>> that the same rows are being processed by each Mapper.
>>>
>>> When I say logs, I mean the hadoop task tracker (localhost:50030),
>>> drilling down into the logs.
>>>
>>> Do I need to manually perform a TableSplit, or is this supposed to be
>>> done automatically?
>>>
>>> If it's something I need to do manually, can someone point me to some
>>> sample code?
>>>
>>> If it's supposed to be automatic and each mapper was supposed to get
>>> its own set of rows, should I write up a bug for this? I'm using trunk
>>> 0.2.0 on hadoop trunk 0.17.2.
>>>
>>> thanks,
>>> Dru
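For readers finding this thread later: the intended behavior J-D describes is that the table is divided into one input split per region, so each mapper scans only its own region's row-key range and no row is processed twice. A toy Python sketch of that idea (region boundaries, row values, and function names here are all illustrative, not the real HBase API):

```python
# Toy simulation of region-based input splits: each region is a
# half-open row-key range, and each mapper owns exactly one range.
# An empty start/end key means the range is open-ended on that side.
REGIONS = [("", "k"), ("k", "t"), ("t", "")]  # (start_row, end_row)

ROWS = ["apple", "kiwi", "mango", "tomato", "zucchini"]

def split_for_row(row, regions):
    """Return the index of the region (and hence the mapper) owning `row`."""
    for i, (start, end) in enumerate(regions):
        if row >= start and (end == "" or row < end):
            return i
    raise ValueError("row outside all regions: %r" % row)

# Assign every row to exactly one mapper -- no duplicates, no gaps.
assignment = {}
for row in ROWS:
    assignment.setdefault(split_for_row(row, REGIONS), []).append(row)

for mapper, rows in sorted(assignment.items()):
    print("mapper %d -> %s" % (mapper, rows))
```

With a table that has only one region, this model degenerates to a single split, which is why J-D asks how many regions the table has: more data (and hence more regions) is what spreads the work across mappers.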
