No. Should I?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, October 21, 2009 10:55 AM
To: [email protected]
Subject: Re: Table Upload Optimization
Are you using the Hadoop Streaming API?

J-D

On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant <[email protected]> wrote:
> Hey,
>
> So I want to upload a lot of XML data into an HTable. I have a class that
> successfully maps up to about 500 MB of data or so (on one regionserver)
> into a table, but if I go for much bigger than that it takes forever and
> eventually just stops. I tried uploading a big XML file (about 7 GB) into
> my 4-regionserver cluster, and after a day it's still going at it.
>
> What I get when I run the job on the 4-node cluster is:
>
> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
> (then it does that for a while until...)
> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0
> is done. And is in the process of committing
> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0'
> is done.
> 10/21/09 10:22:52 INFO mapred.JobClient:  map 100% reduce 0%
> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:59 INFO mapred.JobClient:  map 99% reduce 0%
>
> I'm convinced I'm not configuring hbase or hadoop correctly. Any suggestions?
>
> Mark Vigeant
> RiskMetrics Group, Inc.
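[Editor's note: the `mapred.LocalJobRunner` lines and the `attempt_local_*` task IDs in the log above indicate the job is running in Hadoop's local, single-process mode on the submitting machine rather than being distributed across the 4-node cluster, which matches the poster's suspicion of a misconfiguration. A minimal sketch of the relevant client-side setting for a Hadoop 0.20-era cluster (the era of this thread); `jobtracker-host:9001` is a placeholder for the actual JobTracker address:]

```xml
<!-- mapred-site.xml on the machine submitting the job (a sketch, not the poster's config).
     If mapred.job.tracker is left at its default value "local", Hadoop runs the job
     inside LocalJobRunner on the client instead of submitting it to the cluster. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>
</configuration>
```

[This config file (along with the HBase configuration directory) must also be on the job client's classpath; otherwise the defaults apply and the job silently runs locally.]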
