Hey

So I want to upload a lot of XML data into an HTable. I have a MapReduce class that
successfully loads up to about 500 MB of data into a table (on one regionserver),
but anything much bigger than that takes forever and eventually just stops. I tried
uploading a big XML file (about 7 GB) into my 4-regionserver cluster, and after a
full day the job is still going at it.
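
In case it matters, the map class is basically doing this (a heavily simplified
sketch; the real XML parsing is omitted, and names like XmlUploadMapper and the
"content" family are just placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits one Put per input record; TableOutputFormat does the actual writes.
public class XmlUploadMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String record = line.toString();
    // Stand-in row key; the real code pulls a unique id out of the XML.
    byte[] rowKey = Bytes.toBytes(Integer.toHexString(record.hashCode()));
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("content"), Bytes.toBytes("xml"),
            Bytes.toBytes(record));
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}

So it's one Put per XML record, written straight to the table from the mappers.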

What I get when I run the job on the 4-node cluster is:
10/21/09 10:22:35 INFO mapred.LocalJobRunner:
10/21/09 10:22:38 INFO mapred.LocalJobRunner:
(then it does that for a while until...)
10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0 is done. And is in the process of committing
10/21/09 10:22:52 INFO mapred.LocalJobRunner:
10/21/09 10:22:52 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0' is done.
10/21/09 10:22:52 INFO mapred.JobClient:   map 100% reduce 0%
10/21/09 10:22:58 INFO mapred.LocalJobRunner:
10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%


I'm convinced I'm not configuring HBase or Hadoop correctly. One thing that jumps
out at me is that the output above says LocalJobRunner and attempt_local_* task
IDs even when I run against the 4-node cluster, so maybe the job isn't actually
being distributed. Any suggestions?
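
For reference, here's roughly how I set up the job (a simplified sketch; the
table name "mytable" and the class names are placeholders, and I'm on the
0.20-style org.apache.hadoop.hbase.mapreduce API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class XmlUploadDriver {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml (and Hadoop's *-site.xml) from the classpath.
    // If mapred.job.tracker isn't set there, Hadoop falls back to the
    // LocalJobRunner instead of submitting to the cluster.
    Configuration conf = new HBaseConfiguration();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable"); // placeholder table
    Job job = new Job(conf, "xml upload");
    job.setJarByClass(XmlUploadDriver.class);
    job.setMapperClass(XmlUploadMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    job.setNumReduceTasks(0); // map-only: each Put goes straight to the table
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}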

Mark Vigeant
RiskMetrics Group, Inc.
