No. Should I?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, October 21, 2009 10:55 AM
To: [email protected]
Subject: Re: Table Upload Optimization
Are you using the Hadoop Streaming API?

J-D

On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant <[email protected]> wrote:
> Hey,
>
> So I want to upload a lot of XML data into an HTable. I have a class that
> successfully maps up to about 500 MB of data or so (on one regionserver)
> into a table, but if I go for much bigger than that it takes forever and
> eventually just stops. I tried uploading a big XML file (about 7 GB) into
> my 4-regionserver cluster, and after a day it's still going at it.
>
> What I get when I run the job on the 4-node cluster is:
>
> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
> (then it does that for a while until...)
> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0
> is done. And is in the process of committing
> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0'
> is done.
> 10/21/09 10:22:52 INFO mapred.JobClient:  map 100% reduce 0%
> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
> 10/21/09 10:22:59 INFO mapred.JobClient:  map 99% reduce 0%
>
> I'm convinced I'm not configuring hbase or hadoop correctly. Any suggestions?
>
> Mark Vigeant
> RiskMetrics Group, Inc.
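[Editor's note: the `mapred.LocalJobRunner` lines and the `attempt_local_*` task IDs in the log above indicate the job is running in Hadoop's local, single-process mode on the submitting machine rather than being distributed across the 4-node cluster, which matches the poster's suspicion of a misconfiguration. A minimal sketch of the relevant client-side setting for a Hadoop 0.20-era cluster (the era of this thread); `jobtracker-host:9001` is a placeholder for the actual JobTracker address:]

```xml
<!-- mapred-site.xml on the machine submitting the job (a sketch, not the poster's config).
     If mapred.job.tracker is left at its default value "local", Hadoop runs the job
     inside LocalJobRunner on the client instead of submitting it to the cluster. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:9001</value>
  </property>
</configuration>
```

[This config file (along with the HBase configuration directory) must also be on the job client's classpath; otherwise the defaults apply and the job silently runs locally.]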
