AFAIK Mac minis have just 2 cores, right? So 2 map tasks per machine + DataNode + RegionServer + ZK = 5 processes. From what I've seen, the region server will eat at least 1 CPU while under import, so that does not leave a lot of room for the rest. You could try with 1 map slot per machine and give HBase a heap of 2GB.
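For reference, those two knobs would be set roughly like this on a 0.20-era Hadoop/HBase install. This is a sketch only — file locations and property names assume the default distributions, and `HBASE_HEAPSIZE` is expressed in MB:

```
# conf/hbase-env.sh — give each region server JVM a 2 GB heap (value in MB)
export HBASE_HEAPSIZE=2000
```

```xml
<!-- conf/mapred-site.xml — allow only 1 concurrent map task per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```

Both require restarting the affected daemons (the region servers and the TaskTrackers respectively) to take effect.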
J-D

On Wed, Jul 22, 2009 at 12:23 PM, tim robertson<[email protected]> wrote:
> Strangely enough, it didn't help. I suspect I am just overloading the
> machines - they only have 4G of RAM.
> When I use a separate machine, a single thread pushes in 1000 inserts
> per second, but a MapReduce job on the cluster does only 500 (8 map
> tasks running on 4 nodes).
>
> Cheers,
>
> Tim
>
> On Wed, Jul 22, 2009 at 5:21 PM, tim robertson<[email protected]> wrote:
>> Below is a sample row (the \N fields are ignored in the Map), so I will
>> try the default of 2 MB, which should buffer a bunch of rows before
>> flushing.
>>
>> Thanks for your tips,
>>
>> Tim
>>
>> 199798861 293 8107 8436 MNHNL Recorder database
>> LUXNATFUND404573t Pilophorus cinnamopterus (KIRSCHBAUM,1856)
>> \N \N \N \N \N \N \N \N \N \N 49.61 6.13 \N \N \N \N
>> \N \N \N \N \N \N \N L. Reichling Parc (Luxembourg)
>> 1979 7 10 \N \N \N \N 2009-02-20 04:19:51 2009-02-20 08:40:21
>> \N 199798861 293 8107 29773 1519409 11922838 1 21560621 9917520
>> \N \N \N \N \N \N \N \N \N 49.61 6.13 50226 61
>> 186 1979 7 1979-07-10 0 0 0 2 \N \N \N \N
>>
>> On Wed, Jul 22, 2009 at 5:13 PM, Jean-Daniel Cryans<[email protected]> wrote:
>>> It really depends on the size of each Put. If 1 put = 1MB, then a 2MB
>>> buffer (the default) won't be useful. A 1GB buffer (what you wrote)
>>> will likely OOME your client and, if not, your region servers in no
>>> time.
>>>
>>> So try with the default, and then if it goes well you can try setting
>>> it higher. Do you know the size of each row?
>>>
>>> J-D
>>>
>>> On Wed, Jul 22, 2009 at 11:04 AM, tim robertson<[email protected]> wrote:
>>>> Could you suggest a sensible write buffer size please?
>>>>
>>>> 1024x1024x1024 bytes?
>>>>
>>>> Cheers
>>>>
>>>> On Wed, Jul 22, 2009 at 4:41 PM, tim robertson<[email protected]> wrote:
>>>>> Thanks J-D
>>>>>
>>>>> I will try this now.
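To make the buffer-size numbers concrete: assuming rows of roughly 1 KB (an assumption, in the ballpark of the sample row above), the default 2 MB write buffer already batches a couple of thousand Puts per flush, while 1024x1024x1024 bytes is a full GiB. A minimal sketch of the arithmetic:

```java
public class BufferMath {
    // how many rows of a given size fit in a client-side write buffer
    static long rowsBuffered(long bufferBytes, long rowBytes) {
        return bufferBytes / rowBytes;
    }

    public static void main(String[] args) {
        long kb = 1024, mb = kb * kb, gb = mb * kb;  // gb == 1024x1024x1024 bytes
        long rowBytes = kb;                          // assumed ~1 KB per sparse row

        System.out.println("2 MB default buffer -> ~"
                + rowsBuffered(2 * mb, rowBytes) + " puts per flush");  // ~2048
        System.out.println("1 GiB buffer        -> ~"
                + rowsBuffered(gb, rowBytes) + " puts per flush");      // ~1048576
    }
}
```

At ~1 KB per row, a 1 GiB buffer would hold on the order of a million Puts in client memory before a single flush, which is why it risks an OOME on the client and a flood on the region servers, as J-D warns.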
>>>>>
>>>>> On Wed, Jul 22, 2009 at 3:44 PM, Jean-Daniel Cryans<[email protected]> wrote:
>>>>>> Tim,
>>>>>>
>>>>>> Are you using the write buffer? See HTable.setAutoFlush and
>>>>>> HTable.setWriteBufferSize if not. This will help a lot.
>>>>>>
>>>>>> Also, since you have only 4 machines, try setting the HDFS replication
>>>>>> factor lower than 3.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Wed, Jul 22, 2009 at 8:26 AM, tim robertson<[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a 70G sparsely populated tab file (74 columns) to load into 2
>>>>>>> column families in a single HBase table.
>>>>>>>
>>>>>>> I am running on my tiny dev cluster (4 Mac minis, 4G of RAM, each
>>>>>>> running all the Hadoop daemons and a RegionServer) just to
>>>>>>> familiarise myself, while the proper rack is being set up.
>>>>>>>
>>>>>>> I wrote a MapReduce job where I load into HBase during the Map:
>>>>>>>
>>>>>>>   String rowID = UUID.randomUUID().toString();
>>>>>>>   Put row = new Put(rowID.getBytes());
>>>>>>>   // uses a properties file to map tab columns to column families
>>>>>>>   int fields = reader.readAllInto(splits, row);
>>>>>>>   context.setStatus("Map updating cell for row[" + rowID + "] with "
>>>>>>>       + fields + " fields");
>>>>>>>   table.put(row);
>>>>>>>
>>>>>>> Is this the preferred way to do this kind of loading, or is a
>>>>>>> TableOutputFormat likely to outperform the Map version?
>>>>>>>
>>>>>>> [Knowing performance estimates are pointless on this cluster - I see
>>>>>>> 500 records per second input, which is a bit disappointing. I have
>>>>>>> the default Hadoop and HBase config, and had to put a ZK quorum
>>>>>>> member on each node to get HBase to start.]
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Tim
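Folding J-D's write-buffer advice into the loading code above would look roughly like this. This is a sketch against the 0.20-era HTable client API, not a definitive implementation: the table name "occurrence" is a placeholder, the cell-population step is elided, and it needs a running cluster, so it is not runnable standalone:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import java.util.UUID;

HTable table = new HTable(new HBaseConfiguration(), "occurrence"); // table name is a placeholder
table.setAutoFlush(false);                  // buffer puts client-side instead of one RPC per put
table.setWriteBufferSize(2 * 1024 * 1024);  // start at the 2 MB default, then tune upward

// inside map():
String rowID = UUID.randomUUID().toString();
Put row = new Put(rowID.getBytes());
// ... add cells parsed from the tab-delimited line ...
table.put(row);        // queued in the write buffer, flushed when it fills

// when the task finishes:
table.flushCommits();  // push any puts still sitting in the buffer
```

The final flushCommits() matters: without it, whatever is left in the buffer when the map task ends is silently dropped.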
