On Wed, Feb 17, 2010 at 6:17 PM, Ryan Rawson <ryano...@gmail.com> wrote: > Why is LZO's license a problem? Sure it's GPL, but so is a lot of the > software on a linux system... The way the GPL is done, there is no > compile time dependency between HBase and the LZO libraries. Thus > there is no GPL pollution in the eyes of the ASF lawyer-likes. > > Running HBase as part of a service, internally to a company, the GPL > should not be a problem here. > > On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote: >> Today we added a fourth region server and forced the data to be >> redistributed evenly by exporting /hbase and then importing it back in >> (the Hadoop redistribution tool didn't even out the data completely). >> We also increased the max heap size on the region server from 4G to 5G >> and decreased the block cache value from 0.4 back to 0.2. These changes >> didn't seem to provide any real benefit, unfortunately. In a few >> minutes we're going to try deploying the patched (HBASE-2180) HBase >> 0.20.3. If all of these changes don't work we'll look at tweaking the >> block size as you suggested. >> >> LZO is not an option for us due to the GPL license on the required Java >> connector library (http://code.google.com/p/hadoop-gpl-compression/). >> Another here group was looking at using LZO in their Hadoop cluster but >> ruled it out due to the licensing issues. Is anyone aware of any LZO >> (or other codec) libraries/connectors released under the Apache or LGPL >> licenses that can be used with HBase/Hadoop? >> >> We do have Ganglia with monitoring and alerts. We're not swapping right >> now, although it is possible for that to happen at some point. What >> seems to be happening is that the load on all region servers will be >> fairly low (< 1.0) except for one region server which will have a load >> of around 10 with a high I/O wait. Then this region server will go back >> to normal and a different region server will experience high load and >> I/O wait. I'm hopeful that HBASE-2180 may resolve some of these issues. >> I'll send an update after we deploy the patched jar. >> >> -James >> >> >> On Tue, 2010-02-16 at 12:17 -0600, Stack wrote: >>> If you don't do lzo, and if your cell size is smallish, then try a >>> different block size. Default blocksize is 64k which might be ok for >>> a single-seek -- i.e. costs near same getting 64k as it does 4k -- but >>> for a random-read loading with lots of concurrency, it might make for >>> more work being done than needs be and so throughput drops. To enable >>> 4k blocks, do as Lars said for enabling lzo only change the block size >>> when table is offline. You could run a major compaction and it'll >>> rewrite all as 4k blocks promptly (at a large i/o cost) or just let >>> the cluster go about its business and as it compacts naturally, the >>> new files will be 4k. >>> >>> In an earlier note you say you are over-allocated and so you could be >>> swapping (Are you? Do you ops teams have ganglia or some such running >>> against this cluster?). A JVM whose heap is paging will not perform >>> at all. You don't even hit the max on all daemons for paging to >>> happen. See "Are you swapping" in this page >>> http://wiki.apache.org/hadoop/PerformanceTuning. >>> >>> St.Ack >>> >>> >>> On Mon, Feb 15, 2010 at 11:21 PM, James Baldassari <ja...@dataxu.com> wrote: >>> > No, we don't have LZO on the table right now. I guess that's something >>> > else that we can try. 
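A rough sketch of the offline block-size change Stack describes above, using the 0.20-era Java admin API. The table name "mytable" and family "cf" are placeholders, and the method names are from memory of the 0.20 javadoc, so double-check them against your client version before running anything:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlockSizeAlter {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

        // The table must be offline while a family descriptor is changed.
        admin.disableTable("mytable");

        // Fetch the existing descriptor so the other family settings are kept,
        // then drop the HFile block size from the 64k default to 4k.
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
        cf.setBlocksize(4 * 1024);
        admin.modifyColumn("mytable", "cf", cf);

        admin.enableTable("mytable");

        // Optional: rewrite all store files at 4k now (at a large i/o cost)
        // instead of waiting for compactions to happen naturally.
        admin.majorCompact("mytable");
      }
    }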
I'll ask our ops team if we can steal another >>> > node or two for the cluster if you think that will help. I'll report >>> > back with results as soon as I can. Thanks again for working with me on >>> > this! This is definitely the most responsive users list I've ever >>> > posted to. >>> > >>> > -James >>> > >>> > >>> > On Tue, 2010-02-16 at 01:11 -0600, Dan Washusen wrote: >>> >> You could just add another node to your cluster to solve the immediate >>> >> problem. Then keep an eye on load, etc to preemptively add more nodes as >>> >> needed? >>> >> >>> >> Out of interest do you have LZO compression enabled on your table? That >>> >> makes the block cache and IO ops much more effective... >>> >> >>> >> Regarding GC logging: >>> >> Also, another option for GC logging is 'jstat'. For example, running the >>> >> following command will print out the VM heap utilization every 1 second: >>> >> >>> >> > jstat -gcutil <pid> 1000 >>> >> > >>> >> >>> >> The last column shows total amount of time (in seconds) spent garbage >>> >> collecting. You want to see very small increments... The other >>> >> interesting columns are "O" and "E". They show the percentage of Old and >>> >> Eden used. If old gen is staying up in the high 90's then there are more >>> >> long lived objects then available memory... >>> >> >>> >> Cheers, >>> >> Dan >>> >> >>> >> On 16 February 2010 17:54, James Baldassari <ja...@dataxu.com> wrote: >>> >> >>> >> > How much should I give the region servers? That machine is already >>> >> > overallocated, by which I mean that the sum of the max heap sizes of >>> >> > all >>> >> > java processes running there is greater than the amount physical >>> >> > memory, >>> >> > which can lead to swapping. We have: Hadoop data node, Hadoop task >>> >> > tracker, ZooKeeper peer, and region server. The machine has 8G of >>> >> > physical memory. The region server currently has a max heap size of >>> >> > 4G. >>> >> > Should I increase to 6G? Should I decrease the block cache back down >>> >> > to >>> >> > 20% or even lower? Do we need to move to a 16G server? >>> >> > >>> >> > Thanks, >>> >> > James >>> >> > >>> >> > >>> >> > On Tue, 2010-02-16 at 00:48 -0600, Dan Washusen wrote: >>> >> > > 32% IO on region server 3! Ouch! :) >>> >> > > >>> >> > > Increasing the block cache to 40% of VM memory without upping the >>> >> > > total >>> >> > > available memory may only exacerbated the issue. I notice that >>> >> > > region >>> >> > > server 2 was already using 3300mb of the 4000mb heap. By increasing >>> >> > > the >>> >> > > block cache size to 40% you have now given the block cache 1600mb >>> >> > compared >>> >> > > to the previous 800mb... >>> >> > > >>> >> > > Can you give the region servers more memory? >>> >> > > >>> >> > > Cheers, >>> >> > > Dan >>> >> > > >>> >> > > On 16 February 2010 17:42, James Baldassari <ja...@dataxu.com> wrote: >>> >> > > >>> >> > > > On Tue, 2010-02-16 at 00:14 -0600, Stack wrote: >>> >> > > > > On Mon, Feb 15, 2010 at 10:05 PM, James Baldassari >>> >> > > > > <ja...@dataxu.com >>> >> > > >>> >> > > > wrote: >>> >> > > > > > Applying HBASE-2180 isn't really an option at this >>> >> > > > > > time because we've been told to stick with the Cloudera distro. >>> >> > > > > >>> >> > > > > I'm sure the wouldn't mind (smile). Seems to about double >>> >> > throughput. 
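To put numbers on Dan's point about the cache growing without the heap growing: the block cache is sized as a fraction of the region server heap, so with a 4000 MB heap,

    4000 MB x 0.2 (default hfile.block.cache.size) =  800 MB of block cache
    4000 MB x 0.4 (the increased setting)          = 1600 MB of block cache

so the extra ~800 MB the cache can now claim comes straight out of the headroom the rest of the region server (memstores, store file indexes, RPC buffers) was already using, which is why bumping the ratio without adding heap can make the memory pressure worse.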
>>> >> > > > >>> >> > > > Hmm, well I might be able to convince them ;) >>> >> > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > > If I had to guess, I would say the performance issues start to >>> >> > happen >>> >> > > > > > around the time the region servers hit max heap size, which >>> >> > > > > > occurs >>> >> > > > > > within minutes of exposing the app to live traffic. Could GC >>> >> > > > > > be >>> >> > > > killing >>> >> > > > > > us? We use the concurrent collector as suggested. I saw on >>> >> > > > > > the >>> >> > > > > > performance page some mention of limiting the size of the new >>> >> > > > generation >>> >> > > > > > like -XX:NewSize=6m -XX:MaxNewSize=6m. Is that worth trying? >>> >> > > > > >>> >> > > > > Enable GC logging for a while? See hbase-env.sh. Uncomment this >>> >> > line: >>> >> > > > > >>> >> > > > > # export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails >>> >> > > > > XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >>> >> > > > >>> >> > > > I did uncomment that line, but I can't figure out where the >>> >> > gc-hbase.log >>> >> > > > is. It's not with the other logs. When starting HBase the GC >>> >> > > > output >>> >> > > > seems to be going to stdout rather than the file. Maybe a Cloudera >>> >> > > > thing. I'll do some digging. >>> >> > > > >>> >> > > > > >>> >> > > > > You are using recent JVM? 1.6.0_10 or greater? 1.6.0_18 might >>> >> > > > > have >>> >> > > > issues. >>> >> > > > >>> >> > > > We're on 1.6.0_16 at the moment. >>> >> > > > >>> >> > > > > >>> >> > > > > Whats CPU and iowait or wa in top look like on these machines, >>> >> > > > > particularly the loaded machine? >>> >> > > > > >>> >> > > > > How many disks in the machines? >>> >> > > > >>> >> > > > I'll have to ask our ops guys about the disks. The high load has >>> >> > > > now >>> >> > > > switched from region server 1 to 3. I just saw in our logs that it >>> >> > took >>> >> > > > 139383.065 milliseconds to do 5000 gets, ~36 gets/second, ouch. 
>>> >> > > > Here >>> >> > > > are the highlights from top for each region server: >>> >> > > > >>> >> > > > Region Server 1: >>> >> > > > top - 01:39:41 up 4 days, 13:44, 4 users, load average: 1.89, >>> >> > > > 0.99, >>> >> > 1.19 >>> >> > > > Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 15.6%us, 5.8%sy, 0.0%ni, 76.9%id, 0.0%wa, 0.1%hi, >>> >> > > > 1.6%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166588k total, 8112812k used, 53776k free, 8832k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 152k used, 1052096k free, 2831076k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 21961 hadoop 19 0 4830m 4.2g 10m S 114.3 53.6 37:26.58 java >>> >> > > > 21618 hadoop 21 0 4643m 578m 9804 S 66.1 7.3 19:06.89 java >>> >> > > > >>> >> > > > Region Server 2: >>> >> > > > top - 01:40:28 up 4 days, 13:43, 4 users, load average: 3.93, >>> >> > > > 2.17, >>> >> > 1.39 >>> >> > > > Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 11.3%us, 3.1%sy, 0.0%ni, 83.4%id, 1.2%wa, 0.1%hi, >>> >> > > > 0.9%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166588k total, 7971572k used, 195016k free, 34972k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 152k used, 1052096k free, 2944712k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 15752 hadoop 18 0 4742m 4.1g 10m S 210.6 53.1 41:52.80 java >>> >> > > > 15405 hadoop 20 0 4660m 317m 9800 S 114.0 4.0 27:34.17 java >>> >> > > > >>> >> > > > Region Server 3: >>> >> > > > top - 01:40:35 up 2 days, 9:04, 4 users, load average: 10.15, >>> >> > > > 11.05, >>> >> > > > 11.79 >>> >> > > > Tasks: 195 total, 1 running, 194 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 28.7%us, 10.1%sy, 0.0%ni, 25.8%id, 32.9%wa, 0.1%hi, >>> >> > > > 2.4%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166572k total, 8118592k used, 47980k free, 3264k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 140k used, 1052108k free, 2099896k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 15636 hadoop 18 0 4806m 4.2g 10m S 206.9 53.3 87:48.81 java >>> >> > > > 15243 hadoop 18 0 4734m 1.3g 9800 S 117.6 16.7 63:46.52 java >>> >> > > > >>> >> > > > -James >>> >> > > > >>> >> > > > > >>> >> > > > > St>Ack >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > > >>> >> > > > > > Here are the new region server stats along with load averages: >>> >> > > > > > >>> >> > > > > > Region Server 1: >>> >> > > > > > request=0.0, regions=16, stores=16, storefiles=33, >>> >> > > > storefileIndexSize=4, memstoreSize=1, compactionQueueSize=0, >>> >> > usedHeap=2891, >>> >> > > > maxHeap=4079, blockCacheSize=1403878072, blockCacheFree=307135816, >>> >> > > > blockCacheCount=21107, blockCacheHitRatio=84, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 10.34, 10.58, 7.08 >>> >> > > > > > >>> >> > > > > > Region Server 2: >>> >> > > > > > request=0.0, regions=15, stores=16, storefiles=26, >>> >> > > > storefileIndexSize=3, memstoreSize=1, compactionQueueSize=0, >>> >> > usedHeap=3257, >>> >> > > > maxHeap=4079, blockCacheSize=661765368, blockCacheFree=193741576, >>> >> > > > blockCacheCount=9942, blockCacheHitRatio=77, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 1.90, 1.23, 0.98 >>> >> > > > > > >>> >> > > > > > Region Server 3: >>> >> > > > > > 
request=0.0, regions=16, stores=16, storefiles=41, >>> >> > > > storefileIndexSize=4, memstoreSize=4, compactionQueueSize=0, >>> >> > usedHeap=1627, >>> >> > > > maxHeap=4079, blockCacheSize=665117184, blockCacheFree=190389760, >>> >> > > > blockCacheCount=9995, blockCacheHitRatio=70, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 2.01, 3.56, 4.18 >>> >> > > > > > >>> >> > > > > > That first region server is getting hit much harder than the >>> >> > others. >>> >> > > > > > They're identical machines (8-core), and the distribution of >>> >> > > > > > keys >>> >> > > > should >>> >> > > > > > be fairly random, so I'm not sure why that would happen. Any >>> >> > > > > > other >>> >> > > > > > ideas or suggestions would be greatly appreciated. >>> >> > > > > > >>> >> > > > > > Thanks, >>> >> > > > > > James >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > On Mon, 2010-02-15 at 21:51 -0600, Stack wrote: >>> >> > > > > >> Yeah, I was going to say that if your loading is mostly read, >>> >> > > > > >> you >>> >> > can >>> >> > > > > >> probably go up from the 0.2 given over to cache. I like Dan's >>> >> > > > > >> suggestion of trying it first on one server, if you can. >>> >> > > > > >> >>> >> > > > > >> St.Ack >>> >> > > > > >> >>> >> > > > > >> On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen >>> >> > > > > >> <d...@reactive.org> >>> >> > > > wrote: >>> >> > > > > >> > So roughly 72% of reads use the blocks held in the block >>> >> > cache... >>> >> > > > > >> > >>> >> > > > > >> > It would be interesting to see the difference between when >>> >> > > > > >> > it >>> >> > was >>> >> > > > working OK >>> >> > > > > >> > and now. Could you try increasing the memory allocated to >>> >> > > > > >> > one >>> >> > of >>> >> > > > the >>> >> > > > > >> > regions and also increasing the "hfile.block.cache.size" to >>> >> > > > > >> > say >>> >> > > > '0.4' on the >>> >> > > > > >> > same region? >>> >> > > > > >> > >>> >> > > > > >> > On 16 February 2010 11:54, James Baldassari >>> >> > > > > >> > <ja...@dataxu.com> >>> >> > > > wrote: >>> >> > > > > >> > >>> >> > > > > >> >> Hi Dan. Thanks for your suggestions. I am doing writes >>> >> > > > > >> >> at the >>> >> > > > same >>> >> > > > > >> >> time as reads, but there are usually many more reads than >>> >> > writes. 
>>> >> > > > Here >>> >> > > > > >> >> are the stats for all three region servers: >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 1: >>> >> > > > > >> >> request=0.0, regions=15, stores=16, storefiles=34, >>> >> > > > storefileIndexSize=3, >>> >> > > > > >> >> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=705474544, blockCacheFree=150032400, >>> >> > > > blockCacheCount=10606, >>> >> > > > > >> >> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 2: >>> >> > > > > >> >> request=0.0, regions=16, stores=16, storefiles=39, >>> >> > > > storefileIndexSize=4, >>> >> > > > > >> >> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=643172800, blockCacheFree=212334144, >>> >> > > > blockCacheCount=9660, >>> >> > > > > >> >> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 3: >>> >> > > > > >> >> request=0.0, regions=13, stores=13, storefiles=31, >>> >> > > > storefileIndexSize=4, >>> >> > > > > >> >> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=682848608, blockCacheFree=172658336, >>> >> > > > blockCacheCount=10262, >>> >> > > > > >> >> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> The average blockCacheHitRatio is about 72. Is this too >>> >> > > > > >> >> low? >>> >> > > > Anything >>> >> > > > > >> >> else I can check? >>> >> > > > > >> >> >>> >> > > > > >> >> -James >>> >> > > > > >> >> >>> >> > > > > >> >> >>> >> > > > > >> >> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote: >>> >> > > > > >> >> > Maybe the block cache is thrashing? >>> >> > > > > >> >> > >>> >> > > > > >> >> > If you are regularly writing data to your tables then >>> >> > > > > >> >> > it's >>> >> > > > possible that >>> >> > > > > >> >> the >>> >> > > > > >> >> > block cache is no longer being effective. On the region >>> >> > server >>> >> > > > web UI >>> >> > > > > >> >> check >>> >> > > > > >> >> > the blockCacheHitRatio value. You want this value to be >>> >> > > > > >> >> > high >>> >> > (0 >>> >> > > > - 100). >>> >> > > > > >> >> If >>> >> > > > > >> >> > this value is low it means that HBase has to go to disk >>> >> > > > > >> >> > to >>> >> > fetch >>> >> > > > blocks >>> >> > > > > >> >> of >>> >> > > > > >> >> > data. You can control the amount of VM memory that HBase >>> >> > > > allocates to >>> >> > > > > >> >> the >>> >> > > > > >> >> > block cache using the "hfile.block.cache.size" property >>> >> > (default >>> >> > > > is 0.2 >>> >> > > > > >> >> > (20%)). >>> >> > > > > >> >> > >>> >> > > > > >> >> > Cheers, >>> >> > > > > >> >> > Dan >>> >> > > > > >> >> > >>> >> > > > > >> >> > On 16 February 2010 10:45, James Baldassari < >>> >> > ja...@dataxu.com> >>> >> > > > wrote: >>> >> > > > > >> >> > >>> >> > > > > >> >> > > Hi, >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > Does anyone have any tips to share regarding >>> >> > > > > >> >> > > optimization >>> >> > for >>> >> > > > random >>> >> > > > > >> >> > > read performance? 
For writes I've found that setting a >>> >> > large >>> >> > > > write >>> >> > > > > >> >> > > buffer and setting auto-flush to false on the client >>> >> > > > > >> >> > > side >>> >> > > > significantly >>> >> > > > > >> >> > > improved put performance. Are there any similar easy >>> >> > tweaks to >>> >> > > > improve >>> >> > > > > >> >> > > random read performance? >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > I'm using HBase 0.20.3 in a very read-heavy real-time >>> >> > system >>> >> > > > with 1 >>> >> > > > > >> >> > > master and 3 region servers. It was working ok for a >>> >> > while, >>> >> > > > but today >>> >> > > > > >> >> > > there was a severe degradation in read performance. >>> >> > Restarting >>> >> > > > Hadoop >>> >> > > > > >> >> > > and HBase didn't help, are there are no errors in the >>> >> > > > > >> >> > > logs. >>> >> > > > Read >>> >> > > > > >> >> > > performance starts off around 1,000-2,000 gets/second >>> >> > > > > >> >> > > but >>> >> > > > quickly >>> >> > > > > >> >> > > (within minutes) drops to around 100 gets/second. >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > I've already looked at the performance tuning wiki >>> >> > > > > >> >> > > page. >>> >> > On >>> >> > > > the server >>> >> > > > > >> >> > > side I've increased hbase.regionserver.handler.count >>> >> > > > > >> >> > > from >>> >> > 10 to >>> >> > > > 100, >>> >> > > > > >> >> but >>> >> > > > > >> >> > > it didn't help. Maybe this is expected because I'm >>> >> > > > > >> >> > > only >>> >> > using >>> >> > > > a single >>> >> > > > > >> >> > > client to do reads. I'm working on implementing a >>> >> > > > > >> >> > > client >>> >> > pool >>> >> > > > now, but >>> >> > > > > >> >> > > I'm wondering if there are any other settings on the >>> >> > > > > >> >> > > server >>> >> > or >>> >> > > > client >>> >> > > > > >> >> > > side that might improve things. >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > Thanks, >>> >> > > > > >> >> > > James >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > >>> >> > > > > >> >> >>> >> > > > > >> >> >>> >> > > > > >> > >>> >> > > > > > >>> >> > > > > > >>> >> > > > >>> >> > > > >>> >> > >>> >> > >>> > >>> > >> >> >
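For reference, a minimal sketch of the client-side settings mentioned in the original question, against the 0.20 Java client API. The table, family, and qualifier names are made up, and the buffer size is only an example to be tuned to the actual payloads:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientTuning {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        // Write path: buffer puts client-side instead of one RPC per put.
        HTable writer = new HTable(conf, "mytable");
        writer.setAutoFlush(false);                  // don't flush on every put()
        writer.setWriteBufferSize(12 * 1024 * 1024); // e.g. a 12 MB write buffer
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        writer.put(put);
        // ... many more puts ...
        writer.flushCommits();                       // push anything still buffered

        // Read path: HTable is not thread-safe, so give each reader thread its
        // own instance (or a pool of them) rather than sharing a single client.
        HTable reader = new HTable(conf, "mytable");
        Result result = reader.get(new Get(Bytes.toBytes("row-1")));
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        System.out.println(Bytes.toString(value));
      }
    }

Note the write buffer only helps the put path; for random gets the main client-side lever is concurrency, which is why the pool of client instances mentioned above should help more than any single setting.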
I do not think it is a problem anymore with the advent of: http://code.google.com/p/hadoop-gpl-compression/
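If that library does settle the licensing question, turning LZO on for an existing table follows the same offline-alter pattern as the block-size sketch earlier in the thread. The only new piece would be something like the fragment below, assuming the hadoop-gpl-compression jar and native LZO libraries are already installed on every region server, and with the usual caveat that the method names should be verified against the 0.20 javadoc:

    // inside the same disableTable(...) / enableTable(...) sequence as before
    HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("mytable"))
                                .getFamily(Bytes.toBytes("cf"));
    cf.setCompressionType(Compression.Algorithm.LZO); // org.apache.hadoop.hbase.io.hfile.Compression
    admin.modifyColumn("mytable", "cf", cf);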