On Wed, Feb 17, 2010 at 6:17 PM, Ryan Rawson <ryano...@gmail.com> wrote: > Why is LZO's license a problem? Sure it's GPL, but so is a lot of the > software on a linux system... The way the GPL is done, there is no > compile time dependency between HBase and the LZO libraries. Thus > there is no GPL pollution in the eyes of the ASF lawyer-likes. > > Running HBase as part of a service, internally to a company, the GPL > should not be a problem here. > > On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote: >> Today we added a fourth region server and forced the data to be >> redistributed evenly by exporting /hbase and then importing it back in >> (the Hadoop redistribution tool didn't even out the data completely). >> We also increased the max heap size on the region server from 4G to 5G >> and decreased the block cache value from 0.4 back to 0.2. These changes >> didn't seem to provide any real benefit, unfortunately. In a few >> minutes we're going to try deploying the patched (HBASE-2180) HBase >> 0.20.3. If all of these changes don't work we'll look at tweaking the >> block size as you suggested. >> >> LZO is not an option for us due to the GPL license on the required Java >> connector library (http://code.google.com/p/hadoop-gpl-compression/). >> Another here group was looking at using LZO in their Hadoop cluster but >> ruled it out due to the licensing issues. Is anyone aware of any LZO >> (or other codec) libraries/connectors released under the Apache or LGPL >> licenses that can be used with HBase/Hadoop? >> >> We do have Ganglia with monitoring and alerts. We're not swapping right >> now, although it is possible for that to happen at some point. What >> seems to be happening is that the load on all region servers will be >> fairly low (< 1.0) except for one region server which will have a load >> of around 10 with a high I/O wait. Then this region server will go back >> to normal and a different region server will experience high load and >> I/O wait. I'm hopeful that HBASE-2180 may resolve some of these issues. >> I'll send an update after we deploy the patched jar. >> >> -James >> >> >> On Tue, 2010-02-16 at 12:17 -0600, Stack wrote: >>> If you don't do lzo, and if your cell size is smallish, then try a >>> different block size. Default blocksize is 64k which might be ok for >>> a single-seek -- i.e. costs near same getting 64k as it does 4k -- but >>> for a random-read loading with lots of concurrency, it might make for >>> more work being done than needs be and so throughput drops. To enable >>> 4k blocks, do as Lars said for enabling lzo only change the block size >>> when table is offline. You could run a major compaction and it'll >>> rewrite all as 4k blocks promptly (at a large i/o cost) or just let >>> the cluster go about its business and as it compacts naturally, the >>> new files will be 4k. >>> >>> In an earlier note you say you are over-allocated and so you could be >>> swapping (Are you? Do you ops teams have ganglia or some such running >>> against this cluster?). A JVM whose heap is paging will not perform >>> at all. You don't even hit the max on all daemons for paging to >>> happen. See "Are you swapping" in this page >>> http://wiki.apache.org/hadoop/PerformanceTuning. >>> >>> St.Ack >>> >>> >>> On Mon, Feb 15, 2010 at 11:21 PM, James Baldassari <ja...@dataxu.com> wrote: >>> > No, we don't have LZO on the table right now. I guess that's something >>> > else that we can try. 
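A rough sketch of the offline block-size change Stack describes above, using the 0.20-era Java admin API. The table name "mytable" and family "cf" are placeholders, and the method names are from memory of the 0.20 javadoc, so double-check them against your client version before running anything:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlockSizeAlter {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

        // The table must be offline while a family descriptor is changed.
        admin.disableTable("mytable");

        // Fetch the existing descriptor so the other family settings are kept,
        // then drop the HFile block size from the 64k default to 4k.
        HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
        cf.setBlocksize(4 * 1024);
        admin.modifyColumn("mytable", "cf", cf);

        admin.enableTable("mytable");

        // Optional: rewrite all store files at 4k now (at a large i/o cost)
        // instead of waiting for compactions to happen naturally.
        admin.majorCompact("mytable");
      }
    }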
I'll ask our ops team if we can steal another >>> > node or two for the cluster if you think that will help. I'll report >>> > back with results as soon as I can. Thanks again for working with me on >>> > this! This is definitely the most responsive users list I've ever >>> > posted to. >>> > >>> > -James >>> > >>> > >>> > On Tue, 2010-02-16 at 01:11 -0600, Dan Washusen wrote: >>> >> You could just add another node to your cluster to solve the immediate >>> >> problem. Then keep an eye on load, etc to preemptively add more nodes as >>> >> needed? >>> >> >>> >> Out of interest do you have LZO compression enabled on your table? That >>> >> makes the block cache and IO ops much more effective... >>> >> >>> >> Regarding GC logging: >>> >> Also, another option for GC logging is 'jstat'. For example, running the >>> >> following command will print out the VM heap utilization every 1 second: >>> >> >>> >> > jstat -gcutil <pid> 1000 >>> >> > >>> >> >>> >> The last column shows total amount of time (in seconds) spent garbage >>> >> collecting. You want to see very small increments... The other >>> >> interesting columns are "O" and "E". They show the percentage of Old and >>> >> Eden used. If old gen is staying up in the high 90's then there are more >>> >> long lived objects then available memory... >>> >> >>> >> Cheers, >>> >> Dan >>> >> >>> >> On 16 February 2010 17:54, James Baldassari <ja...@dataxu.com> wrote: >>> >> >>> >> > How much should I give the region servers? That machine is already >>> >> > overallocated, by which I mean that the sum of the max heap sizes of >>> >> > all >>> >> > java processes running there is greater than the amount physical >>> >> > memory, >>> >> > which can lead to swapping. We have: Hadoop data node, Hadoop task >>> >> > tracker, ZooKeeper peer, and region server. The machine has 8G of >>> >> > physical memory. The region server currently has a max heap size of >>> >> > 4G. >>> >> > Should I increase to 6G? Should I decrease the block cache back down >>> >> > to >>> >> > 20% or even lower? Do we need to move to a 16G server? >>> >> > >>> >> > Thanks, >>> >> > James >>> >> > >>> >> > >>> >> > On Tue, 2010-02-16 at 00:48 -0600, Dan Washusen wrote: >>> >> > > 32% IO on region server 3! Ouch! :) >>> >> > > >>> >> > > Increasing the block cache to 40% of VM memory without upping the >>> >> > > total >>> >> > > available memory may only exacerbated the issue. I notice that >>> >> > > region >>> >> > > server 2 was already using 3300mb of the 4000mb heap. By increasing >>> >> > > the >>> >> > > block cache size to 40% you have now given the block cache 1600mb >>> >> > compared >>> >> > > to the previous 800mb... >>> >> > > >>> >> > > Can you give the region servers more memory? >>> >> > > >>> >> > > Cheers, >>> >> > > Dan >>> >> > > >>> >> > > On 16 February 2010 17:42, James Baldassari <ja...@dataxu.com> wrote: >>> >> > > >>> >> > > > On Tue, 2010-02-16 at 00:14 -0600, Stack wrote: >>> >> > > > > On Mon, Feb 15, 2010 at 10:05 PM, James Baldassari >>> >> > > > > <ja...@dataxu.com >>> >> > > >>> >> > > > wrote: >>> >> > > > > > Applying HBASE-2180 isn't really an option at this >>> >> > > > > > time because we've been told to stick with the Cloudera distro. >>> >> > > > > >>> >> > > > > I'm sure the wouldn't mind (smile). Seems to about double >>> >> > throughput. 
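To put numbers on Dan's point about the cache growing without the heap growing: the block cache is sized as a fraction of the region server heap, so with a 4000 MB heap,

    4000 MB x 0.2 (default hfile.block.cache.size) =  800 MB of block cache
    4000 MB x 0.4 (the increased setting)          = 1600 MB of block cache

so the extra ~800 MB the cache can now claim comes straight out of the headroom the rest of the region server (memstores, store file indexes, RPC buffers) was already using, which is why bumping the ratio without adding heap can make the memory pressure worse.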
>>> >> > > > >>> >> > > > Hmm, well I might be able to convince them ;) >>> >> > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > > If I had to guess, I would say the performance issues start to >>> >> > happen >>> >> > > > > > around the time the region servers hit max heap size, which >>> >> > > > > > occurs >>> >> > > > > > within minutes of exposing the app to live traffic. Could GC >>> >> > > > > > be >>> >> > > > killing >>> >> > > > > > us? We use the concurrent collector as suggested. I saw on >>> >> > > > > > the >>> >> > > > > > performance page some mention of limiting the size of the new >>> >> > > > generation >>> >> > > > > > like -XX:NewSize=6m -XX:MaxNewSize=6m. Is that worth trying? >>> >> > > > > >>> >> > > > > Enable GC logging for a while? See hbase-env.sh. Uncomment this >>> >> > line: >>> >> > > > > >>> >> > > > > # export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails >>> >> > > > > XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log" >>> >> > > > >>> >> > > > I did uncomment that line, but I can't figure out where the >>> >> > gc-hbase.log >>> >> > > > is. It's not with the other logs. When starting HBase the GC >>> >> > > > output >>> >> > > > seems to be going to stdout rather than the file. Maybe a Cloudera >>> >> > > > thing. I'll do some digging. >>> >> > > > >>> >> > > > > >>> >> > > > > You are using recent JVM? 1.6.0_10 or greater? 1.6.0_18 might >>> >> > > > > have >>> >> > > > issues. >>> >> > > > >>> >> > > > We're on 1.6.0_16 at the moment. >>> >> > > > >>> >> > > > > >>> >> > > > > Whats CPU and iowait or wa in top look like on these machines, >>> >> > > > > particularly the loaded machine? >>> >> > > > > >>> >> > > > > How many disks in the machines? >>> >> > > > >>> >> > > > I'll have to ask our ops guys about the disks. The high load has >>> >> > > > now >>> >> > > > switched from region server 1 to 3. I just saw in our logs that it >>> >> > took >>> >> > > > 139383.065 milliseconds to do 5000 gets, ~36 gets/second, ouch. 
>>> >> > > > Here >>> >> > > > are the highlights from top for each region server: >>> >> > > > >>> >> > > > Region Server 1: >>> >> > > > top - 01:39:41 up 4 days, 13:44, 4 users, load average: 1.89, >>> >> > > > 0.99, >>> >> > 1.19 >>> >> > > > Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 15.6%us, 5.8%sy, 0.0%ni, 76.9%id, 0.0%wa, 0.1%hi, >>> >> > > > 1.6%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166588k total, 8112812k used, 53776k free, 8832k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 152k used, 1052096k free, 2831076k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 21961 hadoop 19 0 4830m 4.2g 10m S 114.3 53.6 37:26.58 java >>> >> > > > 21618 hadoop 21 0 4643m 578m 9804 S 66.1 7.3 19:06.89 java >>> >> > > > >>> >> > > > Region Server 2: >>> >> > > > top - 01:40:28 up 4 days, 13:43, 4 users, load average: 3.93, >>> >> > > > 2.17, >>> >> > 1.39 >>> >> > > > Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 11.3%us, 3.1%sy, 0.0%ni, 83.4%id, 1.2%wa, 0.1%hi, >>> >> > > > 0.9%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166588k total, 7971572k used, 195016k free, 34972k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 152k used, 1052096k free, 2944712k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 15752 hadoop 18 0 4742m 4.1g 10m S 210.6 53.1 41:52.80 java >>> >> > > > 15405 hadoop 20 0 4660m 317m 9800 S 114.0 4.0 27:34.17 java >>> >> > > > >>> >> > > > Region Server 3: >>> >> > > > top - 01:40:35 up 2 days, 9:04, 4 users, load average: 10.15, >>> >> > > > 11.05, >>> >> > > > 11.79 >>> >> > > > Tasks: 195 total, 1 running, 194 sleeping, 0 stopped, 0 >>> >> > > > zombie >>> >> > > > Cpu(s): 28.7%us, 10.1%sy, 0.0%ni, 25.8%id, 32.9%wa, 0.1%hi, >>> >> > > > 2.4%si, >>> >> > > > 0.0%st >>> >> > > > Mem: 8166572k total, 8118592k used, 47980k free, 3264k >>> >> > buffers >>> >> > > > Swap: 1052248k total, 140k used, 1052108k free, 2099896k >>> >> > > > cached >>> >> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >>> >> > > > 15636 hadoop 18 0 4806m 4.2g 10m S 206.9 53.3 87:48.81 java >>> >> > > > 15243 hadoop 18 0 4734m 1.3g 9800 S 117.6 16.7 63:46.52 java >>> >> > > > >>> >> > > > -James >>> >> > > > >>> >> > > > > >>> >> > > > > St>Ack >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > > >>> >> > > > > > Here are the new region server stats along with load averages: >>> >> > > > > > >>> >> > > > > > Region Server 1: >>> >> > > > > > request=0.0, regions=16, stores=16, storefiles=33, >>> >> > > > storefileIndexSize=4, memstoreSize=1, compactionQueueSize=0, >>> >> > usedHeap=2891, >>> >> > > > maxHeap=4079, blockCacheSize=1403878072, blockCacheFree=307135816, >>> >> > > > blockCacheCount=21107, blockCacheHitRatio=84, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 10.34, 10.58, 7.08 >>> >> > > > > > >>> >> > > > > > Region Server 2: >>> >> > > > > > request=0.0, regions=15, stores=16, storefiles=26, >>> >> > > > storefileIndexSize=3, memstoreSize=1, compactionQueueSize=0, >>> >> > usedHeap=3257, >>> >> > > > maxHeap=4079, blockCacheSize=661765368, blockCacheFree=193741576, >>> >> > > > blockCacheCount=9942, blockCacheHitRatio=77, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 1.90, 1.23, 0.98 >>> >> > > > > > >>> >> > > > > > Region Server 3: >>> >> > > > > > 
request=0.0, regions=16, stores=16, storefiles=41, >>> >> > > > storefileIndexSize=4, memstoreSize=4, compactionQueueSize=0, >>> >> > usedHeap=1627, >>> >> > > > maxHeap=4079, blockCacheSize=665117184, blockCacheFree=190389760, >>> >> > > > blockCacheCount=9995, blockCacheHitRatio=70, fsReadLatency=0, >>> >> > > > fsWriteLatency=0, fsSyncLatency=0 >>> >> > > > > > Load Averages: 2.01, 3.56, 4.18 >>> >> > > > > > >>> >> > > > > > That first region server is getting hit much harder than the >>> >> > others. >>> >> > > > > > They're identical machines (8-core), and the distribution of >>> >> > > > > > keys >>> >> > > > should >>> >> > > > > > be fairly random, so I'm not sure why that would happen. Any >>> >> > > > > > other >>> >> > > > > > ideas or suggestions would be greatly appreciated. >>> >> > > > > > >>> >> > > > > > Thanks, >>> >> > > > > > James >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > On Mon, 2010-02-15 at 21:51 -0600, Stack wrote: >>> >> > > > > >> Yeah, I was going to say that if your loading is mostly read, >>> >> > > > > >> you >>> >> > can >>> >> > > > > >> probably go up from the 0.2 given over to cache. I like Dan's >>> >> > > > > >> suggestion of trying it first on one server, if you can. >>> >> > > > > >> >>> >> > > > > >> St.Ack >>> >> > > > > >> >>> >> > > > > >> On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen >>> >> > > > > >> <d...@reactive.org> >>> >> > > > wrote: >>> >> > > > > >> > So roughly 72% of reads use the blocks held in the block >>> >> > cache... >>> >> > > > > >> > >>> >> > > > > >> > It would be interesting to see the difference between when >>> >> > > > > >> > it >>> >> > was >>> >> > > > working OK >>> >> > > > > >> > and now. Could you try increasing the memory allocated to >>> >> > > > > >> > one >>> >> > of >>> >> > > > the >>> >> > > > > >> > regions and also increasing the "hfile.block.cache.size" to >>> >> > > > > >> > say >>> >> > > > '0.4' on the >>> >> > > > > >> > same region? >>> >> > > > > >> > >>> >> > > > > >> > On 16 February 2010 11:54, James Baldassari >>> >> > > > > >> > <ja...@dataxu.com> >>> >> > > > wrote: >>> >> > > > > >> > >>> >> > > > > >> >> Hi Dan. Thanks for your suggestions. I am doing writes >>> >> > > > > >> >> at the >>> >> > > > same >>> >> > > > > >> >> time as reads, but there are usually many more reads than >>> >> > writes. 
>>> >> > > > Here >>> >> > > > > >> >> are the stats for all three region servers: >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 1: >>> >> > > > > >> >> request=0.0, regions=15, stores=16, storefiles=34, >>> >> > > > storefileIndexSize=3, >>> >> > > > > >> >> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=705474544, blockCacheFree=150032400, >>> >> > > > blockCacheCount=10606, >>> >> > > > > >> >> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 2: >>> >> > > > > >> >> request=0.0, regions=16, stores=16, storefiles=39, >>> >> > > > storefileIndexSize=4, >>> >> > > > > >> >> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=643172800, blockCacheFree=212334144, >>> >> > > > blockCacheCount=9660, >>> >> > > > > >> >> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> Region Server 3: >>> >> > > > > >> >> request=0.0, regions=13, stores=13, storefiles=31, >>> >> > > > storefileIndexSize=4, >>> >> > > > > >> >> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, >>> >> > > > maxHeap=4079, >>> >> > > > > >> >> blockCacheSize=682848608, blockCacheFree=172658336, >>> >> > > > blockCacheCount=10262, >>> >> > > > > >> >> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, >>> >> > > > fsSyncLatency=0 >>> >> > > > > >> >> >>> >> > > > > >> >> The average blockCacheHitRatio is about 72. Is this too >>> >> > > > > >> >> low? >>> >> > > > Anything >>> >> > > > > >> >> else I can check? >>> >> > > > > >> >> >>> >> > > > > >> >> -James >>> >> > > > > >> >> >>> >> > > > > >> >> >>> >> > > > > >> >> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote: >>> >> > > > > >> >> > Maybe the block cache is thrashing? >>> >> > > > > >> >> > >>> >> > > > > >> >> > If you are regularly writing data to your tables then >>> >> > > > > >> >> > it's >>> >> > > > possible that >>> >> > > > > >> >> the >>> >> > > > > >> >> > block cache is no longer being effective. On the region >>> >> > server >>> >> > > > web UI >>> >> > > > > >> >> check >>> >> > > > > >> >> > the blockCacheHitRatio value. You want this value to be >>> >> > > > > >> >> > high >>> >> > (0 >>> >> > > > - 100). >>> >> > > > > >> >> If >>> >> > > > > >> >> > this value is low it means that HBase has to go to disk >>> >> > > > > >> >> > to >>> >> > fetch >>> >> > > > blocks >>> >> > > > > >> >> of >>> >> > > > > >> >> > data. You can control the amount of VM memory that HBase >>> >> > > > allocates to >>> >> > > > > >> >> the >>> >> > > > > >> >> > block cache using the "hfile.block.cache.size" property >>> >> > (default >>> >> > > > is 0.2 >>> >> > > > > >> >> > (20%)). >>> >> > > > > >> >> > >>> >> > > > > >> >> > Cheers, >>> >> > > > > >> >> > Dan >>> >> > > > > >> >> > >>> >> > > > > >> >> > On 16 February 2010 10:45, James Baldassari < >>> >> > ja...@dataxu.com> >>> >> > > > wrote: >>> >> > > > > >> >> > >>> >> > > > > >> >> > > Hi, >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > Does anyone have any tips to share regarding >>> >> > > > > >> >> > > optimization >>> >> > for >>> >> > > > random >>> >> > > > > >> >> > > read performance? 
For writes I've found that setting a >>> >> > large >>> >> > > > write >>> >> > > > > >> >> > > buffer and setting auto-flush to false on the client >>> >> > > > > >> >> > > side >>> >> > > > significantly >>> >> > > > > >> >> > > improved put performance. Are there any similar easy >>> >> > tweaks to >>> >> > > > improve >>> >> > > > > >> >> > > random read performance? >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > I'm using HBase 0.20.3 in a very read-heavy real-time >>> >> > system >>> >> > > > with 1 >>> >> > > > > >> >> > > master and 3 region servers. It was working ok for a >>> >> > while, >>> >> > > > but today >>> >> > > > > >> >> > > there was a severe degradation in read performance. >>> >> > Restarting >>> >> > > > Hadoop >>> >> > > > > >> >> > > and HBase didn't help, are there are no errors in the >>> >> > > > > >> >> > > logs. >>> >> > > > Read >>> >> > > > > >> >> > > performance starts off around 1,000-2,000 gets/second >>> >> > > > > >> >> > > but >>> >> > > > quickly >>> >> > > > > >> >> > > (within minutes) drops to around 100 gets/second. >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > I've already looked at the performance tuning wiki >>> >> > > > > >> >> > > page. >>> >> > On >>> >> > > > the server >>> >> > > > > >> >> > > side I've increased hbase.regionserver.handler.count >>> >> > > > > >> >> > > from >>> >> > 10 to >>> >> > > > 100, >>> >> > > > > >> >> but >>> >> > > > > >> >> > > it didn't help. Maybe this is expected because I'm >>> >> > > > > >> >> > > only >>> >> > using >>> >> > > > a single >>> >> > > > > >> >> > > client to do reads. I'm working on implementing a >>> >> > > > > >> >> > > client >>> >> > pool >>> >> > > > now, but >>> >> > > > > >> >> > > I'm wondering if there are any other settings on the >>> >> > > > > >> >> > > server >>> >> > or >>> >> > > > client >>> >> > > > > >> >> > > side that might improve things. >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > Thanks, >>> >> > > > > >> >> > > James >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > >>> >> > > > > >> >> > > >>> >> > > > > >> >> >>> >> > > > > >> >> >>> >> > > > > >> > >>> >> > > > > > >>> >> > > > > > >>> >> > > > >>> >> > > > >>> >> > >>> >> > >>> > >>> > >> >> >
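For reference, a minimal sketch of the client-side settings mentioned in the original question, against the 0.20 Java client API. The table, family, and qualifier names are made up, and the buffer size is only an example to be tuned to the actual payloads:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientTuning {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        // Write path: buffer puts client-side instead of one RPC per put.
        HTable writer = new HTable(conf, "mytable");
        writer.setAutoFlush(false);                  // don't flush on every put()
        writer.setWriteBufferSize(12 * 1024 * 1024); // e.g. a 12 MB write buffer
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        writer.put(put);
        // ... many more puts ...
        writer.flushCommits();                       // push anything still buffered

        // Read path: HTable is not thread-safe, so give each reader thread its
        // own instance (or a pool of them) rather than sharing a single client.
        HTable reader = new HTable(conf, "mytable");
        Result result = reader.get(new Get(Bytes.toBytes("row-1")));
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        System.out.println(Bytes.toString(value));
      }
    }

Note the write buffer only helps the put path; for random gets the main client-side lever is concurrency, which is why the pool of client instances mentioned above should help more than any single setting.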
I do not think it is a problem anymore with the advent of: http://code.google.com/p/hadoop-gpl-compression/
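If that library does settle the licensing question, turning LZO on for an existing table follows the same offline-alter pattern as the block-size sketch earlier in the thread. The only new piece would be something like the fragment below, assuming the hadoop-gpl-compression jar and native LZO libraries are already installed on every region server, and with the usual caveat that the method names should be verified against the 0.20 javadoc:

    // inside the same disableTable(...) / enableTable(...) sequence as before
    HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("mytable"))
                                .getFamily(Bytes.toBytes("cf"));
    cf.setCompressionType(Compression.Algorithm.LZO); // org.apache.hadoop.hbase.io.hfile.Compression
    admin.modifyColumn("mytable", "cf", cf);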