Wow, thank you very much for all those precious explanations, pointers and examples. It's a lot to digest... I will try them (at least what I can with 0.90.4 -- yes, I'm upgrading from 0.90.1 to 0.90.4) and keep you informed. BTW, I'm already using compression (GZ); the current data is randomized, so I don't get as much gain as you mentioned (I think I'm only around 30%). It seems that bloom filters (BF) are one of the major things I need to look at, along with the compaction ratio, and I need a different setting for each of my CFs (one CF has a small set of columns and each update changes about 50% of them --> ROWCOL; the second CF gets a new column on every update --> ROW). I'm not keeping more than one version either, and you wrote that this kind of read is not a point query.
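For reference, this is roughly what I have in mind for the two families, as a rough sketch against the 0.92-era Java client API (bloom filters need HBASE-1200/HBASE-2794, so they only become available once I get past 0.90.x); the table and family names below are just placeholders:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  public class AlterFamilies {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      // CF with a small column set where ~50% of the columns change per update:
      // point queries on row+column, so a ROWCOL bloom.
      HColumnDescriptor smallCf = new HColumnDescriptor("small_cf");
      smallCf.setCompressionType(Compression.Algorithm.GZ);
      smallCf.setBloomFilterType(StoreFile.BloomType.ROWCOL);
      smallCf.setMaxVersions(1);

      // CF that gains a brand-new column on every update: a ROW bloom is enough.
      HColumnDescriptor wideCf = new HColumnDescriptor("wide_cf");
      wideCf.setCompressionType(Compression.Algorithm.GZ);
      wideCf.setBloomFilterType(StoreFile.BloomType.ROW);
      wideCf.setMaxVersions(1);

      // Schema changes still require the table to be offline here.
      admin.disableTable("mytable");
      admin.modifyColumn("mytable", "small_cf", smallCf);
      admin.modifyColumn("mytable", "wide_cf", wideCf);
      admin.enableTable("mytable");
    }
  }

If I got it right, the new compression/bloom settings only apply to StoreFiles written after the change (flushes and subsequent compactions), not retroactively to existing files.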
A suggestion: perhaps take all those examples/explanations and add them to the book for future reference.

Regards,
Mikael.S

On Sat, Jan 14, 2012 at 4:06 AM, Nicolas Spiegelberg <[email protected]> wrote:

> >I'm sorry, but I don't understand: of course I have disk and network saturation, and the flush stops flushing because it is waiting for the compaction to finish. Since a major compaction was triggered, all the stores (a large number) present on the disks (7 disks per RS) will be grabbed for major compaction, and the I/O is affected. The network is also affected, since everything is major compacting and replicating files at the same time (1GB network).
>
> When you have an IO problem, there are multiple pieces at play that you can adjust:
>
> Write: HLog, Flush, Compaction
> Read: Point Query, Scan
>
> If your writes far outnumber your reads, then you should relax one of the write pieces.
> - HLog: You can't really adjust HLog IO outside of key compression (HBASE-4608).
> - Flush: You can adjust your compression. None->LZO == 5x compression. LZO->GZ == 2x compression. Both are at the expense of CPU. HBASE-4241 minimizes flush IO significantly in the update-heavy use case (discussed in the last email).
> - Compaction: You can lower the compaction ratio to minimize the amount of rewriting over time. That's why I suggested changing the ratio from 1.2 -> 0.25. This gives a ~50% IO reduction (blog post on this forthcoming @ http://www.facebook.com/UsingHBase ).
>
> However, you may have a lot more reads than you think. For example, let's say your read:write ratio is 1:10, so significantly write dominated. Without any of the optimizations I listed in the previous email, your real read ratio is multiplied by the StoreFile count (because you naively read all StoreFiles). So let's say that, during congestion, you have 20 StoreFiles. 1*20:10 means that you're now 2:1 read dominated. You need features to reduce the number of StoreFiles you scan when the StoreFile count is high.
>
> - Point Query: bloom filters (HBASE-1200, HBASE-2794), lazy seek (HBASE-4465), and seek optimizations (HBASE-4433, HBASE-4434, HBASE-4469, HBASE-4532)
> - Scan: not as many optimizations here. They mostly revolve around proper usage & seek-next optimization when using filters. I don't have JIRA numbers here, but probably a half-dozen small tweaks were added in 0.92.
>
> >I don't have an increment workload (the workload either updates columns on a CF or adds a column on a CF for the same key), so how will those patches help?
>
> Increment & read->update workloads end up picking up roughly the same optimizations. Adding a column to an existing row is no different from adding a new row as far as optimizations are concerned, because there's nothing to de-dupe.
>
> >I don't say this is a bad thing, it's just an observation from our test: HBase will slow down the flush if too many store files are present, and will add pressure on GC and memory, affecting performance.
> >The update workload does not send all the row content for a certain key, so only partial data is written. In order to get the whole row, I presume that reading the newest store is not enough ("all" stores need to be read, collecting the most up-to-date fields to rebuild a full row), or am I missing something?
>
> Reading all row columns is the same as doing a scan. You're not doing a point query if you don't specify the exact key (columns) you're looking for.
> Setting versions to unlimited and then getting all versions of a particular ROW+COL would also be considered a scan rather than a point query as far as optimizations are concerned.
>
> >1. If I did not set a specific property for bloom filters (BF), does it mean that I'm not using them (the book only refers to BF with regard to CFs)?
>
> By default, bloom filters are disabled, so you need to enable them to get the optimizations. This is by design. Bloom filters trade off cache space for low-overhead probabilistic queries. The default is 8 bytes per bloom entry (key) & a 1% false positive rate. You can use 'bin/hbase org.apache.hadoop.hbase.io.hfile.HFile' (look at the help, then -f to specify a StoreFile and -m for meta info) to see your StoreFile's average KV size. If size(KV) == 100 bytes, then blooms use 8% of the space in cache, which is better than loading the StoreFile block only to get a miss.
>
> Whether to use a ROW or ROWCOL bloom filter depends on your write & read pattern. If you read the entire row at a time, use a ROW bloom. If you point query, ROW or ROWCOL are both options. If you write all columns for a row at the same time, definitely use a ROW bloom. If you have a small column range and you update the columns at different rates/times, then a ROWCOL bloom filter may be more helpful. ROWCOL is really useful if a scan query for a ROW will normally return results, but a point query for a ROWCOL may have a high miss rate. A perfect example is storing unique hash values for a user on disk. You'd use 'user' as the row & the hash as the column. In most instances, the hash won't be a duplicate, so a ROWCOL bloom would be better.
>
> >3. How can we ensure that compaction will not consume too much I/O if we cannot control major compaction?
>
> TCP congestion control will ensure that a single TCP socket won't consume too much bandwidth, so that part of compactions is handled automatically. The part that you need to handle is the number of simultaneous TCP sockets (currently 1, until multi-threaded compactions) & the aggregate data volume transferred over time. As I said, this is controlled by compaction.ratio. If temporarily high StoreFile counts cause you to bottleneck, the slight latency variance is an annoyance of the current compaction algorithm, but the underlying problem you should be looking at solving is the system's inability to filter out the unnecessary StoreFiles.
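To make the point-query vs. scan distinction above concrete, here is a minimal sketch against the 0.92-era Java client API (the table, family, row and column names are all invented): only the first read names the exact row and column, so it is the only one that bloom filters and the lazy-seek/seek optimizations can use to skip StoreFiles.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PointQueryVsRowRead {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");

      // Point query: exact row AND column are specified, so StoreFiles that
      // cannot contain this cell can be skipped (ROWCOL bloom, lazy seek, ...).
      Get point = new Get(Bytes.toBytes("row-42"));
      point.addColumn(Bytes.toBytes("small_cf"), Bytes.toBytes("status"));
      Result cell = table.get(point);

      // "Give me the whole row": no columns specified, so every StoreFile of the
      // family may have to be read and merged; optimization-wise this is a scan.
      Get wholeRow = new Get(Bytes.toBytes("row-42"));
      Result fullRow = table.get(wholeRow);

      table.close();
    }
  }

Asking for all versions of a row+column (Get.setMaxVersions()) falls in the same scan-like bucket, as noted above.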
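And a small sketch of the user/hash pattern mentioned above, where a ROWCOL bloom shines (all names are made up): each hash is written as a new column under the user's row, and the existence check is a point query on row+column that will usually miss in most StoreFiles.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class UserHashLookup {
    private static final byte[] CF = Bytes.toBytes("hashes");

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "user_hashes");

      // Write: row = user, column qualifier = the hash itself (a new column per hash).
      Put put = new Put(Bytes.toBytes("user123"));
      put.add(CF, Bytes.toBytes("a94a8fe5cc"), Bytes.toBytes(System.currentTimeMillis()));
      table.put(put);

      // Read: "have we seen this hash for this user before?" is a point query on
      // row+column. Most hashes are new, so most StoreFiles won't contain this exact
      // cell; a ROWCOL bloom lets those files be skipped instead of read-and-missed.
      Get get = new Get(Bytes.toBytes("user123"));
      get.addColumn(CF, Bytes.toBytes("a94a8fe5cc"));
      boolean seenBefore = table.exists(get);

      table.close();
    }
  }

A ROW bloom would help much less here, since the user row itself usually does exist in most StoreFiles even when the particular hash column does not.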
