>1. my CF were already working with BF, they used ROWCOL (i didn't pay
>attention to that at the time i wrote my answers)
>2. I see from the logs that the BF is already 100% - is it bad? should I
>add more memory for BF?
Since Bloom filters are a probabilistic optimization, it's hard to analyze your efficiency directly. Mostly, we rely on theory and a little bit of experimentation. Basically, you want your key queries to have a high miss rate on HFiles. A miss doesn't mean that the key doesn't exist in the Store; it just means that you're not constantly rewriting it, so it doesn't exist in all N StoreFiles. Optimally, you want 1 of the blooms to hit (key exists in that file) and N-1 to miss.

Metrics that you can look at (not sure about the versions in which these were introduced):

keymaybeinbloomcnt : number of bloom hits
keynotinbloomcnt : number of bloom misses
staticbloomsizekb : size that bloom data takes up in memory (HFileV1)

Note that per-CF metrics were added in 0.94, so you can watch bloom efficiency at a finer granularity.

>3. HLog compression (HBASE-4608) is not scheduled yet, is it by intention?

There's limited bandwidth and this is an open source project, so... :)

>4. Compaction.ratio is only for 0.92.x releases, so i cannot use it yet.

"hbase.hstore.compaction.ratio" is in 0.90:
https://svn.apache.org/repos/asf/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

>6. I have also noticed that in a workload of pure insert (no read, empty
>regions, new keys) the store files on the RS can reach more than 4500
>files, nevertheless with an update/read scenario the store files were not
>passing 1500 files per region (the throttling of the flush was active and
>not in insert) Is there an explanation for that?

That depends on the size of your major-compacted data. Updates will dedupe and lower your compaction volume.
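To illustrate the "1 hit, N-1 misses" target above: the two bloom counters let you compute what fraction of per-HFile checks actually saved a disk seek. This is just arithmetic on the metric values, not an HBase API:

```python
# Sketch: bloom efficiency from the two counters mentioned above.
# A "miss" (keynotinbloomcnt) means an HFile read was skipped.

def bloom_miss_rate(key_maybe_in_bloom_cnt, key_not_in_bloom_cnt):
    """Fraction of bloom checks that let us skip an HFile entirely."""
    total = key_maybe_in_bloom_cnt + key_not_in_bloom_cnt
    if total == 0:
        return 0.0
    return key_not_in_bloom_cnt / total

# Example: 10 StoreFiles, each key present in exactly one of them,
# so every Get produces 1 bloom hit and 9 bloom misses.
gets, n_files = 1000, 10
hits = gets * 1
misses = gets * (n_files - 1)
print(bloom_miss_rate(hits, misses))  # 0.9, i.e. 90% of HFile reads avoided
```

In the optimal case described above, the miss rate approaches (N-1)/N; a much lower value suggests keys really are spread across many StoreFiles.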
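For context on what "hbase.hstore.compaction.ratio" controls: minor compaction selection roughly skips old, already-compacted files that are much larger than everything newer, and compacts the rest. Here is a simplified sketch of that rule; it is an approximation for illustration, not the actual Store.java logic, which also honors min/max file counts and size thresholds:

```python
def select_for_compaction(sizes, ratio=1.2):
    """sizes: StoreFile sizes, oldest first. Returns indices to compact.

    Simplified rule: skip leading files whose size exceeds `ratio`
    times the combined size of all newer files, then take the rest.
    """
    start = 0
    while start < len(sizes) and sizes[start] > ratio * sum(sizes[start + 1:]):
        start += 1
    return list(range(start, len(sizes)))

# A 100MB major-compacted file is left alone; the small recent
# flushes (10, 5, 5, 3 MB) are selected for a minor compaction.
print(select_for_compaction([100, 10, 5, 5, 3]))  # [1, 2, 3, 4]
```

Raising the ratio makes the big old files eligible again sooner (more rewriting, fewer files); lowering it keeps compactions small.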
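The dedupe effect behind the answer to question 6 can be seen with a toy model (pure illustration, not HBase code): pure inserts keep every KV across flushes, while updates to existing rows collapse to the newest version on major compaction, so the rewritten volume stays small.

```python
# Toy model: each flush writes a batch of (row, value) pairs to a new
# "store file". A major compaction keeps only the newest value per row
# (assuming a single retained version), which is where updates dedupe.

def major_compact(store_files):
    merged = {}
    for sf in store_files:   # iterate oldest file first
        merged.update(sf)    # newer values overwrite older ones
    return merged

# 10 flushes of 1000 KVs each, two workloads:
inserts = [{f"row-{f}-{i}": "v" for i in range(1000)} for f in range(10)]
updates = [{f"row-{i}": f"v{f}" for i in range(1000)} for f in range(10)]

print(len(major_compact(inserts)))  # 10000 rows survive: nothing dedupes
print(len(major_compact(updates)))  # 1000 rows survive: 90% deduped away
```

With the same write volume, the insert workload carries 10x the data through every compaction, which is consistent with store-file counts climbing much higher under pure inserts.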
