[ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977632#comment-14977632 ]
Enis Soztutar commented on HBASE-14468:
---------------------------------------

This is a good idea. We should add this to the list of compaction policies, with good documentation. We have use cases where there is a TTL of a couple of days; a metrics store is one such example, for raw data in a high-ingest scenario.

For the patch itself, the first {{if}} is not needed if we are checking for the DisabledRegionSplitPolicy anyway:
{code}
+    if (splitPolicyClassName.equals(IncreasingToUpperBoundRegionSplitPolicy.class.getName())) {
+      throw new RuntimeException("Default split policy for FIFO compaction"
+          + " is not supported, aborting.");
+    } else if (!splitPolicyClassName.equals(DisabledRegionSplitPolicy.class.getName())) {
+      warn.append(":region splits must be disabled:");
+    }
{code}

Can we make it so that if a split happens we still compact the reference files, but do not compact otherwise? We could also allow very slow splits in the case where the reference files will be cleaned out due to TTL; a region could then still split once every TTL interval.

Will the RuntimeExceptions thrown cause region opening to fail, or the RS to abort? Can we hook the verify code into {{HMaster.sanityCheckTableDescriptor()}}, so that you cannot create or alter a table with those settings? That would make for a much better user experience.

Can we also simplify the configuration? Maybe we could auto-disable major compactions, and set the blocking store files limit if it is not set.

Can we use HStore.removeUnneededFiles() or {{storeEngine.getStoreFileManager()}}, which already implement the "is expired" logic, so that there is no duplication there?
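The "is expired" logic being discussed reduces to comparing a store file's newest cell timestamp against the TTL: a file is collectible only when every cell in it has expired. A minimal standalone sketch of that selection rule (class and method names here are illustrative, not the actual HBase API):

```java
import java.util.ArrayList;
import java.util.List;

class FifoSelectionSketch {

    /** A file has fully expired when its newest cell is older than now - ttl. */
    static boolean hasExpired(long maxTimestampMs, long ttlMs, long nowMs) {
        return maxTimestampMs < nowMs - ttlMs;
    }

    /**
     * FIFO-style selection: pick only files in which all cells have expired;
     * live data is never rewritten, so no real compaction work is done.
     */
    static List<Long> selectExpired(List<Long> fileMaxTimestamps, long ttlMs, long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (long ts : fileMaxTimestamps) {
            if (hasExpired(ts, ttlMs, nowMs)) {
                expired.add(ts);
            }
        }
        return expired;
    }
}
```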
> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> The FIFO compaction policy selects only files in which all cells have expired. The column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store files.
> I see many applications for this policy:
> # Use it for very high-volume raw data with a low TTL which is the source of other data (after additional processing). Example: raw time series vs. time-based rollup aggregates and compacted time series. We collect raw time series and store them in a CF with the FIFO compaction policy; periodically we run a task which creates rollup aggregates and compacts the time series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD). Say we have a local SSD (1 TB) which we can use as a block cache. No need for compaction of raw data at all.
> Because we do not do any real compaction, we do not use CPU and I/O (disk and network), and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3.
To enable FIFO compaction policy
> For a table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}
> For a CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly setting DisabledRegionSplitPolicy, or by setting ConstantSizeRegionSplitPolicy with a very large max region size). You will also have to increase the store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very large value.
>
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSIONS > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
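Putting the quoted setup steps together, the full configuration currently requires several coordinated settings. A hedged sketch of what that might look like against an HBase 1.x-style client API (the TTL value is illustrative, and setting *hbase.hstore.blockingStoreFiles* as a per-CF configuration override is an assumption; this is a configuration fragment, not a complete program):

```java
// Sketch: configuring a table for FIFO compaction, combining the steps above:
// FIFO policy on the CF, region splits disabled, a non-default TTL, and a
// raised blocking-store-files threshold.
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());

HColumnDescriptor cf = new HColumnDescriptor(family);
cf.setTimeToLive(2 * 24 * 3600); // e.g. two days, in seconds; must not be FOREVER
cf.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
    FIFOCompactionPolicy.class.getName());
// Assumed per-CF override; raise well above the default so writes never block.
cf.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
desc.addFamily(cf);
```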