[
https://issues.apache.org/jira/browse/HBASE-21195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613024#comment-16613024
]
jagan commented on HBASE-21195:
-------------------------------
{noformat}
LogsDB is a layer on top of RocksDB, an ordered durable key-value data store
based on LSM trees. LogsDB is a time-ordered collection of RocksDB column
families, which are full-fledged RocksDB instances sharing a common write-ahead
log. Each RocksDB instance is called a LogsDB partition. All new writes for all
logs, be it one log or a million, go into the most recent partition, which
orders them by (log id, LSN), and saves on disk in a sequence of large sorted
immutable files, called SST files. This makes the write IO workload on the
drive mostly sequential, but creates the need to merge data from multiple files
(up to the maximum allowed number of files in a LogsDB partition, typically
about 10) when reading records. Reading from multiple files may lead to read
amplification, or wasting some read IO.

LogsDB controls read amplification in a way uniquely suited for the log data
model with its immutable records identified by immutable LSNs monotonically
increasing with time. Instead of controlling the number of sorted files by
compacting (merge-sorting) them into a bigger sorted run LogsDB simply leaves
the partition alone once it reaches its maximum number of SST files, and
creates a new most recent partition. Because partitions are read sequentially,
at no time the number of files to read concurrently will exceed the maximum
number of files in a single partition, even if the total number of SST files in
all partitions reaches tens of thousands. Space reclamation is performed
efficiently by deleting (or in some cases infrequently compacting) the oldest
partition.
{noformat}
Source
https://code.fb.com/core-data/logdevice-a-distributed-data-store-for-logs/
I was wondering why HBase could not offer this (no compaction, and a key range
instead of an index and Bloom filter per HFile) as an option for log-storage
use cases.
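
To make the idea concrete, here is a rough Java sketch of the scheme described in the quote. It is only an illustration under stated assumptions, not HBase or LogDevice code: the class and method names (PartitionedLogStore, addFlushedFile, candidateFiles, expire) are made up, keys are simplified to a single increasing long rather than (log id, LSN), and a "file" is represented only by its key range.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a compaction-free, partitioned log store. */
class PartitionedLogStore {

    /** An immutable sorted file, described here only by its key range. */
    static final class SortedFile {
        final long minKey;
        final long maxKey;

        SortedFile(long minKey, long maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        /** Range check used in place of a block index or Bloom filter. */
        boolean mayContain(long key) {
            return key >= minKey && key <= maxKey;
        }
    }

    /** A group of sorted files that is never compacted, only dropped whole. */
    static final class Partition {
        final List<SortedFile> files = new ArrayList<>();
        long lastWriteMillis;
    }

    private final int maxFilesPerPartition;
    private final long ttlMillis;
    // Oldest partition at the head, most recent at the tail.
    private final Deque<Partition> partitions = new ArrayDeque<>();

    PartitionedLogStore(int maxFilesPerPartition, long ttlMillis) {
        this.maxFilesPerPartition = maxFilesPerPartition;
        this.ttlMillis = ttlMillis;
    }

    /** Registers a flushed file; a full partition is left alone and a new one is started. */
    void addFlushedFile(long minKey, long maxKey, long nowMillis) {
        Partition latest = partitions.peekLast();
        if (latest == null || latest.files.size() >= maxFilesPerPartition) {
            latest = new Partition();
            partitions.addLast(latest);
        }
        latest.files.add(new SortedFile(minKey, maxKey));
        latest.lastWriteMillis = nowMillis;
    }

    /** Returns the files whose key range could contain the given key. */
    List<SortedFile> candidateFiles(long key) {
        List<SortedFile> result = new ArrayList<>();
        for (Partition p : partitions) {
            for (SortedFile f : p.files) {
                if (f.mayContain(key)) {
                    result.add(f);
                }
            }
        }
        return result;
    }

    /** Drops whole partitions whose last write is older than the TTL; returns the number dropped. */
    int expire(long nowMillis) {
        int dropped = 0;
        while (!partitions.isEmpty()
                && nowMillis - partitions.peekFirst().lastWriteMillis > ttlMillis) {
            partitions.removeFirst();
            dropped++;
        }
        return dropped;
    }

    int partitionCount() {
        return partitions.size();
    }
}
```

With incremental keys, the key ranges of successive flushes do not overlap, so the range check alone narrows a point read to at most one file in this simplified model. Space is reclaimed by dropping the oldest partition as a unit, so no merge-sort of existing files is needed.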
> Support Log storage similar to FB LogDevice
> -------------------------------------------
>
> Key: HBASE-21195
> URL: https://issues.apache.org/jira/browse/HBASE-21195
> Project: HBase
> Issue Type: New Feature
> Reporter: jagan
> Priority: Major
>
> Log storage, which is write-once, sequential data, can be optimized in the
> following ways:
> 1. Generated keys should be incremental.
> 2. The HFile key index can be a range and need not use a Bloom filter.
> 3. Instead of compaction, periodic deletion of old files based on TTL can be
> supported.