CHUKWA-789. Added HBase schema to data model document. (Eric Yang)

Project: http://git-wip-us.apache.org/repos/asf/chukwa/repo
Commit: http://git-wip-us.apache.org/repos/asf/chukwa/commit/6b70f9e5
Tree: http://git-wip-us.apache.org/repos/asf/chukwa/tree/6b70f9e5
Diff: http://git-wip-us.apache.org/repos/asf/chukwa/diff/6b70f9e5

Branch: refs/heads/master
Commit: 6b70f9e544074bedddb573f3ab4cf45ce0f0eea8
Parents: aa76d99
Author: Eric Yang <[email protected]>
Authored: Sat Nov 28 10:42:59 2015 -0800
Committer: Eric Yang <[email protected]>
Committed: Sat Nov 28 10:42:59 2015 -0800

----------------------------------------------------------------------
 CHANGES.txt                |  2 +
 src/site/apt/datamodel.apt | 90 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 91 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/chukwa/blob/6b70f9e5/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 6d37755..01e8f6d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -30,6 +30,8 @@ Trunk (unreleased changes)
 
   IMPROVEMENTS
 
+    CHUKWA-789. Added HBase schema to data model document. (Eric Yang)
+
     CHUKWA-786. Update documentation to reflect 0.7 release. (Eric Yang)
 
     CHUKWA-773. Update maven surefire version. (Anna Wang via Eric Yang)

http://git-wip-us.apache.org/repos/asf/chukwa/blob/6b70f9e5/src/site/apt/datamodel.apt
----------------------------------------------------------------------
diff --git a/src/site/apt/datamodel.apt b/src/site/apt/datamodel.apt
index 1aae1f5..7f44c78 100644
--- a/src/site/apt/datamodel.apt
+++ b/src/site/apt/datamodel.apt
@@ -51,4 +51,92 @@ little peculiar, but it's actually the same way that TCP sequence numbers work.
 correctly after a crash, and not send redundant data. When starting adaptors,
 it's usually save to specify 0 as an ID, but it's sometimes useful to specify
 something else. For instance, it lets you do things like only tail the second
-half of a file.
+half of a file.
+
+HBase Schema
+
+* Metrics
+
+  Chukwa table stores time series data.
+
+** Row Key
+
+*------*------*------------*------------*
+|      | Day  | Metric MD5 | Source MD5 |
+*------*------*------------*------------*
+| Size | 2    | 6          | 6          |
+*------*------*------------*------------*
+
+  Row key is composed of 14 bytes data. First 2 bytes are day of the year.
+The next 6 bytes are md5 signature of metrics name. The last 6 bytes are
+md5 signature of data source. This arrangement helps Chukwa to partition
+data evenly across regions base on time.
+
+  This arrangement provides a good condensed store for data of the same day
+for the same source.
+
+** Column Family
+
+  The column family format for Chukwa table are:
+
+*---------------*-----------------------------------------------------------------:
+| Column Family | Description                                                     |
+*---------------*-----------------------------------------------------------------:
+| t             | Time series data. Column name is timestamp. Value is a string   |
+*---------------*-----------------------------------------------------------------:
+| a             | Annotation, string tags associated with time series data.       |
+*---------------*-----------------------------------------------------------------:
+
+* Metadata
+
+  Metadata is designed to store point lookup data. For example, small amount of
+data to describe the metric name mapping for chukwa table. It is also used to store
+JSON blob of dashboard data.
+
+** Row Key
+
+*----------------*------------------------------------------------------------------:
+| Row Key        | Description                                                      |
+*----------------*------------------------------------------------------------------:
+| [Metrics Group]| Metrics Group Name, this allows to fetch all metrics name from   |
+|                | the group can be fetched from loading the row key.               |
+*----------------*------------------------------------------------------------------:
+| chart_meta     | All charts are stored in this row.                               |
+*----------------*------------------------------------------------------------------:
+| dashboard_meta | All dashboard are stored in this row.                            |
+*----------------*------------------------------------------------------------------:
+| widget_meta    | All widgets are stored in this row.                              |
+*----------------*------------------------------------------------------------------:
+
+** Special Row
+
+*----------------*------------------------------------------------------------------:
+| chart_meta     | Cell contains the rendering option and metric series name in     |
+|                | a JSON blob                                                      |
+*----------------*------------------------------------------------------------------:
+| dashboard_meta | Cell describes one dashboard view                                |
+*----------------*------------------------------------------------------------------:
+| widget_meta    | Cell describes title and URL of a dashboard widget               |
+*----------------*------------------------------------------------------------------:
+
+** Column Family
+
+*---------------*-------------------------------------------------------------------:
+| Column Family | Description                                                       |
+*---------------*-------------------------------------------------------------------:
+| k             | Key, associated with a fixed structure for describing key types   |
+|               | and md5 signature of the key used in chukwa table.                |
+*---------------*-------------------------------------------------------------------:
+| c             | column for storing JSON blob for special rows. This column is     |
+|               | used to store dashboard, chart, and widget metadata.              |
+*---------------*-------------------------------------------------------------------:
+
+  Key Types for k column Family, the current supported key types are:
+
+*----------*----------------------------------------------------:
+| Type     | Description                                        |
+*----------*----------------------------------------------------:
+| metric   | This key is a metric name.                         |
+*----------*----------------------------------------------------:
+| source   | This key is a source name.                         |
+*----------*----------------------------------------------------:
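
Note on the Metrics row key: the 14-byte layout described in the patch (2 bytes for day of the year, 6 bytes of MD5 for the metric name, 6 bytes of MD5 for the data source) can be illustrated with a minimal Python sketch. This is not Chukwa's implementation. The patch does not say how the day is encoded or which 6 bytes of each digest are kept, so the big-endian 16-bit day, the UTC calendar, and the "first 6 bytes" truncation are all assumptions, and the names `md5_prefix` and `build_row_key` as well as the sample metric and host are hypothetical.

```python
import hashlib
import struct
from datetime import datetime, timezone


def md5_prefix(text: str, size: int = 6) -> bytes:
    """Return `size` bytes of the MD5 digest of `text`.

    Assumption: the leading bytes of the digest are kept; the patch only
    says a 6-byte MD5 signature is stored.
    """
    return hashlib.md5(text.encode("utf-8")).digest()[:size]


def build_row_key(timestamp_ms: int, metric: str, source: str) -> bytes:
    """Build a 14-byte key: day of year (2) + metric MD5 (6) + source MD5 (6)."""
    moment = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
    day_of_year = moment.timetuple().tm_yday  # 1..366, evaluated in UTC (assumption)
    # Assumption: the day is packed as a big-endian unsigned 16-bit integer.
    return struct.pack(">H", day_of_year) + md5_prefix(metric) + md5_prefix(source)


# Hypothetical metric and source names, for illustration only.
key = build_row_key(1448736179000, "SystemMetrics.cpu.combined", "host1.example.com")
print(len(key), key.hex())
```

Because the day comes first in the key, all rows for a given day share a 2-byte prefix, and rows for the same metric and source on that day sort next to each other, which is the "condensed store" property the patch describes.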
