[
https://issues.apache.org/jira/browse/GORA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevin Ratnasekera updated GORA-413:
-----------------------------------
Fix Version/s: 1.0 (was: 0.9)
> Support creation of dynamic columns within Gora datastore mapping designs
> -------------------------------------------------------------------------
>
> Key: GORA-413
> URL: https://issues.apache.org/jira/browse/GORA-413
> Project: Apache Gora
> Issue Type: New Feature
> Components: gora-hbase
> Affects Versions: 0.6
> Reporter: Lewis John McGibbney
> Priority: Major
> Fix For: 1.0
>
>
> The conversation taking place on [dynamically generating HBase
> columns|http://www.mail-archive.com/dev%40gora.apache.org/msg05754.html] has
> shown that new functionality needs to be added in order to support
> this.
> The main driver for this issue is that Chukwa logs need to
> dynamically create many columns over time, with the number of columns
> depending directly on the number of data chunks we receive. Each data chunk
> has a [Sequence ID], and this Sequence ID should be the column name.
> The table design will look like this:
> {code}
> Row Key: [Invert Date]:[Data Type]:[Primary Key]
> Column Family: log
> Column Name: [Sequence ID]
> Timestamp: [log entry timestamp]
> Example:
> Row Key: 2132013102:TT:host1.example.com
> Column Family: log
> Column Name: 1230
> Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
> Timestamp: 1358942490
> {code}
> The inverted date allows the table to be partitioned by hour, day of the
> month, or month more easily.
> Using the column name for the consecutive sequence allows fast retrieval in
> a linear scan. This format is typically good for retrieving an hour's worth
> of logs quickly for a node. Hence, if we batch-scan the table in a rolling
> window via a MapReduce job at every hourly interval, we get an even spread
> of the workload across multiple MapReduce tasks.
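
The layout described in the issue can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not Gora or HBase code: `LogCell`, `make_cell`, and `LogTable` are hypothetical names, the inverted date is taken as an already-computed string (the issue does not say how it is derived), and the second log entry is made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple

FAMILY = "log"  # the single column family named in the issue


class LogCell(NamedTuple):
    """One cell: row key, column family, column name, value, timestamp."""
    row_key: str
    family: str
    column: str
    value: str
    timestamp: int


def make_cell(inverted_date, data_type, primary_key, sequence_id, entry, timestamp):
    """Build a cell using [Invert Date]:[Data Type]:[Primary Key] as the row key
    and the chunk's Sequence ID as the (dynamic) column name."""
    row_key = f"{inverted_date}:{data_type}:{primary_key}"
    return LogCell(row_key, FAMILY, str(sequence_id), entry, timestamp)


class LogTable:
    """Hypothetical stand-in for the table: row key -> {sequence id: cell}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, cell):
        # A new Sequence ID simply becomes a new column in the same row.
        self._rows[cell.row_key][int(cell.column)] = cell

    def scan(self, row_key, first_seq, last_seq):
        # Return the cells of one row in sequence order (inclusive range).
        columns = self._rows.get(row_key, {})
        return [columns[s] for s in sorted(columns) if first_seq <= s <= last_seq]


table = LogTable()
# Values taken from the example in the issue description.
table.put(make_cell("2132013102", "TT", "host1.example.com", 1230,
                    "2013-01-23 12:01:30 INFO This is a log entry.", 1358942490))
# Hypothetical second chunk, added only to show a second dynamic column.
table.put(make_cell("2132013102", "TT", "host1.example.com", 1231,
                    "2013-01-23 12:01:31 INFO Another log entry.", 1358942491))

for cell in table.scan("2132013102:TT:host1.example.com", 1230, 1231):
    print(cell.row_key, f"{cell.family}:{cell.column}", cell.timestamp)
```

The model compares sequence IDs as integers. A real store that orders column names as raw bytes would likely need a fixed-width encoding of the Sequence ID to keep consecutive chunks adjacent.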
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)