[
https://issues.apache.org/jira/browse/FLUME-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated FLUME-1275:
-----------------------------------
Attachment: FLUME-1275.patch.v2.txt
This commit addresses the issues discussed so far. I did not create a generic
interface for row key generators, though that is probably a good idea and may
warrant a separate JIRA. I'd like to keep the scope of this JIRA limited to
just the serializer in question. Below is the doc comment describing how
row-key generation works in this patch:
/**
 * Returns a row-key with the following format:
 * [time in millis]-[random key]-[nonce]
 */
protected byte[] getRowKey(Calendar cal) {
  /* NOTE: This key generation strategy has the following properties:
   *
   * 1) Within a single JVM, the same row key will never be duplicated.
   * 2) Between any two JVMs operating at different time periods (according
   *    to their respective clocks), the same row key will never be
   *    duplicated.
   * 3) Between any two JVMs operating concurrently (according to their
   *    respective clocks), the odds of duplicating a row-key are non-zero
   *    but infinitesimal. This would require a simultaneous collision in
   *    (a) the timestamp, (b) the respective nonce, and (c) the random
   *    string. The random string is necessary since (a) and (b) could
   *    collide if a fleet of Flume agents is restarted in tandem.
   *
   * Row-key uniqueness is important because conflicting row-keys will cause
   * data loss. */
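
For readers without the patch at hand, here is a minimal sketch of how a key in
the [time in millis]-[random key]-[nonce] format could be assembled. It is not
the code from the attachment. The class name RowKeySketch, the use of a UUID
prefix as the per-JVM random key, and the AtomicLong counter as the nonce are
all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Calendar;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

class RowKeySketch {
    // Assumption: a random string chosen once per JVM. It separates agents
    // that restart together and therefore share timestamps and nonce values.
    private static final String RANDOM_KEY =
        UUID.randomUUID().toString().substring(0, 8);

    // Assumption: a per-JVM counter, so two keys generated in the same
    // millisecond inside one JVM still differ.
    private static final AtomicLong NONCE = new AtomicLong(0);

    // Builds "[time in millis]-[random key]-[nonce]" and encodes it as UTF-8.
    static byte[] getRowKey(Calendar cal) {
        String key = cal.getTimeInMillis() + "-" + RANDOM_KEY + "-"
            + NONCE.getAndIncrement();
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        // Same Calendar instance, so only the nonce distinguishes the keys.
        System.out.println(new String(getRowKey(cal), StandardCharsets.UTF_8));
        System.out.println(new String(getRowKey(cal), StandardCharsets.UTF_8));
    }
}
```

In this sketch the counter covers property 1, the timestamp covers property 2,
and the random key is what keeps the collision odds in property 3 small.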
> Add Regex Serializer for HBaseSink
> ----------------------------------
>
> Key: FLUME-1275
> URL: https://issues.apache.org/jira/browse/FLUME-1275
> Project: Flume
> Issue Type: Improvement
> Reporter: Patrick Wendell
> Assignee: Patrick Wendell
> Attachments: FLUME-1275.patch.v1.txt, FLUME-1275.patch.v2.txt
>
>
> It would be nice to have an "out of the box" HBase serializer that can
> extract column data from a regular expression. This is a feature in Hive and
> it is widely used:
> https://issues.apache.org/jira/browse/HIVE-167