[
https://issues.apache.org/jira/browse/HBASE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030807#comment-13030807
]
Karthick Sankarachary commented on HBASE-3851:
----------------------------------------------
Basically, the goal here is to reduce the number of round trips between the
client and region servers. By way of example, let's say we have a table of user
profiles, where each profile includes the user's interests (a set of things they
like) and portfolio (a map of stock symbol to price paid). If we put their
interests (or portfolio) in a single column, then every time we want to
add/remove an interest (stock), we'll most likely need to read that column
prior to updating it. On the other hand, if we break down the interests
(portfolio) into multiple columns, one for each element in the set (map), then
that will allow us to add/remove elements without reading the entire collection
first.
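
To make the round-trip argument concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: RoundTripSketch and its methods are hypothetical, the row is just an in-memory map, and each simulated get or put is counted as one round trip. The "interests:" column naming is likewise only an assumption for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for a single row; counts simulated client round trips. */
class RoundTripSketch {
    final Map<String, String> columns = new TreeMap<>();
    int roundTrips = 0;

    // Each call below models one client <-> region server exchange.
    String get(String column) { roundTrips++; return columns.get(column); }
    void put(String column, String value) { roundTrips++; columns.put(column, value); }

    /** Whole set stored in one column: read it, modify it, write it back. */
    void addInterestSingleColumn(String interest) {
        String current = get("interests");
        Set<String> set = new TreeSet<>();
        if (current != null && !current.isEmpty()) {
            set.addAll(Arrays.asList(current.split(",")));
        }
        set.add(interest);
        put("interests", String.join(",", set));
    }

    /** One column per element: a blind write, no prior read needed. */
    void addInterestPerColumn(String interest) {
        put("interests:" + interest, interest);
    }

    public static void main(String[] args) {
        RoundTripSketch single = new RoundTripSketch();
        single.addInterestSingleColumn("hiking");
        single.addInterestSingleColumn("chess");
        System.out.println("single column: " + single.roundTrips + " round trips");

        RoundTripSketch perElement = new RoundTripSketch();
        perElement.addInterestPerColumn("hiking");
        perElement.addInterestPerColumn("chess");
        System.out.println("per-element columns: " + perElement.roundTrips + " round trips");
    }
}
```

In this toy model the single-column layout pays for a read before every write, while the per-element layout pays only for the write.
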
Having said that, I took a look at the object-mappings proposed for some of the
other NoSQL databases, and they all happen to live outside of the project
proper. In light of that, I'll do as you suggested and put this on GitHub. If
you'd like to revisit this down the road, please feel free to re-open this.
> A Random-Access Column Object Model
> -----------------------------------
>
> Key: HBASE-3851
> URL: https://issues.apache.org/jira/browse/HBASE-3851
> Project: HBase
> Issue Type: New Feature
> Components: client
> Affects Versions: 0.92.0
> Reporter: Karthick Sankarachary
> Assignee: Karthick Sankarachary
> Priority: Minor
> Labels: HBase, Mapping, Object
> Fix For: 0.92.0
>
> Attachments: HBASE-3851.patch
>
>
> By design, a value in HBase is an opaque and atomic byte array. In theory,
> any arbitrary type can potentially be represented in terms of such
> unstructured yet indivisible units. However, as the complexity of the type
> increases, so does the need to access it in parts rather than in whole. That
> way, one can update parts of a value without reading the whole first. This
> calls for transparency in the type of data being accessed.
> To that end, we introduce here a simple object model where each part maps to
> a {{HTable}} column and value thereof. Specifically, we define a
> {{ColumnObject}} interface that denotes an arbitrary type comprising
> properties, where each property is a {{<name, value>}} tuple of byte arrays.
> In essence, each property maps to a distinct HBase {{KeyValue}}. In
> particular, the property's name maps to a column, prefixed by the qualifier
> and the object's identifier (assumed to be unique within a column family),
> and the property's value maps to the {{KeyValue#getValue()}} of the
> corresponding column. Furthermore, the {{ColumnObject}} is marked as a
> {{RandomAccess}} type to underscore the fact that its properties can be
> accessed in and of themselves.
> For starters, we provide three concrete objects - a {{ColumnMap}},
> {{ColumnList}} and {{ColumnSet}} that implement the {{Map}}, {{List}} and
> {{Set}} interfaces respectively. The {{ColumnMap}} treats each {{Map.Entry}}
> as an object property, the {{ColumnList}} stores each element against its
> ordinal position, and the {{ColumnSet}} considers each element as the
> property name (as well as its value). For the sake of convenience, we also
> define extensions to the {{Get}}, {{Put}}, {{Delete}} and {{Result}} classes
> that are aware of and know how to deal with such {{ColumnObject}} types.
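
As a rough illustration of the three mappings described above, the following plain-Java sketch shows how a map, a list and a set might each be flattened into column/value pairs. This is not the code from HBASE-3851.patch: ColumnLayoutSketch, its methods and the ":" separator are all hypothetical, strings stand in for the byte arrays, and "prefix" stands in for whatever qualifier-plus-identifier prefix the real implementation uses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Flattens collections into (column name -> value) pairs, one per property. */
class ColumnLayoutSketch {
    /** Hypothetical column name: prefix, a separator, then the property name. */
    static String column(String prefix, String property) {
        return prefix + ":" + property;
    }

    /** Map: each entry is a property (key is the name, value is the value). */
    static Map<String, String> fromMap(String prefix, Map<String, String> map) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            cells.put(column(prefix, e.getKey()), e.getValue());
        }
        return cells;
    }

    /** List: each element is stored against its ordinal position. */
    static Map<String, String> fromList(String prefix, List<String> list) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < list.size(); i++) {
            cells.put(column(prefix, Integer.toString(i)), list.get(i));
        }
        return cells;
    }

    /** Set: each element serves as both the property name and its value. */
    static Map<String, String> fromSet(String prefix, Set<String> set) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (String element : set) {
            cells.put(column(prefix, element), element);
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> portfolio = new LinkedHashMap<>();
        portfolio.put("AAPL", "340.50");
        portfolio.put("GOOG", "530.10");
        System.out.println(fromMap("portfolio", portfolio));
        System.out.println(fromList("recent", Arrays.asList("login", "trade")));
        System.out.println(fromSet("interests", new TreeSet<>(Arrays.asList("chess", "hiking"))));
    }
}
```

Because every element lands in its own column, adding or removing one element touches only that column, which is the property the {{RandomAccess}} marker is meant to underscore.
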