[
https://issues.apache.org/jira/browse/HBASE-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567193#comment-13567193
]
Matt Corgan commented on HBASE-7716:
------------------------------------
I'm worried that HBase was designed with this feature and already has it,
namely the Row key itself. Perhaps we should rename Cell.getRowArray() to
Cell.getEntityGroupArray()?
I do agree with you that it feels like we're missing a field, but I would argue
that we're missing the ability to use the Family field to its potential. To
take a "small data" example, let's say you have these two tables in a
relational database:
{code}
User{
  Long id,
  String email
}
UserComment{
  Long userId,
  String commentId,
  String comment
}
{code}
You want to create a User entity group in HBase, so your User.id has to be your
Row key. This keeps each user's data on a single machine and enables efficient
use of Row key bloom filters (an important feature that I wonder how Row Groups
would support). Because HBase doesn't currently support a lot of column
families, you are forced to combine multiple tables into a single column
family, like column family "cf0" below:
{code}
User{
  row:id,
  family:cf0{
    qualifier:User.email,
    qualifier:UserComment.commentId,
    qualifier:UserComment.comment
  }
}
{code}
The unfortunate thing about the above is that we are forced to prepend the
relational table name to each qualifier name. I think what we are missing is
the idea of locality groups or column family aliases. I would much prefer to
set the relational table name as my column family name, and then have an
external configuration that maps family=User and family=UserComment to
physical column family "cf0". Then my HBase entry would look more like:
{code}
User{
  row:Long id,
  cf:User{
    qualifier:email
  }
  cf:UserComment{
    qualifier:commentId,
    qualifier:comment
  }
}
{code}
And there would be a separate locality configuration per table with entries
like:
{code}
localities{
  'User'=>cf0,
  'UserComment'=>cf0,
  default=>cfDefault
}
{code}
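To make the alias idea a bit more concrete, here is a minimal Java sketch of how such a locality lookup might behave. Nothing here is an existing HBase API: the class FamilyLocalities and its methods are hypothetical names made up for illustration, and the choice to store the logical family as a qualifier prefix inside the shared physical family is only an assumption about how the aliases could be kept distinct on disk.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not an HBase API): resolves logical family names
 * such as "User" or "UserComment" to a physical column family.
 */
class FamilyLocalities {
    private final Map<String, String> localities = new HashMap<>();
    private final String defaultFamily;

    FamilyLocalities(String defaultFamily) {
        this.defaultFamily = defaultFamily;
    }

    /** Registers one 'logical'=>physical entry and returns this for chaining. */
    FamilyLocalities map(String logicalFamily, String physicalFamily) {
        localities.put(logicalFamily, physicalFamily);
        return this;
    }

    /** Returns the physical family, falling back to the default entry. */
    String physicalFamily(String logicalFamily) {
        return localities.getOrDefault(logicalFamily, defaultFamily);
    }

    /**
     * Assumption: the logical family is stored as a qualifier prefix so that
     * aliases sharing one physical family do not collide.
     */
    String storedQualifier(String logicalFamily, String qualifier) {
        return logicalFamily + "." + qualifier;
    }
}
```

With the entries from the configuration above, "User" and "UserComment" would both resolve to cf0, any unlisted family would resolve to cfDefault, and the client would only ever see the short qualifier names.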
That is just a general example, though. I'm trying to understand: if the
current Row key were treated as the entity key, what would you want/need from
the rest of the Cell to help your use cases? Maybe an additional field between
family and qualifier called "group"?
> Row Groups / Row Family / Entity Groups in HBase
> ------------------------------------------------
>
> Key: HBASE-7716
> URL: https://issues.apache.org/jira/browse/HBASE-7716
> Project: HBase
> Issue Type: New Feature
> Components: Client, regionserver
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 0.98.0
>
> Attachments: Entity Groups in HBase.txt
>
>
> This issue is to discuss the possible addition to the HBase data model for
> "Row Groups".
> As we are nearing 1.0, discussing this for 0.98 seems the right time,
> especially given that we have custom region split policies, local
> transactions, and API overhaul around data types -> bytes.
> Row Groups are semantic groupings of rows in the Hbase data model. All rows
> within a given row group share the same row group key.
> Row groups are similar to column families in HBase or locality groups in
> BigTable, but transposed to rows instead of columns. All the rows within a
> row group physically belong together, and served by a single region. This
> means that region boundaries cannot split the row group.
> Row groups are not predefined, and are dynamic. There can be one row group
> per row.
> Row keys are fully optional, and backwards compatible.