[
https://issues.apache.org/jira/browse/PHOENIX-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980699#comment-14980699
]
James Taylor commented on PHOENIX-1734:
---------------------------------------
Love all that code you've removed (as will [~stack] I think :-) ). Is this the
top level idea?
- Each data table column family will have a corresponding local index column
family formed by prefixing the data table column family with a known prefix.
The local index column families will essentially be hidden from Phoenix.
- When a row is written to the data table, you write a corresponding row into
the hidden local index column family, prefixed with the region start key (i.e.
the rows have different row keys).
- Use a custom split policy to ensure that the local index column family does
not get split. Instead, you drive the split from the split of any data column
family.
-- The "magic" in the write of the local index rows is here:
{code}
+ private long applyFamilyMapToMemstore(Map<byte[], List<Cell>> familyMap, HRegion region)
+ throws IOException {
+ long size = 0;
+ MultiVersionConsistencyControl.WriteEntry wc = null;
+ try {
+ long mvccNum =
+ MultiVersionConsistencyControl
+ .getPreAssignedWriteNumber(region.getSequenceId());
+ wc = region.getMVCC().beginMemstoreInsertWithSeqNum(mvccNum);
+
+ for (Map.Entry<byte[], List<Cell>> e : familyMap.entrySet()) {
+ byte[] family = e.getKey();
+ List<Cell> cells = e.getValue();
+ Store store = region.getStore(family);
+ int listSize = cells.size();
+ for (int i = 0; i < listSize; i++) {
+ Cell cell = cells.get(i);
+ CellUtil.setSequenceId(cell, mvccNum);
+ Pair<Long, Cell> ret = store.add(cell);
+ size += ret.getFirst();
+ }
+ }
+ } finally {
+ region.getMVCC().completeMemstoreInsert(wc);
+ }
+ return size;
+ }
{code}
Does this make local index writes transactionally consistent with data table
writes? Ideally, we'd want Region APIs to cover this too. Ideas, [~apurtell]?
This looks great, [~rajeshbabu]. Thanks so much for following through on this.
FYI, [~tdsilva] - some incentive to get your txn merge into master soon. :-)
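To make the top-level idea concrete, here is a minimal sketch in plain Java with no HBase or Phoenix dependencies. Everything in it is an assumption for illustration: the class name, the method names, and the "L#" prefix are hypothetical and not taken from the patch. It only shows the three pieces as I understand them: deriving the hidden index column family name, prefixing the index row key with the region start key, and letting only data column families drive a split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; names and the "L#" prefix are hypothetical.
final class LocalIndexSketch {
    // Assumed "known prefix" that marks a hidden local index column family.
    static final String INDEX_CF_PREFIX = "L#";

    // Local index column family name for a given data column family.
    static String indexFamily(String dataFamily) {
        return INDEX_CF_PREFIX + dataFamily;
    }

    static boolean isIndexFamily(String family) {
        return family.startsWith(INDEX_CF_PREFIX);
    }

    // Index row key = region start key followed by the index key bytes,
    // so index rows sort inside the same region as the data rows.
    static byte[] indexRowKey(byte[] regionStartKey, byte[] indexKey) {
        byte[] key = new byte[regionStartKey.length + indexKey.length];
        System.arraycopy(regionStartKey, 0, key, 0, regionStartKey.length);
        System.arraycopy(indexKey, 0, key, regionStartKey.length, indexKey.length);
        return key;
    }

    // Split decision: ignore index families and pick the largest data family.
    static Optional<String> familyDrivingSplit(Map<String, Long> storeSizes) {
        return storeSizes.entrySet().stream()
                .filter(e -> !isIndexFamily(e.getKey()))
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        byte[] start = "row100".getBytes(StandardCharsets.UTF_8);
        byte[] idx = "smith".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexFamily("0"));
        System.out.println(new String(indexRowKey(start, idx), StandardCharsets.UTF_8));
    }
}
```

In this sketch an oversized index family never triggers a split on its own; the split point would come from a data family, and the index family would follow it.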
> Local index improvements
> ------------------------
>
> Key: PHOENIX-1734
> URL: https://issues.apache.org/jira/browse/PHOENIX-1734
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Attachments: PHOENI-1734-WIP.patch
>
>
> Local index design considerations:
> 1. Colocation: We need to co-locate local index regions with their data
> regions. The co-location can be a hard guarantee or a soft (best-effort)
> guarantee. Co-location is a performance requirement, and may also be
> needed for consistency (2). Hard co-location means that either both the data
> region and index region are opened atomically, or neither of them opens for
> serving.
> 2. Index consistency: Ideally we want the index region and data region to
> have atomic updates. This means that they should either (a) use transactions,
> or (b) share the same WALEdit and also MVCC for visibility. (b) is
> only applicable if there is a hard co-location guarantee.
> 3. Local index clients: How the local index will be accessed from clients.
> In case of the local index being managed in a table, the HBase client can be
> used for doing scans, etc. If the local index is hidden inside the data
> regions, there has to be a different mechanism to access the data through the
> data region.
> With the above considerations, we imagine three possible implementations for
> the local index solution, each detailed below.
> APPROACH 1: Current approach
> (1) Current approach uses balancer as a soft guarantee. Because of this, in
> some rare cases, colocation might not happen.
> (2) The index and data regions do not share the same WALEdits, meaning
> consistency cannot be achieved. Also, there are two WAL writes per write from
> the client.
> (3) The regular HBase client can be used to access index data since the index is just
> another table.
> APPROACH 2: Shadow regions + shared WAL & MVCC
> (1) Introduce a shadow region concept in HBase. Shadow regions are not
> assigned by AM. Phoenix implements atomic open (and split/merge) of
> data regions and index regions so that hard co-location is
> guaranteed.
> (2) For consistency requirements, the index regions and data regions will
> share the same WALEdit (and thus recovery) and they will also share the same
> MVCC mechanics so that index updates and data updates are visible atomically.
> (3) The regular HBase client can be used to access index data since the index is just
> another table.
> APPROACH 3: Storing index data in separate column families in the table.
> (1) Regions will have store files for cfs, which are sorted using the primary
> sort order. Regions may also maintain stores sorted in secondary sort
> orders. This approach is similar in vein to how an RDBMS keeps data (a B-tree in
> primary sort order and multiple B-trees in secondary sort orders with
> pointers to the primary key). That means storing the index data in separate column
> families in the data region. This way a region is extended to be more similar
> to an RDBMS (but LSM instead of B-tree). These are sometimes called shadow cf's
> as well. This approach guarantees hard co-location.
> (2) Since everything is in a single region, they automatically share the
> same WALEdit and MVCC numbers. Atomicity is easily achieved.
> (3) The current Phoenix implementation needs to change so that column
> family selection in the read/write path is based on whether the data table or
> the index table (a logical table in Phoenix) is being accessed.
> I think that APPROACH 3 is the best one for the long term, since it does not
> require changing anything in HBase; mainly, we don't need to muck around with
> the split/merge stuff in HBase. It will be win-win.
> However, APPROACH 2 still needs a “shadow regions” concept to be implemented
> in HBase itself, and also a way to share WALEdits and MVCCs from multiple
> regions.
> APPROACH 1 is a good start for local indexes, but I think we are not getting
> the full benefits of the feature. We can support this for the short term,
> and decide on the next steps for a longer term implementation.
> We won't be able to get to implementing it immediately, but want to start a
> brainstorm.