[ https://issues.apache.org/jira/browse/PHOENIX-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004532#comment-15004532 ]
James Taylor commented on PHOENIX-1734: --------------------------------------- Excellent work, [~rajeshbabu]. This is a huge step forward for making local indexes copacetic with HBase internals. For easier code review, would it be possible to spin up a pull request so we can see the changes in context (since it’s rather large)? I'm not a huge fan of RB as it's counter intuitive to use IMHO (but I suppose you could do both if the HBase folks like RB). Would be great if you could review too, [~jesse_yates]. Here’s some initial feedback: Would you mind adding some code comments here? What’s the reason to not write to the WAL here? Also, what’s the reason for the check on the table name? And what’s the reason we need the allowLocalUpdates? Is that because local index writes will always be local? {code} - if (tableReference.getTableName().startsWith(MetaDataUtil.LOCAL_INDEX_TABLE_PREFIX)) { - Region indexRegion = IndexUtil.getIndexRegion(env); - if (indexRegion != null) { - throwFailureIfDone(); - indexRegion.batchMutate(mutations.toArray(new Mutation[mutations.size()]), + if (env != null + && tableReference.getTableName().equals( + env.getRegion().getTableDesc().getNameAsString())) { + throwFailureIfDone(); + + if(allowLocalUpdates) { + for (Mutation m : mutations) { + m.setDurability(Durability.SKIP_WAL); + } + env.getRegion().batchMutate( + mutations.toArray(new Mutation[mutations.size()]), HConstants.NO_NONCE, HConstants.NO_NONCE); - return null; } + return null; {code} Given the above usage of batchMutate(), I’m assuming that means that local index updates are not transactional with the data updates. I suppose we can do this in a follow up JIRA, but wouldn’t it be possible to use the MultiRowMutationService.mutateRows() call that includes both index and data updates to get transactionality? We could have our Indexer.preBatchMutate() call e.bypass() which is a technique I’ve seen used by Tephra to skip the normal processing. Having the index updates be transactional is one of the most important advantages of local indexing IMHO, so it'd be a shame *not* to have this. Instead of having a bunch of if statements to prefix local index column families, let’s just create them like this in the beginning. The other reason to do this is that we don’t want to be creating new byte arrays per row if don’t have to. We already have a simple algorithm to generate column names when indexes are used. This would be similar for column family names of local indexes. See IndexStatementRewriter.visit(ColumnParseNode node) which is used to translate a query against the data table to a query against an index table (and that method translates column references and adds a CAST to ensure the type stays the same), MetaDataClient.createIndex() (which you hopefully won’t need to change), and in particular IndexUtil.getIndexColumnName(String cf, String cn) which takes the data column family and column name and returns the index column name. The code in IndexMaintainer is not getting the correct array from the cell. It’s already been fixed in the 4.x and master branches, so please be careful when re-basing. {code} - ImmutableBytesPtr family = new ImmutableBytesPtr(kv.getRowArray(),kv.getFamilyOffset(),kv.getFamilyLength()); - ImmutableBytesPtr qual = new ImmutableBytesPtr(kv.getRowArray(), kv.getQualifierOffset(), kv.getQualifierLength()); ImmutableBytesPtr value = new ImmutableBytesPtr(kv.getValueArray(), kv.getValueOffset(), kv.getValueLength()); - valueMap.put(new ColumnReference(kv.getRowArray(), kv.getFamilyOffset(), kv.getFamilyLength(), kv.getRowArray(), kv.getQualifierOffset(), kv.getQualifierLength()), value); + valueMap.put( + new ColumnReference(kv.getRowArray(), kv.getFamilyOffset(), kv.getFamilyLength(), + kv.getRowArray(), kv.getQualifierOffset(), kv.getQualifierLength()), value); {code} Some minor nits: Remove comment in ParallelWriterIndexCommitter as it’s no longer applicable: {code} // TODO: Once HBASE-11766 is fixed, reexamine whether this is necessary. // Also, checking the prefix of the table name to determine if this is a local // index is pretty hacky. If we're going to keep this, we should revisit that // as well. {code} Remove commented out code {code} - if (localIndexTable) { + /**if (localIndexTable) { ensureLocalIndexTableCreated(tableName, tableProps, families, splits, MetaDataUtil.getClientTimeStamp(m)); - } else if (!MetaDataUtil.isMultiTenant(m, kvBuilder, ptr)) { + } else*/ if (!localIndexTable && !MetaDataUtil.isMultiTenant(m, kvBuilder, ptr)) { ensureViewIndexTableCreated(tenantIdBytes.length == 0 ? null : PNameFactory.newName(tenantIdBytes), physicalTableName, MetaDataUtil.getClientTimeStamp(m)); {code} > Local index improvements > ------------------------ > > Key: PHOENIX-1734 > URL: https://issues.apache.org/jira/browse/PHOENIX-1734 > Project: Phoenix > Issue Type: Improvement > Reporter: Rajeshbabu Chintaguntla > Assignee: Rajeshbabu Chintaguntla > Attachments: PHOENI-1734-WIP.patch, PHOENIX-1734_v1.patch > > > Local index design considerations: > 1. Colocation: We need to co-locate regions of local index regions and data > regions. The co-location can be a hard guarantee or a soft (best approach) > guarantee. The co-location is a performance requirement, and also maybe > needed for consistency(2). Hard co-location means that either both the data > region and index region are opened atomically, or neither of them open for > serving. > 2. Index consistency : Ideally we want the index region and data region to > have atomic updates. This means that they should either (a)use transactions, > or they should (b)share the same WALEdit and also MVCC for visibility. (b) is > only applicable if there is hard colocation guarantee. > 3. Local index clients : How the local index will be accessed from clients. > In case of the local index being managed in a table, the HBase client can be > used for doing scans, etc. If the local index is hidden inside the data > regions, there has to be a different mechanism to access the data through the > data region. > With the above considerations, we imagine three possible implementation for > the local index solution, each detailed below. > APPROACH 1: Current approach > (1) Current approach uses balancer as a soft guarantee. Because of this, in > some rare cases, colocation might not happen. > (2) The index and data regions do not share the same WALEdits. Meaning > consistency cannot be achieved. Also there are two WAL writes per write from > client. > (3) Regular Hbase client can be used to access index data since index is just > another table. > APPROACH 2: Shadow regions + shared WAL & MVCC > (1) Introduce a shadow regions concept in HBase. Shadow regions are not > assigned by AM. Phoenix implements atomic open (and split/merge) of region > opening for data regions and index regions so that hard co-location is > guaranteed. > (2) For consistency requirements, the index regions and data regions will > share the same WALEdit (and thus recovery) and they will also share the same > MVCC mechanics so that index update and data update is visible atomically. > (3) Regular Hbase client can be used to access index data since index is just > another table. > APPROACH 3: Storing index data in separate column families in the table. > (1) Regions will have store files for cfs, which is sorted using the primary > sort order. Regions may also maintain stores, sorted in secondary sort > orders. This approach is similar in vein how a RDBMS keeps data (a B-TREE in > primary sort order and multiple B-TREEs in secondary sort orders with > pointers to primary key). That means store the index data in separate column > families in the data region. This way a region is extended to be more similar > to a RDBMS (but LSM instead of BTree). This is sometimes called shadow cf’s > as well. This approach guarantees hard co-location. > (2) Since everything is in a single region, they automatically share the > same WALEdit and MVCC numbers. Atomicity is easily achieved. > (3) Current Phoenix implementation need to change in such a way that column > families selection in read/write path is based data table/index table(logical > table in phoenix). > I think that APPROACH 3 is the best one for long term, since it does not > require to change anything in HBase, mainly we don't need to muck around with > the split/merge stuff in HBase. It will be win-win. > However, APPROACH 2 still needs a “shadow regions” concept to be implemented > in HBase itself, and also a way to share WALEdits and MVCCs from multiple > regions. > APPROACH 1 is a good start for local indexes, but I think we are not getting > the full benefits for the feature. We can support this for the short term, > and decide on the next steps for a longer term implementation. > we won't be able to get to implementing it immediately, and want to start a > brainstorm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)