Hi,
I've been using the IndexedTable stuff from contrib and have come across a bit of an issue.
When I delete a column, the index entries for that row are removed. I've worked through the code in IndexedRegion and used very similar code in my own classes to recreate the index after running the Delete.
I've also noticed that if I run a Put after the Delete, the index is re-created.
Neither the Delete nor the subsequent Put in the second example uses any of the columns that are part of the index (either indexed or additional columns).
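To make the sequence concrete, here is a toy in-memory model of it. It is not the contrib IndexedRegion code: columns are plain "family:qualifier" strings, the index is a map from indexed value to row key, and I'm assuming that put() re-derives the index entry from the row's current values, which would explain what I'm seeing. The delete side follows the quoted delete() below (default-timestamp case), where the old entry is removed regardless of which column the Delete names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model, NOT the contrib IndexedRegion implementation.
class DeleteThenPutRepro {
    // Hypothetical index: the row is indexed on this column only.
    static final String INDEXED_COLUMN = "info:name";

    final String rowKey;
    final Map<String, String> row = new TreeMap<>();   // column -> value for one row
    final Map<String, String> index = new HashMap<>(); // indexed value -> row key

    DeleteThenPutRepro(String rowKey) {
        this.rowKey = rowKey;
    }

    // Assumption: a Put re-derives the index entry from the row's current
    // values, whichever column it writes.
    void put(String column, String value) {
        String old = row.get(INDEXED_COLUMN);
        if (old != null) {
            index.remove(old); // drop the previous entry before re-indexing
        }
        row.put(column, value);
        String indexed = row.get(INDEXED_COLUMN);
        if (indexed != null) {
            index.put(indexed, rowKey);
        }
    }

    // Mirrors the quoted delete() with the default timestamp: the old index
    // entry is removed first, whichever column the Delete names, and nothing
    // re-adds it.
    void deleteColumn(String column) {
        String indexed = row.get(INDEXED_COLUMN);
        if (indexed != null) {
            index.remove(indexed);
        }
        row.remove(column);
    }

    public static void main(String[] args) {
        DeleteThenPutRepro r = new DeleteThenPutRepro("row1");
        r.put("info:name", "andrew");
        r.put("info:other", "x");
        System.out.println(r.index);  // {andrew=row1}
        r.deleteColumn("info:other"); // unrelated column
        System.out.println(r.index);  // {} although info:name is still set
        r.put("info:other", "y");     // unrelated Put
        System.out.println(r.index);  // {andrew=row1}
    }
}
```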
If I'm not mistaken, the problem lies in the code that rebuilds the index in org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegion:
  @Override
  public void delete(Delete delete, final Integer lockid, boolean writeToWAL)
      throws IOException {
    if (!getIndexes().isEmpty()) {
      // Need all columns
      NavigableSet<byte[]> neededColumns = getColumnsForIndexes(getIndexes());

      Get get = new Get(delete.getRow());
      for (byte[] col : neededColumns) {
        get.addColumn(col);
      }

      Result oldRow = super.get(get, null);
      SortedMap<byte[], byte[]> oldColumnValues = convertToValueMap(oldRow);

      for (IndexSpecification indexSpec : getIndexes()) {
        removeOldIndexEntry(indexSpec, delete.getRow(), oldColumnValues);
      }

      // Handle if there is still a version visible.
      if (delete.getTimeStamp() != HConstants.LATEST_TIMESTAMP) {
        get.setTimeRange(1, delete.getTimeStamp());
        oldRow = super.get(get, null);
        SortedMap<byte[], byte[]> currentColumnValues = convertToValueMap(oldRow);
        LOG.debug("There are " + currentColumnValues + " entries to re-index");
        for (IndexSpecification indexSpec : getIndexes()) {
          if (IndexMaintenanceUtils.doesApplyToIndex(indexSpec, currentColumnValues)) {
            updateIndex(indexSpec, delete.getRow(), currentColumnValues);
          }
        }
      }
    }
    super.delete(delete, lockid, writeToWAL);
  }
I'm not sure if I've got this right, but it seems that any Delete will remove the index entries, and they will only be rebuilt if the Delete carries an explicit timestamp (i.e. it targets a previous version of the row). Even then, the index is built using data from the version prior to the one you've just deleted, which seems to mean it would, more often than not, be out of date.
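A similar toy model of the timestamp branch shows both outcomes. Again this is not HBase code: LATEST_TIMESTAMP is a stand-in for HConstants.LATEST_TIMESTAMP (Long.MAX_VALUE), a single indexed column keeps several timestamped versions, and the Delete is assumed to touch only an unrelated column, as in my case. With the default timestamp the entry is simply gone; with an explicit timestamp it is rebuilt from an older version than the newest one still in the row.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the timestamp branch in the quoted delete(); not HBase code.
class TimestampBranchModel {
    // Stand-in for HConstants.LATEST_TIMESTAMP.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // timestamp -> value of the indexed column for one row
    final NavigableMap<Long, String> indexedVersions = new TreeMap<>();
    // Value the row's index entry currently points at (null = no entry).
    String indexEntry;

    void putIndexed(long ts, String value) {
        indexedVersions.put(ts, value);
        indexEntry = indexedVersions.lastEntry().getValue(); // index the newest version
    }

    // A Delete of some unrelated column; the indexed versions are untouched.
    void deleteUnrelatedColumn(long deleteTs) {
        indexEntry = null; // removeOldIndexEntry(...) always runs
        if (deleteTs != LATEST_TIMESTAMP) {
            // Mirrors get.setTimeRange(1, deleteTs): newest version in [1, deleteTs).
            Map.Entry<Long, String> older = deleteTs > 1
                    ? indexedVersions.subMap(1L, true, deleteTs, false).lastEntry()
                    : null;
            if (older != null) {
                indexEntry = older.getValue(); // updateIndex(...) with older data
            }
        }
    }

    String newestIndexedValue() {
        return indexedVersions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TimestampBranchModel m = new TimestampBranchModel();
        m.putIndexed(10, "v10");
        m.putIndexed(20, "v20");
        m.putIndexed(30, "v30");

        m.deleteUnrelatedColumn(LATEST_TIMESTAMP);
        System.out.println(m.indexEntry + " vs " + m.newestIndexedValue()); // null vs v30

        m.putIndexed(30, "v30"); // restore the entry
        m.deleteUnrelatedColumn(20);
        System.out.println(m.indexEntry + " vs " + m.newestIndexedValue()); // v10 vs v30
    }
}
```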
More broadly, it also occurs to me that it may make sense not to delete the index entries at all unless the Delete would otherwise affect them. In my case there isn't really any reason to remove them: the column I'm deleting is completely unrelated to the index.
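Something along these lines is what I had in mind. It's a hypothetical check (the name deleteAffectsIndexes is mine, and the Delete's contents are modelled as "family:qualifier" strings rather than going through the real Delete API) that delete() could consult before doing any index maintenance.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard, not part of contrib: does a Delete touch any index column?
class DeleteIndexGuard {
    /**
     * @param deletedColumns columns named by the Delete as "family:qualifier",
     *                       or "family:" for a whole family; empty = whole row
     * @param indexColumns   indexed plus additional columns across all index specs
     */
    static boolean deleteAffectsIndexes(Set<String> deletedColumns, Set<String> indexColumns) {
        if (deletedColumns.isEmpty()) {
            return true; // whole-row delete: the index entries must go
        }
        for (String deleted : deletedColumns) {
            for (String indexed : indexColumns) {
                // A family delete ("info:") covers every column in that family.
                boolean familyDelete = deleted.endsWith(":") && indexed.startsWith(deleted);
                if (familyDelete || deleted.equals(indexed)) {
                    return true;
                }
            }
        }
        return false; // unrelated columns only: leave the index alone
    }

    public static void main(String[] args) {
        Set<String> indexCols = new HashSet<>(Arrays.asList("info:name", "info:email"));
        System.out.println(deleteAffectsIndexes(Collections.singleton("other:blob"), indexCols)); // false
        System.out.println(deleteAffectsIndexes(Collections.singleton("info:name"), indexCols));  // true
        System.out.println(deleteAffectsIndexes(Collections.singleton("info:"), indexCols));      // true
        System.out.println(deleteAffectsIndexes(Collections.<String>emptySet(), indexCols));      // true
    }
}
```

In IndexedRegion.delete() the existing index code would then only run when a check like this returns true; a whole-row Delete (an empty set here) would still go through the current path.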
Cheers,
Andrew