Hi all,
I've been reading the secondary index code recently and I find it very
hard to understand. Here are some questions:
1. There are 5 classes defined in the package
'org.apache.phoenix.hbase.index.covered.example', but it seems that these
classes are only referenced in tests.
If that's true, why not put them into the IT/test directory?
If not, what are they used for?
2. The IndexMemStore class.
I have read the comment at the top of this class many times, but I still
don't get the point. What is the 'out-of-order' scenario?
I also read the comment on CoveredColumnIndexer, which seems to give
an 'example' of this scenario. The comment says:
Taking the simple case, assume we do a single column in a group. Then if we
get an out of order update, we need to check the current state of that column
in the current row. If the current row is older, we can issue a delete as
normal. If the current row is newer, however, we then have to issue a delete
for the index update at the time of the current row. This ensures that the
index update made for the 'future' time still covers the existing row.
So, if I delete an existing row of the data table with ts = 10, while the
existing row has a ts of 20, which is 'newer' than the current operation, is
the current Delete operation what we call 'back-in-time' or 'out-of-order'?
What confuses me is the solution for this scenario: just issue the delete with
the ts of the existing row, which means issuing a Delete with ts = 20? Am I
right?
In my opinion, if a Delete is back in time, we can just ignore it or simply
issue an index Delete with the same ts. Why are we using such a complex way
to generate the index update?
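To make the question concrete, here is a toy sketch of how I currently read
the CoveredColumnIndexer comment, next to the simpler alternative I had in
mind. This is not Phoenix code: the class and method names are made up for
illustration, and treating equal timestamps as "not newer" is my own
assumption, since the comment does not say.

```java
// Toy sketch only: NOT Phoenix code. All names here are hypothetical.
final class OutOfOrderSketch {

    /**
     * My reading of the CoveredColumnIndexer comment: pick the timestamp
     * for the index Delete given the incoming update and the current row.
     */
    static long deleteTsPerComment(long incomingTs, long currentRowTs) {
        if (currentRowTs <= incomingTs) {
            // Current row is older (or equal, by my assumption):
            // "issue a delete as normal", i.e. at the incoming timestamp.
            return incomingTs;
        }
        // Current row is newer, so the incoming update is out of order:
        // issue the delete "at the time of the current row".
        return currentRowTs;
    }

    /** The simpler alternative I proposed: always reuse the incoming ts. */
    static long deleteTsSameAsIncoming(long incomingTs, long currentRowTs) {
        return incomingTs;
    }

    public static void main(String[] args) {
        long incomingTs = 10L;   // the back-in-time Delete
        long currentRowTs = 20L; // the existing, newer row

        System.out.println("per comment: "
                + deleteTsPerComment(incomingTs, currentRowTs));     // 20
        System.out.println("same ts:     "
                + deleteTsSameAsIncoming(incomingTs, currentRowTs)); // 10
    }
}
```

With my numbers the two rules disagree (20 versus 10), and that difference
is exactly what I do not understand.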
I also cannot see the purpose of the 'roll back' operation in
NonTxIndexBuilder or of IndexUpdateManager#fixUpCurrentUpdates(). I think I
must have missed something very important, perhaps some core concept or
design. Could someone give me an easier way to understand this code?
Thanks.
William