On 6/10/12 2:48 AM, Selcuk AYA wrote:
How
will this be different from what we are trying to implement now? We
still need a WAL log keeping track of txns on top of B+ trees, changes
could be kept track of in terms of pages or entries and indices. Old
version of data has to be copied over to some other location before
newer version can overwrite it or newer version has to be kept at
location X as long as readers need the old data. Any MVCC system has
to do something like this.
No, we don't need all this mechanism if we block all the modifications while
a modification is being processed. I agree that modifications will be
slower, but this is a price I want to pay if, at the same time, I can
guarantee consistent *and* concurrent reads.
If you have a single modification that touches a couple of entries and
indices, how will reads proceed concurrently if the ongoing
modification does not pay attention to not overwriting the versions
the reads are using ?
Because when the read starts, it uses the latest existing revision of
the index used to fetch the entries. We should get the current revision
when the read starts, for each index it will use. Currently, as we
use reverse indexes from potentially many indices, that implies we
introduce a protected section in the read that fetches the valid
versions for all the indices it uses. We can have a data structure that
contains those versions and is updated atomically by a modification,
so that the searches don't have to take care of it. When a modification
starts, it copies this data structure, does its update, and at the end, if
everything went fine, updates the data structure with the new revisions
for all the tables.
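
To make the idea concrete, here is a minimal sketch in Java of such a data
structure. It is only an illustration under my own assumptions: the class
name RevisionRegistry, its methods, and the index names "cn" and
"objectClass" are hypothetical and do not exist in ApacheDS or JDBM. A read
grabs the whole set of revisions in one step, and modifications are
serialized: each one works on a private copy and publishes it only if
everything went fine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical holder for the current revision of every index. */
final class RevisionRegistry {
    // Immutable map (index name -> revision), replaced atomically on commit.
    private final AtomicReference<Map<String, Long>> current;
    // Serializes modifications: only one runs at a time.
    private final ReentrantLock writeLock = new ReentrantLock();

    RevisionRegistry(Map<String, Long> initial) {
        current = new AtomicReference<>(Map.copyOf(initial));
    }

    /** Called once when a read starts: one consistent set of revisions. */
    Map<String, Long> snapshot() {
        return current.get();
    }

    /** Copies the revisions, applies the update, publishes on success only. */
    void modify(Consumer<Map<String, Long>> update) {
        writeLock.lock();
        try {
            Map<String, Long> working = new HashMap<>(current.get());
            update.accept(working);            // an exception here publishes nothing
            current.set(Map.copyOf(working));  // readers see all changes or none
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        RevisionRegistry registry =
            new RevisionRegistry(Map.of("cn", 1L, "objectClass", 1L));

        // A search starts and pins the revisions it will use.
        Map<String, Long> readerView = registry.snapshot();

        // A modification touches two indices and bumps both revisions.
        registry.modify(revs -> {
            revs.merge("cn", 1L, Long::sum);
            revs.merge("objectClass", 1L, Long::sum);
        });

        System.out.println("ongoing read uses cn revision " + readerView.get("cn"));
        System.out.println("new reads use cn revision " + registry.snapshot().get("cn"));
    }
}
```

The sketch only tracks revision numbers. The index itself would still have
to keep the pages of an old revision available for as long as a read has
that revision pinned.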
Also think of adding another partition tomorrow. Say HBASE partition
is added which exposes atomic writes and atomic reads or
consistent scans. If we plug that partition with what we are
implementing right now, txns over HBASE partitions would just work
without much effort.
Yes. What you have written is also a way to keep the partition dumb. What I'm
suggesting forces you to have MVCC-capable partitions, which is a real
hassle. Now, let's face it: do we need anything else, atm? Plus HBase
already implements a similar system to protect reads against concurrent
modifications, so we don't necessarily need to have it.
Also keep in mind that if we want to implement the solution I proposed, we
still need to modify the code to protect the partitions against concurrent
modifications, and to leverage the MVCC parts in JDBM (and probably write
the versions on disk too).
No. HBASE is not transactional. You still need transactions to make
queries consistent.
HBase now supports multi-row transactions:
http://hadoop-hbase.blogspot.fr/2012/03/acid-in-hbase.html. I guess this
is what we need.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com