Re: Txn discussion

Emmanuel Lécharny Sun, 10 Jun 2012 02:00:19 -0700

On 6/10/12 2:48 AM, Selcuk AYA wrote:

How will this be different from what we are trying to implement now? We
still need a WAL log keeping track of txns on top of B+ trees, changes
could be kept track of in terms of pages or entries and indices. Old
version of data has to be copied over to some other location before
newer version can overwrite it or newer version has to be kept at
location X as long as readers need the old data. Any MVCC system has
to do something like this.

No, we don't need all this machinery if we block all other modifications
while a modification is being processed. I agree that modifications will be
slower, but this is a price I am willing to pay if, at the same time, I can
guarantee consistent *and* concurrent reads.

you have a single modification that touches a couple of entries and
indices, how will reads proceed concurrently if the ongoing
modification does not pay attention to not overwriting the versions
the reads are using ?

Because when the read starts, it uses the latest existing revision of
the index used to fetch the entries. We should get the current revision,
when the read starts, for each of the indices it will use. Currently, as
we use reverse indexes from potentially many indices, that implies we
introduce a protected section in the read that fetches all the valid
versions for all the indices used. We can have a data structure that
contains those versions and is updated atomically by a modification, so
that the searches don't have to take care of it. When a modification
starts, it copies this data structure, does its updates, and at the end,
if everything went fine, updates the data structure with the new
revisions for all the tables.
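
To make the idea concrete, here is a rough sketch of such a data
structure, not actual ApacheDS or JDBM code. The class name RevisionTable,
its methods, and the choice of a map from index name to revision number
are all hypothetical. It only shows the copy / update / atomic publish
cycle, with modifications serialized by a lock:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical holder for the current revision of every index. */
final class RevisionTable {
    // Immutable snapshot: index name -> latest committed revision.
    private final AtomicReference<Map<String, Long>> current;

    // Only one modification may be in progress at a time.
    private final ReentrantLock writeLock = new ReentrantLock();

    RevisionTable(Map<String, Long> initial) {
        current = new AtomicReference<>(
            Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    /** Called once when a read starts; never blocks on a writer. */
    Map<String, Long> snapshot() {
        return current.get();
    }

    /** Runs one modification; publishes all new revisions at once, or none. */
    void modify(Consumer<Map<String, Long>> update) {
        writeLock.lock();
        try {
            // Work on a private copy so readers never see partial updates.
            Map<String, Long> working = new HashMap<>(current.get());
            update.accept(working); // if this throws, nothing is published
            current.set(Collections.unmodifiableMap(new HashMap<>(working)));
        } finally {
            writeLock.unlock();
        }
    }
}
```

A search would call snapshot() once at its start and use the revisions it
finds there for every index it touches, so a modification that commits in
the meantime does not affect it. The sketch says nothing about how long the
old revisions must be kept on disk for the readers still using them.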


Also think of adding another partition tomorrow. Say HBASE partition
is added which exposes atomic writes and atomic reads or scan
consistent scans. If we plug that partition with what we are
implementing right now, txns over HBASE partitions would just work
without much effort.

Yes. What you have written is also a way to keep the partitions dumb. What
I'm suggesting forces you to have MVCC-capable partitions, which is a real
hassle. Now, let's face it: do we need anything else, atm? Plus, HBase
already implements a similar system to protect reads against concurrent
modifications, so we don't necessarily need to have it.
Also keep in mind that if we want to implement the solution I proposed, we
still need to modify the code to protect the partitions against concurrent
modifications, and to leverage the MVCC parts in JDBM (and probably write
the versions on disk too).

no. HBASE is not transactional. You still need transactions to make
queries consistent.

HBase now supports multi-row transactions:
http://hadoop-hbase.blogspot.fr/2012/03/acid-in-hbase.html. I guess this
is what we need.



--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
