Re: Why are TXN IDs not partitioned per database?

Granville Barnett Tue, 20 Nov 2018 12:18:41 -0800

Thanks Alan.

On Tue, 20 Nov 2018, 17:23 Alan Gates <[email protected] wrote:


> History.  Originally there were only transaction ids, which were global.
> Write ids for tables came later as a way to limit the amount of information
> each transaction needed to track and to make it easier to replicate table
> changes between Hive instances.
>
> But even if we had put them in from the start, we'd have them span
> databases, otherwise transactions couldn't span databases.  Hive has no
> restrictions on queries spanning databases so we wouldn't want to restrict
> transactions from doing so.
>
> Alan.
>
> On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
> [email protected]> wrote:
>
> > Hi,
> >
> > Reading the source code of Hive 3.x and I have a question regarding
> > transaction IDs which form the span of a transaction: it's begin (TXN ID)
> > and commit ID (NEXT_TXN_ID at time of commit).
> >
> > Why is it that we have a global timeline for transactions rather than a
> > timeline partitioned at the granularity of a database, kind of similar to
> > how write IDs are partitioned per table but at the database scope?
> >
> > E.g.,
> >
> > NEXT_TXN_ID
> > +-------+-------------------+
> > | DB    | NTXN_NEXT  |
> > +-------+-------------------+
> > | test1 | 23                   |
> > | test2 | 4                     |
> > +-------+-------------------+
> >
> > Same question could also be applied to NEXT_LOCK_ID.
> >
> > I am just curious because it seems like partitioning the transaction (and
> > lock IDs) would reduce the granularity of locking in the various
> > transactional methods. For example, openTxn invocations are mutexed with
> > all other openTxn invocations even if they are for transactions running
> in
> > distinct database domains.  Similarly for openTxn mutexing with respect
> to
> > commitTxn if there is a write-write conflict, which I would have thought
> > would only be the case if they are applicable to the same database. I'm
> > sure that this would have the side effect of increasing the complexity of
> > other subsystems but I had to ask what the rationale was behind this.
> >
> > (I'm new to Hive to please forgive me if the answer is obvious.)
> >
> > Regards,
> >
> > Granville
> >
>

Re: Why are TXN IDs not partitioned per database?

Reply via email to