On Mon, 12 Nov 2012, Jeff King wrote:
> On Mon, Nov 12, 2012 at 11:27:14PM +0100, Jean-Jacques Lafay wrote:
> > 2012/11/11 Jeff King <p...@peff.net>:
> > > On Sun, Nov 11, 2012 at 05:46:32PM +0100, René Scharfe wrote:
> > >
> > > Ultimately, I have some ideas for doing this in a breadth-first way,
> > > which would make it more naturally iterative. It would involve
> > > having N bits of storage per commit to check N tags, but it would
> > > mean that we could get accurate answers in the face of clock skew
> > > (like the merge-base calculation, it would merely get slower in the
> > > face of skew).
> > I guess the optimal algorithm may also depend on the commit graph
> > general shape, but intuitively, I'd say that the critical factor is
> > the number and distribution of tags. As soon as you have a significant
> > number of tags (let's say 1% of the commits are tagged, evenly
> > distributed), you'll quickly end up with every commit marked as
> > containing or not the target commit, so that each additional tag check
> > is cheap.
> > This suggests a complexity of O(number of commits) more often then
> > not, however you choose to traverse the graph.
> We can do much better than O(number of commits), though, if we stop
> traversing down a path when its timestamp shows that it is too old to
> contain the commits we are searching for. The problem is that the
> timestamps cannot always be trusted, because they are generated on
> machines with wrong clocks, or by buggy software. This could be solved
> by calculating and caching a "generation" number, but last time it was
> discussed there was a lot of arguing and nothing got done.
Sadly, not only machines with skewed clocks, but in particular buggy
3rd-party SCMs make this more than just problematic. In a git-svn clone
that was used as base for heavy Git development, I encountered quite a lot
of Jan 1, 1970 commits.
It just cannot be helped, we must distrust timestamps completely.