On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:
> On Thu, 8 May 2014, Sebastian Schuberth wrote:
> > On 03.05.2014 05:40, Felipe Contreras wrote:
> >>>> That's very interesting. Do you get similar improvements when doing
> >>>> something similar in Mercurial (watchman vs. no watchman)?
> >>> I have not tried it. My understanding is that this is why Facebook
> >>> wrote Watchman and added support for it to Mercurial, so I would assume
> >>> that the improvements are at least this good.
> >> Yeah, my bet is that they are actually much better (because Mercurial
> >> can't be as optimized as Git).
> >> I'm interested in this number because if watchman in Git is improving it
> >> by 30%, but in Mercurial it's improving it by 100% (made up number),
> >> then it makes sense that you might want it more if you are using hg,
> >> but not so much if you are using git.
> >> Also, if similar repositories with Mercurial+watchman are actually
> >> faster than Git+watchman, that means that there's room for improvement
> >> in your implementation. This is not a big issue at this point of the
> >> process, just something nice to know.
> > The article linked below has some details; they claim "For our repository,
> > enabling Watchman integration has made Mercurial's status command more than
> > 5x faster than Git's status command".
> > 
> > https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
> a lot of that speed comparison is going to depend on your storage system and
> size of your repository.
> if you have a high-end enterprise storage system that tracks metadata very
> differently from the file contents (I've seen some that have racks' worth of
> SATA drives for contents and then 'small' arrays of a few dozen flash drives
> for the metadata), and then you have very large repositories (Facebook has
> everything in a single repo), then you have a perfect storm where something
> like watchman that talks the proprietary protocol of the storage array can be FAR
> faster than anything that needs to operate with the standard POSIX calls.
> That can easily account for the difference between the Facebook announcement
> and the results presented for normal disks that show an improvement, but with
> stock git being faster than improved mercurial.
As I recall from Facebook's presentation on this (as well as from the
discussion on the git mailing list), Facebook's test repository is
much larger than any known git repository. In particular, it is larger
than WebKit. These performance improvements are not for server-side
tasks, but for client-side (e.g. git/hg status). Facebook also made
other improvements for the client-server communication, and for
log/blame, but these are not relevant to watchman.
It is entirely possible that, as repo size grows, Mercurial with
watchman is faster than git without.
With my patches, git status isn't constant-time; it's merely a roughly
constant factor faster. My initial design was to make git status
constant-time by caching the results of the wt_status_collect calls.
But there were so many cases with the various options that I got a bit
lost in the wilderness and made a big mess. Maybe I would do better if I
tried it again today. And maybe if I just build on top of the
untracked-cache code, I would be able to get to constant-time; I'll have
to try that at some point.
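
To illustrate the caching idea above (this is not the actual patch series), here is a minimal Python sketch. It assumes a watcher such as watchman can hand over the list of paths that changed since the last query. The class name CachedStatus and the methods notify() and status() are hypothetical, and untracked files and index handling are left out entirely.

```python
import os


class CachedStatus:
    """Hypothetical sketch: remember the last known state of tracked
    files and re-examine only paths a file watcher reports as changed."""

    def __init__(self, root, tracked):
        self.root = root
        # path -> (mtime_ns, size) recorded when the cache was built
        self.snapshot = {path: self._stat(path) for path in tracked}
        # cached result: tracked paths known to differ from the snapshot
        self.modified = set()

    def _stat(self, path):
        """Return (mtime_ns, size) for a path, or None if it is missing."""
        try:
            st = os.stat(os.path.join(self.root, path))
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size)

    def notify(self, changed_paths):
        """Feed in the paths the watcher reported since the last query.

        The cost is proportional to the number of changed paths,
        not to the size of the repository."""
        for path in changed_paths:
            if path not in self.snapshot:
                continue  # untracked-file handling is omitted here
            if self._stat(path) != self.snapshot[path]:
                self.modified.add(path)
            else:
                self.modified.discard(path)

    def status(self):
        """Return the cached result without touching the filesystem."""
        return sorted(self.modified)
```

With a structure like this, status() does no filesystem work at all, and the remaining cost sits in notify(), which scales with the number of reported changes. A real implementation would also have to cover the many option combinations mentioned above, which this sketch ignores.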