Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Mon, Jul 01, 2013 at 11:47:32AM -0700, Colby Ranger wrote: But I think we are comparing apples to steaks here, Vincent is (rightfully) concerned about process startup performance, whereas our timings were assuming the process was already running. I did some timing on loading the reverse index for the kernel and it is pretty slow (~1200ms). I just submitted a fix to do a bucket sort and reduced that to ~450ms, which is still slow but much better: On my machine, loading the kernel revidx in C git is about ~830ms. I switched the qsort() call to a radix/bucket sort, and have it down to ~200ms. So definitely much better, though that still leaves a bit to be desired for quick commands. E.g., git rev-list --count A..B should become fairly instantaneous with bitmaps, but in many cases the revindex loading will take longer than it would have to simply do the actual traversal. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Sun, Jul 7, 2013 at 2:46 AM, Jeff King p...@peff.net wrote: On Mon, Jul 01, 2013 at 11:47:32AM -0700, Colby Ranger wrote: But I think we are comparing apples to steaks here, Vincent is (rightfully) concerned about process startup performance, whereas our timings were assuming the process was already running. I did some timing on loading the reverse index for the kernel and it is pretty slow (~1200ms). I just submitted a fix to do a bucket sort and reduced that to ~450ms, which is still slow but much better: On my machine, loading the kernel revidx in C git is about ~830ms. I switched the qsort() call to a radix/bucket sort, and have it down to ~200ms. So definitely much better, This is a very nice reduction. pack-objects would benefit from it even without bitmaps. Since it doesn't require a data format change this is a pretty harmless patch to include in Git. We may later conclude caching the revidx is worthwhile, but until then a bucket sort doesn't hurt. :-) though that still leaves a bit to be desired for quick commands. E.g., git rev-list --count A..B should become fairly instantaneous with bitmaps, but in many cases the revindex loading will take longer than it would have to simply do the actual traversal. Yea, we don't know of a way around this. In a few cases the bitmap code in JGit is slower than the naive traversal, but these are only on small segments of history. I wonder if you could guess which algorithm to use by looking at the offsets of A and B using the idx file. If they are near each other in the pack, run the naive algorithm without bitmaps and revidx. If they are farther apart assume the bitmap would help more than traversal and use bitmap+revidx. Working out what the correct distance should be before switching algorithms is hard. A and B could be megabytes apart in the pack but A could be B's grandparent and traversed in milliseconds. 
I wonder how often that is in practice; certainly if A and B are within a few hundred kilobytes of each other, the naive traversal should be almost instant.
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Right, the format and implementation in JGit can do Counting objects in 87ms for the Linux kernel history. Actually, that was the timing when I first pushed the change. With the improvements submitted throughout the year, we can do counting in 50ms, on my same machine. But I think we are comparing apples to steaks here, Vincent is (rightfully) concerned about process startup performance, whereas our timings were assuming the process was already running. I did some timing on loading the reverse index for the kernel and it is pretty slow (~1200ms). I just submitted a fix to do a bucket sort and reduced that to ~450ms, which is still slow but much better: https://eclipse.googlesource.com/jgit/jgit/+/6cc532a43cf28403cb623d3df8600a2542a40a43%5E%21/
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Mon, Jul 1, 2013 at 11:47 AM, Colby Ranger cran...@google.com wrote: But I think we are comparing apples to steaks here, Vincent is (rightfully) concerned about process startup performance, whereas our timings were assuming the process was already running. I did some timing on loading the reverse index for the kernel and it is pretty slow (~1200ms). I just submitted a fix to do a bucket sort and reduced that to ~450ms, which is still slow but much better: https://eclipse.googlesource.com/jgit/jgit/+/6cc532a43cf28403cb623d3df8600a2542a40a43%5E%21/ A reverse index that is hot in RAM would obviously load in about 0ms. But a cold load of a reverse index that uses only 4 bytes per object (as Colby did here) for 3.1M objects could take ~590ms to read from disk, assuming spinning media moving 20 MiB/s. If 8-byte offsets were also stored this could be more like 1700ms. Numbers obviously get better if the spinning media can transfer at 40 MiB/s; now it's more like 295ms for 4 bytes/object and 885ms for 12 bytes/object. I think it's still reasonable to compute the reverse index on the fly. But JGit certainly does have the benefit of reusing it across requests by relying on process memory based caches. C Git needs to rely on the kernel buffer cache, which requires this data be written out to a file to be shared.
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Wed, Jun 26, 2013 at 7:45 PM, Jeff King p...@peff.net wrote: In particular, it seems like the slowness we saw with the v1 bitmap format is not what Shawn and Colby have experienced. So it's possible that our test setup is bad or different. Or maybe the C v1 reading implementation had some problems that are fixable. It's hard to say because we haven't shown any code that can be timed and compared. Right, the format and implementation in JGit can do Counting objects in 87ms for the Linux kernel history. But I think we are comparing apples to steaks here, Vincent is (rightfully) concerned about process startup performance, whereas our timings were assuming the process was already running. It would help everyone to understand the issues involved if we are at least looking at the same file format. And the pack-order versus idx-order for the bitmaps is still up in the air. Do we have numbers on the on-disk sizes of the resulting EWAHs? I did not see any presented in this thread, and I am very interested in this aspect of the series. The path hash cache should be taking about 9.9M of disk space, but I recall reading the bitmap file is 8M. I don't understand. Colby and I were very concerned about the size of the EWAH compressed bitmaps because we wanted hundreds of them for a large history like the kernel, and we wanted to minimize the amount of memory consumed by the bitmap index when loaded into the process. In the JGit implementation our copy of Linus' tree has 3.1M objects, an 81.5 MiB idx file, and a 3.8 MiB bitmap file. We were trying to keep the overhead below 10% of the idx file, and I think we have succeeded on that. With 3.1M objects the v2 bitmap proposed in this thread needs at least 11.8M, or 14+% overhead just for the path hash cache. The path hash cache may still be required, Colby and I have been debating the merits of having the data available for delta compression vs. the increase in memory required to hold it. 
The pack-order ones should be more amenable to run-length encoding, especially as you get further down into history (the tip ones would mostly be 1's, no matter how you order them). This is also true for the type bitmaps, but especially so in the JGit file ordering where we always write all trees before any blobs. The type bitmaps are very compact and basically amount to defining a single range in the file. This takes only a few words in the EWAH compressed format.
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Thu, Jun 27, 2013 at 09:07:38AM -0700, Shawn O. Pearce wrote: And the pack-order versus idx-order for the bitmaps is still up in the air. Do we have numbers on the on-disk sizes of the resulting EWAHs? I did not see any presented in this thread, and I am very interested in this aspect of the series. The path hash cache should be taking about 9.9M of disk space, but I recall reading the bitmap file is 8M. I don't understand. I don't know where the 8M number came from, or if it was on the kernel repo. My bitmap-enabled pack of linux-2.6 (about 3.2M objects) using Vicent's patches looks like:

$ du -sh *
42M  pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.bitmap
87M  pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.idx
630M pack-9ea76831aec6c49c5ff42509a2a2ce97da13c5ad.pack

Packing the same repo with jgit debug-gc (jgit 3.0.0) yields:

$ du -sh *
3.0M pack-2478783825733a1f1012f0087a0b5a92aa7437d8.bitmap
82M  pack-2478783825733a1f1012f0087a0b5a92aa7437d8.idx
585M pack-2478783825733a1f1012f0087a0b5a92aa7437d8.pack
4.8M pack-f61fb76112372288923be7a0464476892dfebe3e.idx
97M  pack-f61fb76112372288923be7a0464476892dfebe3e.pack

If we assume that 12M of that is name-hash, that's still an order of magnitude larger. For reference, jgit created 327 bitmaps (according to its progress eye candy), and Vicent's patches generated 385. So that explains some of the increase, but the per-bitmap size is still much larger. The path hash cache may still be required, Colby and I have been debating the merits of having the data available for delta compression vs. the increase in memory required to hold it. I guess this is not an option for JGit, but for C git, an mmap-able name-hash file means we can just fault in the pages mentioning objects we actually need it for. And its use can be completely optional; in fact, it doesn't even need to be inside the .bitmap file (though I cannot think of a reason it would be useful outside of having bitmaps).
-Peff
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Pinning the bitmap index on the reverse index adds complexity (lookups are two-step: first find the entry in the reverse index, and then find the SHA1 in the index) and is measurably slower, in both loading and lookup times. Since Git doesn't have a memory problem, it's very hard to make an argument for a design that is more complex and runs slower to save memory. Sorting by SHA1 will generate a random distribution. This will require you to inflate the entire bitmap on every fetch request, in order to do the contains operation. Sorting by pack offset allows us to inflate only the bits we need as we are walking the graph, since they are usually at the start of the bitmap. What is the general size in bytes of the SHA1-sorted bitmaps? If they are much larger, the size of the bitmap has an impact on how fast you can perform bitwise operations on them, which is important for fetch when doing wants AND NOT haves. Furthermore, JGit primarily operates on the bitmap representation, rarely converting bitmap id -> SHA1 during clone. When the bitmap of objects to include in the output pack contains all of the objects in the bitmap'd pack, we only do the translation of the bitmap ids of new objects, not in the bitmap index, and it is just a lookup in an array. Those objects are put at the front of the stream. The rest of the objects are streamed directly from the pack, with some header munging, since it is guaranteed to be a fully connected pack. Most of the time this works because JGit creates 2 packs during GC: a heads pack, which is bitmap'd, and an everything-else pack.
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Vicent Martí tan...@gmail.com writes: On Tue, Jun 25, 2013 at 5:58 PM, Thomas Rast tr...@inf.ethz.ch wrote: Please document the RLW format here. Har har. I was going to comment on your review of the Ewah patchset, but might as well do it here: the only thing I know about Ewah bitmaps is that they work. And I know this because I did extensive fuzz testing of my C port. Unfortunately, the original Java code I ported from has 0 comments, so any documentation here would have to be reverse-engineered. I think the below would be a reasonable documentation, to be appended after your description of the EWAH format. Maybe Colby can correct me if I got anything wrong. You can basically read this off from the implementation of ewah_each_bit() and the helper functions it uses.

-- 8< --
The compressed bitmap is stored in a form of run-length encoding, as follows. It consists of a concatenation of an arbitrary number of chunks. Each chunk consists of one or more 64-bit words:

  H  L_1  L_2  L_3  ...  L_M

H is called RLW (run length word). It consists of (from lower to higher order bits):

- 1 bit: the repeated bit B
- 32 bits: repetition count K (unsigned)
- 31 bits: literal word count M (unsigned)

The bitstream represented by the above chunk is then:

- K repetitions of B
- The bits stored in `L_1` through `L_M`. Within a word, bits at lower order come earlier in the stream than those at higher order.

The next word after `L_M` (if any) must again be a RLW, for the next chunk. For efficient appending to the bitstream, the EWAH stores a format to the last RLW in the stream.

-- Thomas Rast trast@{inf,student}.ethz.ch
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Thomas Rast tr...@inf.ethz.ch writes: [...]

  The next word after `L_M` (if any) must again be a RLW, for the next chunk. For efficient appending to the bitstream, the EWAH stores a
  format to the last RLW in the stream.
  ^^

I have no idea what Freud did there, but pointer or some such is probably a saner choice. -- Thomas Rast trast@{inf,student}.ethz.ch
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
+ Generating this reverse index at runtime is **not** free (around 900ms
+ generation time for a repository like `torvalds/linux`), and once again,
+ this generation time needs to happen every time `pack-objects` is
+ spawned.

If generating the reverse index is expensive, it is probably worthwhile to create a .revidx or extend the .idx with the information sorted by offset.
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 4:08 PM, Vicent Martí tan...@gmail.com wrote: On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano gits...@pobox.com wrote: What case are you talking about? The n-th object must be one of these four types and can never be of more than one type at the same time, so a natural expectation from the reader is If you OR them together, you will get the same set. If you say If you XOR them, that forces the reader to wonder when these bitmaps ever can overlap at the same bit position. I guess this is just wording. I don't particularly care about the distinction, but I'll change it to OR. Hmm, OK. If you think XOR and OR are the same operation, I also have a bridge to sell you. It's in Brooklyn. It's a great value. The correct operation is OR. Not XOR. OR. Drop the X. It cannot be mmapped not particularly because of endianness issues, but because the original format is not indexed and requires a full parse of the whole index before it can be accessed programmatically. The wrong endianness just increases the parse time. Wrong endianness has nothing to do with the parse time. Modern CPUs can flip a word around very quickly. In JGit we chose to parse the file at load time because it's simpler than having an additional index segment, and we do what you did, which is to toss the object SHA-1s into a hashtable for fast lookup. By the time we look for the SHA-1s and toss them into a hashtable we can stride through the file and find the bitmap regions. Simple. In other words, the least complex solution possible that still provides good performance. I'd say we have pretty good performance. and I'm going to try to make it run fast enough in that encoding. Hmph. Is it an option to start from what JGit does, so that people can use both JGit and your code on the same repository? I'm afraid I agree here with Junio.
The JGit format is already shipping in JGit 3.0, Gerrit Code Review 2.6, and in heavy production use for almost a year on android.googlesource.com, and Google's own internal Git trees. I would prefer to see a series adding bitmap support to C Git start with the existing format, make it run, taking advantage of the optimizations JGit uses (many of which you ignored and tried to fix in other ways), and then look at improving the file format itself if load time is still the largest low hanging fruit in upload-pack. I'm guessing it's not. You futzed around with the object table, but JGit sped itself up considerably by simply not using the object table when the bitmap is used. I think there are several such optimizations you missed in your rush to redefine the file format. And then if you do not succeed, after trying to optimize in-core processing using that on-disk format to make it fast enough, start thinking about tweaking the on-disk format? I'm afraid this is not an option. I have an old patchset that implements JGit v1 bitmap loading (and in fact that's how I initially developed these series -- by loading the bitmaps from JGit for debugging), but I discarded it because it simply doesn't pan out in production. ~3 seconds time to spawn `upload-pack` is not an option for us. I did not develop a tweaked on-disk format out of boredom. I think your code or experiments are bogus. Even on our systems with JGit a cold start for the Linux kernel doesn't take 3s. And this is JGit where Java is slow because Jesus it has a lot of factories, and without mmap'ing the file into the server's address space. Hell the file has to come over the network from a remote disk array. I could dig up the patch if you're particularly interested in backwards compatibility, but since it was several times slower than the current iteration, I have no interest (time, actually) to maintain it, brush it up, and so on. I have already offered myself to port the v2 format to JGit as soon as it's settled.
It sounds like a better investment of all our time. Actually, I think the format you propose here is inferior to the JGit format. In particular the idx-ordering means the EWAH code is useless. You might as well not use the EWAH format and just store 2.6M bits per commit. The idx-ordering also makes it *much* harder to emit a pack file in a reasonable order for the client. Colby and I tried idx-ordering and discarded it when it didn't perform as well as the pack-ordering that JGit uses. Following up on Shawn's comments, I removed the little-endian support from the on-disk format and implemented lazy loading of the bitmaps to make up for it. The result is decent (slowed down from 250ms to 300ms) and it lets us keep the whole format as NWO on disk. I think it's a good tradeoff. The maintenance burden of two endian formats in a single file is too high to justify. I'm glad to see you saw that. As it stands right now, the only two changes from v1 of the on-disk format are: - There is an index at the end. This is a good idea. I don't think the index is necessary if you plan to build a
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 11:11 PM, Jeff King p...@peff.net wrote: On Tue, Jun 25, 2013 at 09:33:11PM +0200, Vicent Martí wrote: One way we side-stepped the size inflation problem in JGit was to only use the bitmap index information when sending data on the wire to a client. Here delta reuse plays a significant factor in building the pack, and we don't have to be as accurate on matching deltas. During the equivalent of `git repack` bitmaps are not used, allowing the traditional graph enumeration algorithm to generate path hash information. OH BOY HERE WE GO. This is worth its own thread, lots to discuss here. I think peff will have a patchset regarding this to upstream soon, we'll get back to it later. We do the same thing (only use bitmaps during on-the-wire fetches). But there are a few problems with assuming delta reuse. For us (GitHub), the foremost one is that we pack many forks of a repository together into a single packfile. That means when you clone torvalds/linux, an object you want may be stored in the on-disk pack with a delta against an object that you are not going to get. So we have to throw out that delta and find a new one. Gerrit Code Review ran into the same problem a few years ago with the refs/changes namespace. Objects reachable from a branch were often delta compressed against dropped code review revisions, making for some slow transfers. We fixed this by creating a pack of everything reachable from refs/heads/* and then another pack of the other stuff. I would encourage you to do what you suggest... I'm dealing with that by adding an option to respect islands during packing, where an island is a set of common objects (we split it by fork, since we expect those objects to be fetched together, but you could use other criteria). The rule is that an object cannot delta against another object that is not in all of its islands. So everybody can delta against shared history, but objects in your fork can only delta against other objects in the fork.
You are guaranteed to be able to reuse such deltas during a full clone of a fork, and the on-disk pack size does not suffer all that much (because there is usually a good alternate delta base within your reachable history). Yes, exactly. I want to do the same thing on our servers, as we have many forks of some popular open source repositories that are also not small (Linux kernel, WebKit). Unfortunately Google has not had the time to develop the necessary support into JGit. So with that series, we can get good reuse for clones. But there are still two cases worth considering: 1. When you fetch a subset of the commits, git marks only the edges as preferred bases, and does not walk the full object graph down to the roots. So any object you want that is delta'd against something older will not get reused. If you have reachability bitmaps, I don't think there is any reason that we cannot use the entire object graph (starting at the have tips, of course) as preferred bases. In JGit we use the reachability bitmap to provide proof a client has an object. Even if it's not in the edges. This allows us much better delta reuse, as frequently deltas will be available pointing to something behind the edge, but that the client certainly has given the edges we know about. We also use the reachability bitmap to provide proof a client does not need an object. We found a reduction in number of objects transferred because the want AND NOT have subtracted out a number of objects not in the edge. Apparently merges, reverts and cherry-picks happen often enough in the repositories we host that this particular optimization helps reduce data transfer, and work at both server and client ends of the connection. It's a nice freebie the bitmap algorithm gives us. 2. The server is not necessarily fully packed. In an active repo, you may have a large base pack with bitmaps, with several recently pushed packs on top.
You still need to delta the recently pushed objects against the base objects. Yes, this is unfortunate. One way we avoid this in JGit is to keep everything in pack files, rather than exploding loose. The reachability bitmap often proves the client has the delta base the pusher used to make the object, allowing us to reuse the delta. It may not be the absolute best delta in the world, but reuse is faster than inflate()+delta()+deflate(), and the delta is probably good enough until the server can do a real GC in the background. We combine small packs from pushes together by almost literally just concat'ing the packs together and creating a new .idx. Newer pushed data is put in front of the older data, the pack is clustered by commit, tree, blob ordering, duplicates are removed, and it's written back to disk. Typically we complete this pack concat operation mere seconds after a push finishes, so readers have very few packs to deal with. I don't have
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Wed, Jun 26, 2013 at 6:53 PM, Colby Ranger cran...@google.com wrote: + Generating this reverse index at runtime is **not** free (around 900ms + generation time for a repository like `torvalds/linux`), and once again, + this generation time needs to happen every time `pack-objects` is + spawned. 900ms is fishy. Creating the revidx should not take that long. But if it is... If generating the reverse index is expensive, it is probably worthwhile to create a .revidx or extend the .idx with the information sorted by offset. Colby is probably right that a cached copy of the revidx would help. Or update the .idx format to have two additional sections that stores the length of each packed object and the delta base of each packed object, allowing pack-objects to avoid creating the revidx. This would be an additional ~8 bytes per object, so ~19.8M for the Linux kernel (given ~2.6M objects).
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
That was a very rude reply. :( Please refrain from interacting with me in the ML in the future. I'll do accordingly. Thanks! vmg On Thu, Jun 27, 2013 at 3:11 AM, Shawn Pearce spea...@spearce.org wrote: [verbatim quote of Shawn's message above snipped]
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Thu, Jun 27, 2013 at 04:36:54AM +0200, Vicent Martí wrote: That was a very rude reply. :( Please refrain from interacting with me in the ML in the future. I'll act accordingly. I agree that the pointer arithmetic thing may have been a little much, but I think there are some points we need to address in Shawn's email. In particular, it seems like the slowness we saw with the v1 bitmap format is not what Shawn and Colby have experienced. So it's possible that our test setup is bad or different. Or maybe the C v1 reading implementation had some problems that are fixable. It's hard to say because we haven't shown any code that can be timed and compared. And the pack-order versus idx-order for the bitmaps is still up in the air. Do we have numbers on the on-disk sizes of the resulting EWAHs? The pack-order ones should be more amenable to run-length encoding, especially as you get further down into history (the tip ones would mostly be 1's, no matter how you order them). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
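Peff's intuition about pack-order being more amenable to run-length encoding can be illustrated with a toy run counter: EWAH-style RLE wins when set bits cluster into long runs, and an ordering that keeps reachable history contiguous produces fewer runs. This is only an illustrative sketch, not code from either implementation:

```c
/*
 * Toy illustration of why bit ordering matters for EWAH-style RLE:
 * count the number of 0/1 runs in a bit array. Fewer runs means the
 * run-length encoder emits fewer words. An ordering that clusters
 * reachable objects together (pack order) should yield fewer runs
 * than one that scatters them (idx/SHA1 order).
 */
static unsigned count_runs(const unsigned char *bits, unsigned n)
{
	unsigned runs = n ? 1 : 0;
	unsigned i;

	for (i = 1; i < n; i++)
		if (bits[i] != bits[i - 1])
			runs++;
	return runs;
}
```

Comparing `count_runs` for the same set of reachable objects under the two orderings is one cheap way to estimate the on-disk size difference Peff asks about.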
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 7:42 AM, Shawn Pearce spea...@spearce.org wrote: I very much hate seeing a file format that is supposed to be portable that supports both big-endian and little-endian encoding. Well, the bitmap index is not supposed to be portable, as it doesn't get sent over the wire in any situation. Regardless, the format is portable because it supports both encodings and clearly defines which one the current file is using. I think that's a good tradeoff! Such a specification forces everyone to implement two code paths to handle reading data from the file, on the off-chance they are on the wrong platform. Extra code paths have never been an issue in the JGitAbstractFactoryGeneratorOfGit, har har har. Ah. I'm such a funny guy when it comes to Java. Anyway, I designed this keeping JGit in mind. In this specific case, it doesn't force you to add any new code paths. The endianness changes only affect the serialization format of the bitmaps, which is not part of Git or JGit itself but of the Javaewah/libewok library. The interface for reading on that library has already been wisely abstracted on JGit (https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/PackBitmapIndexV1.java#L133), so changing the byte order simply means changing the SimpleDataInput to a LE one. I agree this is not ideal, or elegant, but I'm having a hard time making an argument for sacrificing objective speed for the sake of subjective simplicity. What is wrong with picking one encoding and sticking to it? It prevents us from making this optimally fast on the machines where it needs to be. Regardless, I must admit I haven't generated numbers for this in a while (the BE-LE switch is one of the first optimizations I did). I'm going to try to re-implement full NWO loading and see how much slower it is, before I continue arguing for/against it. 
If I can get it within a reasonable margin (say, 15%) of the current implementation, I'd definitely be in favor of sticking to only NWO on the whole file. If it's slower than that, well, Git has never compromised on speed, and I don't think there's a point to be made for starting to do that now. + - BITMAP_OPT_HASH_CACHE (0x8) + If present, a hash cache for finding delta bases will be available + right after the header block in this index. See the following + section for details. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - An OPTIONAL delta cache follows the header. Some may find the name delta cache confusing as it does not cache deltas of objects. May I suggest path hash cache as an alternative name? Definitely, this is a typo. + The cache is formed by `n` 4-byte hashes in a row, where `n` is + the amount of objects in the indexed packfile. Note that this amount + is the **total number of objects** and is not related to the + number of commits that have been selected and indexed in the + bitmap index. + + The hashes are stored in Network Byte Order and they are the same + values generated by a normal revision walk during the `pack-objects` + phase. I find it interesting this is network byte order and not big-endian or little-endian based on the flag in the header. As stated before, the flag in the header only affects the Javaewah/libewok interface. Everything Git-related in the bitmap index is in NWO, like it is customary in Git. + The `n`nth hash in the cache is the name hash for the `n`th object + in the index for the indexed packfile. + + [RATIONALE]: + + The bitmap index allows us to skip the Counting Objects phase + during `pack-objects` and yield all the OIDs that would be reachable + (WANTS) when generating the pack. 
+ + This optimization, however, means that we're adding objects to the + packfile straight from the packfile index, and hence we are lacking + path information for the objects that would normally be generated + during the Counting Objects phase. + + This path information for each object is hashed and used as a very + effective way to find good delta bases when compressing the packfile; + without these hashes, the resulting packfiles are much less optimal. + + By storing all the hashes in a cache together with the bitmapsin + the bitmap index, we can yield not only the SHA1 of all
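The "name hash" the quoted section relies on is a small function over the non-whitespace characters of an object's path, weighted so the trailing characters dominate (objects whose paths end alike tend to delta well against each other). A sketch consistent with the hashing scheme C git uses in `pack-objects`:

```c
#include <stdint.h>
#include <ctype.h>

/*
 * Sketch of the pack "name hash": hash the path so that the last
 * characters (filename, extension) dominate the high bits, and skip
 * whitespace so equivalent names collide on purpose. Objects with
 * similar path hashes are then tried as delta bases for one another.
 */
static uint32_t pack_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

Storing these 4-byte values per object in the bitmap index is what lets the bitmap path skip the Counting Objects walk without losing delta quality.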
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Vicent Martí tan...@gmail.com writes: + There is a bitmap for each Git object type, stored in the following + order: + + - Commits + - Trees + - Blobs + - Tags + + In each bitmap, the `n`th bit is set to true if the `n`th object + in the packfile index is of that type. + + The obvious consequence is that the XOR of all 4 bitmaps will result + in a full set (all bits sets), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). Instead of XOR did you mean OR here? Nope, I think XOR makes it more obvious: if the same bit is set on two bitmaps, it would be cleared when XORed together, and hence all the bits wouldn't be set. An OR would hide this case. What case are you talking about? The n-th object must be one of these four types and can never be of more than one type at the same time, so a natural expectation from the reader is If you OR them together, you will get the same set. If you say If you XOR them, that forces the reader to wonder when these bitmaps ever can overlap at the same bit position. To sum it up: I'd like to see this format be strictly in Network Byte Order, Good. I've been wondering what you meant by cannot be mmap-ed from the very beginning. We mmapped the index for a long time, and it is defined in terms of network byte order. Of course, pack .idx files are in network byte order, too, and we mmap them without problems. It seems that it primarily came from your fear that using network byte order may be unnecessarily hard to perform well, and it would be a good thing to do to try to do so first instead of punting from the beginning. and I'm going to try to make it run fast enough in that encoding. Hmph. Is it an option to start from what JGit does, so that people can use both JGit and your code on the same repository? And then if you do not succeed, after trying to optimize in-core processing using that on-disk format to make it fast enough, start thinking about tweaking the on-disk format? Thanks. 
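Junio's point is that the four type bitmaps partition the object set: each object has exactly one type, so the bitmaps are pairwise disjoint, and for disjoint sets OR and XOR produce the same union. A toy single-word model of that invariant:

```c
#include <stdint.h>

/*
 * Toy model of the type-bitmap invariant discussed above: each object
 * has exactly one type, so the four per-type bitmaps are pairwise
 * disjoint. Disjointness makes OR and XOR coincide, and both must
 * equal the full object set. OR states the intent more clearly since
 * it does not invite the reader to wonder about overlapping bits.
 */
static int type_bitmaps_consistent(uint64_t commits, uint64_t trees,
				   uint64_t blobs, uint64_t tags,
				   uint64_t all_objects)
{
	uint64_t or_all  = commits | trees | blobs | tags;
	uint64_t xor_all = commits ^ trees ^ blobs ^ tags;

	return or_all == xor_all && or_all == all_objects;
}
```

If any object were marked with two types, `or_all` and `xor_all` would diverge, which is exactly the case an OR-based description would silently hide.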
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
Vicent Marti tan...@gmail.com writes: This is the technical documentation and design rationale for the new Bitmap v2 on-disk format. Hrmpf, that's what I get for reading the series in order... + The folowing flags are supported: ^^ typos marked by ^ + By storing all the hashes in a cache together with the bitmapsin ^^ + The obvious consequence is that the XOR of all 4 bitmaps will result + in a full set (all bits sets), and the AND of all 4 bitmaps will ^ + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing for this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positivea, and hence entries are always xor'ed ^ + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. Clever. Why 160 though? + - 2 bytes of RESERVED data (used right now for better packing). What do they mean? + With an index at the end of the file, we can load only this index in memory, + allowing for very efficient access to all the available bitmaps lazily (we + have their offsets in the mmaped file). Is there anything preventing you from mmap()ing the index also? 
+== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + + This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap + +Note that the byte order for this serialization is not defined by +default. The byte order for all the content in a serialized EWAH +bitmap can be known by the byte order flags in the header of the +bitmap index file. Please document the RLW format here. -- Thomas Rast trast@{inf,student}.ethz.ch
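The XOR-offset compression in the quoted hunk can be resolved iteratively rather than recursively: following the chain of offsets and XORing the stored words along the way yields the same result as the recursive definition. A toy sketch using a single 64-bit word per "bitmap" for brevity (field names are illustrative, not from either implementation):

```c
#include <stddef.h>
#include <stdint.h>

/* toy model of one bitmap-index entry; a real entry stores a full
 * EWAH bitmap rather than a single word */
struct bitmap_entry {
	uint8_t xor_offset;	/* 0 = stored literally; max 160 per the doc */
	uint64_t bits;		/* stored (possibly XOR-compressed) bitmap */
};

/*
 * actual(x) = stored(x) ^ actual(x - offset(x)), with offset(x) == 0
 * terminating the chain. Expanding the recursion shows that the
 * actual bitmap is just the XOR of all stored words along the chain,
 * so a simple backwards walk suffices.
 */
static uint64_t resolve_bitmap(const struct bitmap_entry *entries, size_t pos)
{
	uint64_t result = entries[pos].bits;

	while (entries[pos].xor_offset) {
		pos -= entries[pos].xor_offset;
		result ^= entries[pos].bits;
	}
	return result;
}
```

Since offsets are always positive (pointing at earlier entries), the walk is guaranteed to terminate, and the 160-entry cap bounds how much earlier data a reader must keep decompressed.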
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 11:17 PM, Junio C Hamano gits...@pobox.com wrote: What case are you talking about? The n-th object must be one of these four types and can never be of more than one type at the same time, so a natural expectation from the reader is If you OR them together, you will get the same set. If you say If you XOR them, that forces the reader to wonder when these bitmaps ever can overlap at the same bit position. I guess this is just wording. I don't particularly care about the distinction, but I'll change it to OR. To sum it up: I'd like to see this format be strictly in Network Byte Order, Good. I've been wondering what you meant by cannot be mmap-ed from the very beginning. We mmapped the index for a long time, and it is defined in terms of network byte order. Of course, pack .idx files are in network byte order, too, and we mmap them without problems. It seems that it primarily came from your fear that using network byte order may be unnecessarily hard to perform well, and it would be a good thing to do to try to do so first instead of punting from the beginning. It cannot be mmapped not particularly because of endianness issues, but because the original format is not indexed and requires a full parse of the whole index before it can be accessed programmatically. The wrong endianness just increases the parse time. and I'm going to try to make it run fast enough in that encoding. Hmph. Is it an option to start from what JGit does, so that people can use both JGit and your code on the same repository? And then if you do not succeed, after trying to optimize in-core processing using that on-disk format to make it fast enough, start thinking about tweaking the on-disk format? I'm afraid this is not an option. I have an old patchset that implements JGit v1 bitmap loading (and in fact that's how I initially developed this series -- by loading the bitmaps from JGit for debugging), but I discarded it because it simply doesn't pan out in production. 
~3 seconds to spawn `upload-pack` is not an option for us. I did not develop a tweaked on-disk format out of boredom. I could dig up the patch if you're particularly interested in backwards compatibility, but since it was several times slower than the current iteration, I have no interest (time, actually) to maintain it, brush it up, and so on. I have already offered to port the v2 format to JGit as soon as it's settled. It sounds like a better investment of all our time. Following up on Shawn's comments, I removed the little-endian support from the on-disk format and implemented lazy loading of the bitmaps to make up for it. The result is decent (slowed down from 250ms to 300ms) and it lets us keep the whole format as NWO on disk. I think it's a good trade-off. The relevant commits are available on my fork of Git (I'll be sending v2 of the patchset once I finish tackling the other reviews): https://github.com/vmg/git/commit/d6cdd4329a547580bbc0143764c726c48b887271 https://github.com/vmg/git/commit/d8ec342fee87425e05c0db1e1630db8424612c71 As it stands right now, the only two changes from v1 of the on-disk format are: - There is an index at the end. This is a good idea. - The bitmaps are sorted in packfile-index order, not in packfile order. This is a good idea. As always, all your feedback is appreciated, but please keep in mind I have strict performance concerns. German kisses, vmg
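Keeping the whole file in network byte order rests on the observation that NWO words can be read portably by assembling bytes on access, with no per-platform code path and no byte-swap pass over the file. A sketch of such endian-agnostic accessors (in the spirit of the helpers C git uses for .idx files):

```c
#include <stdint.h>

/*
 * Read big-endian (network byte order) words directly out of an
 * mmap'ed region by assembling them byte-by-byte. This compiles to
 * a single load + bswap on most platforms and is correct regardless
 * of the host's endianness.
 */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |
	       (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}
```

This is the "convert words on demand as they are accessed" approach Shawn describes: the cost is paid per word read, not as an up-front parse of the whole index.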
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 5:58 PM, Thomas Rast tr...@inf.ethz.ch wrote: This is the technical documentation and design rationale for the new Bitmap v2 on-disk format. Hrmpf, that's what I get for reading the series in order... + The folowing flags are supported: ^^ typos marked by ^ + By storing all the hashes in a cache together with the bitmapsin ^^ + The obvious consequence is that the XOR of all 4 bitmaps will result + in a full set (all bits sets), and the AND of all 4 bitmaps will ^ + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing for this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positivea, and hence entries are always xor'ed ^ + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. Clever. Why 160 though? JGit implementation detail. It's the equivalent of the delta-window in `pack-objects` for example. HINT HINT: in practice, JGit only looks 16 positions behind to find deltas, and we do the same. So the practical limit is 16. harhar + - 2 bytes of RESERVED data (used right now for better packing). What do they mean? + With an index at the end of the file, we can load only this index in memory, + allowing for very efficient access to all the available bitmaps lazily (we + have their offsets in the mmaped file). Is there anything preventing you from mmap()ing the index also? 
Yeah, this format allows you to easily do a SHA1 bsearch with custom step to lookup entries on the bitmap index, except for the fact that the index is not sorted by SHA1, so you'd need a linear search instead. :) I decided against it because during most complex invocations of `pack-objects`, we perform a couple thousand commit lookups to see if they have a bitmap in the index, so it makes a lot of sense to load the index tightly in a hash table beforehand (which takes very little time, to be fair). We more than make up for the loading time by having much faster lookups. I felt it was the right tradeoff (JGit does the same, but in their case, because they cannot mmap. :p) +== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + + This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap + +Note that the byte order for this serialization is not defined by +default. The byte order for all the content in a serialized EWAH +bitmap can be known by the byte order flags in the header of the +bitmap index file. Please document the RLW format here. Har har. I was going to comment on your review of the Ewah patchset, but might as well do it here: the only thing I know about Ewah bitmaps is that they work. And I know this because I did extensive fuzz testing of my C port. Unfortunately, the original Java code I ported from has 0 comments, so any documentation here would have to be reverse-engineered. 
Personally, I'd lean towards considering Ewah an external dependency (black box); the headers for the library are commented accordingly, clearly explaining the interfaces while hiding implementation details. Of course, you're welcome to help me reverse engineer the implementation, but I'm not sure this would be of much value. It'd be better to make sure it passes the extensive test suite of the Java version, and assume that Mr Lemire designed a sound format for the bitmaps. Swiss kisses, vmg
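Even treating the word contents as a black box, the EWAH container layout from Appendix A is simple enough to walk and bounds-check. A hedged sketch of a reader for that outer layout (field and function names are illustrative, not JGit's or libewok's; fields assumed big-endian as the appendix's default):

```c
#include <stdint.h>
#include <stddef.h>

/* parsed view of one serialized EWAH bitmap; the words stay opaque */
struct ewah_data {
	uint32_t bit_count;		/* bits in the uncompressed bitmap */
	uint32_t word_count;		/* number of 8-byte words stored */
	const unsigned char *words;	/* word_count * 8 bytes, opaque */
	uint32_t rlw_pos;		/* position of the current RLW */
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* returns a pointer just past the serialized bitmap, or NULL if the
 * buffer is too short for the sizes the header claims */
static const unsigned char *ewah_parse(struct ewah_data *out,
				       const unsigned char *p,
				       const unsigned char *end)
{
	if (end - p < 8)
		return NULL;
	out->bit_count = read_be32(p);
	out->word_count = read_be32(p + 4);
	p += 8;
	if ((uint64_t)(end - p) < (uint64_t)out->word_count * 8 + 4)
		return NULL;
	out->words = p;
	p += (size_t)out->word_count * 8;
	out->rlw_pos = read_be32(p);
	return p + 4;
}
```

Skipping over a bitmap without decompressing it, as lazy loading requires, only needs this outer walk.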
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Tue, Jun 25, 2013 at 09:33:11PM +0200, Vicent Martí wrote: One way we side-stepped the size inflation problem in JGit was to only use the bitmap index information when sending data on the wire to a client. Here delta reuse plays a significant factor in building the pack, and we don't have to be as accurate on matching deltas. During the equivalent of `git repack` bitmaps are not used, allowing the traditional graph enumeration algorithm to generate path hash information. OH BOY HERE WE GO. This is worth its own thread, lots to discuss here. I think peff will have a patchset regarding this to send upstream soon, we'll get back to it later. We do the same thing (only use bitmaps during on-the-wire fetches). But there are a few problems with assuming delta reuse. For us (GitHub), the foremost one is that we pack many forks of a repository together into a single packfile. That means when you clone torvalds/linux, an object you want may be stored in the on-disk pack with a delta against an object that you are not going to get. So we have to throw out that delta and find a new one. I'm dealing with that by adding an option to respect islands during packing, where an island is a set of common objects (we split it by fork, since we expect those objects to be fetched together, but you could use other criteria). The rule is that an object cannot delta against another object that is not in all of its islands. So everybody can delta against shared history, but objects in your fork can only delta against other objects in the fork. You are guaranteed to be able to reuse such deltas during a full clone of a fork, and the on-disk pack size does not suffer all that much (because there is usually a good alternate delta base within your reachable history). So with that series, we can get good reuse for clones. But there are still two cases worth considering: 1. 
When you fetch a subset of the commits, git marks only the edges as preferred bases, and does not walk the full object graph down to the roots. So any object you want that is delta'd against something older will not get reused. If you have reachability bitmaps, I don't think there is any reason that we cannot use the entire object graph (starting at the have tips, of course) as preferred bases. 2. The server is not necessarily fully packed. In an active repo, you may have a large base pack with bitmaps, with several recently pushed packs on top. You still need to delta the recently pushed objects against the base objects. I don't have measurements on how much the deltas suffer in those two cases. I know they suffered quite badly for clones without the name hashes in our alternates repos, but that part should go away with my patch series. -Peff
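Peff's island rule reduces to a subset check: an object may delta against a base only if the base is present in every island the object belongs to. Modeling island membership as a bitmask (one bit per fork) makes the rule a one-liner; this is a toy model of the constraint, not the actual patch series:

```c
#include <stdint.h>

/*
 * Toy model of the "delta islands" rule described above. Each bit in
 * the mask marks membership in one island (here, one fork). A delta
 * is allowed only if the object's island set is a subset of the
 * base's: everyone may delta against shared history (base in all
 * islands), but a fork-private object can only serve as a base
 * within that fork.
 */
static int delta_allowed(uint32_t obj_islands, uint32_t base_islands)
{
	return (obj_islands & base_islands) == obj_islands;
}
```

With this invariant enforced at repack time, a full clone of any single fork can always reuse the on-disk deltas of the objects it receives.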
Re: [PATCH 09/16] documentation: add documentation for the bitmap format
On Mon, Jun 24, 2013 at 5:23 PM, Vicent Marti tan...@gmail.com wrote: This is the technical documentation and design rationale for the new Bitmap v2 on-disk format. --- Documentation/technical/bitmap-format.txt | 235 + 1 file changed, 235 insertions(+) create mode 100644 Documentation/technical/bitmap-format.txt diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt new file mode 100644 index 000..5400082 --- /dev/null +++ b/Documentation/technical/bitmap-format.txt @@ -0,0 +1,235 @@ +GIT bitmap v2 format rationale + + + - A header appears at the beginning, using the same format + as JGit's original bitmap indexes. + + 4-byte signature: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order) + The current implementation only supports version 2 + of the bitmap index. The rationale for this is explained + in this document. + + 2-byte flags (network byte order) + + The folowing flags are supported: + + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the bitmap + index has been generated for a packfile with full closure + (i.e. where every single object in the packfile can find +its parent links inside the same packfile). This is a + requirement for the bitmap index format, also present in JGit, + that greatly reduces the complexity of the implementation. + + - BITMAP_OPT_LE_BITMAPS (0x2) + If present, this implies that that the EWAH bitmaps in this + index has been serialized to disk in little-endian byte order. + Note that this only applies to the actual bitmaps, not to the + Git data structures in the index, which are always in Network + Byte order as it's costumary. + + - BITMAP_OPT_BE_BITMAPS (0x4) + If present, this implies that the EWAH bitmaps have been serialized + using big-endian byte order (NWO). If the flag is missing, **the + default is to assume that the bitmaps are in big-endian**. 
I very much hate seeing a file format that is supposed to be portable that supports both big-endian and little-endian encoding. Such a specification forces everyone to implement two code paths to handle reading data from the file, on the off-chance they are on the wrong platform. Or it forces one platform to be unable to use the file. In which case a repository may want to build two files, one for each platform, but you blocked that off by allowing only one file. What is wrong with picking one encoding and sticking to it? The .idx file and the dircache use big-endian format. Why not just use big-endian here too and convert words on demand as they are accessed from the mmap region? That is what the .idx format does when accessing an offset. + - BITMAP_OPT_HASH_CACHE (0x8) + If present, a hash cache for finding delta bases will be available + right after the header block in this index. See the following + section for details. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - An OPTIONAL delta cache follows the header. Some may find the name delta cache confusing as it does not cache deltas of objects. May I suggest path hash cache as an alternative name? + The cache is formed by `n` 4-byte hashes in a row, where `n` is + the amount of objects in the indexed packfile. Note that this amount + is the **total number of objects** and is not related to the + number of commits that have been selected and indexed in the + bitmap index. + + The hashes are stored in Network Byte Order and they are the same + values generated by a normal revision walk during the `pack-objects` + phase. I find it interesting this is network byte order and not big-endian or little-endian based on the flag in the header. + The `n`nth hash in the cache is the name hash for the `n`th object + in the index for the indexed packfile. 
+ + [RATIONALE]: + + The bitmap