At http://people.ubuntu.com/~robertc/baz2.0/plugins/groupcompress/trunk
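Several of the messages below talk about copy/insert delta instructions (revno 28.1.22's `apply_delta`, the offset-encoding experiment in revno 28.1.8). As background, here is a minimal, hypothetical sketch of such a delta applier; the instruction tuples and function shape are illustrative only, not the groupcompress plugin's actual encoding:

```python
# Hypothetical copy/insert delta applier, in the spirit of the
# apply_delta discussed below. The instruction format here is made up
# for illustration; it is NOT the plugin's real wire format.

def apply_delta(source, delta):
    """Rebuild a text from a source buffer plus delta instructions.

    Each instruction is either ('copy', offset, length), copying bytes
    at an absolute offset from the start of ``source`` (revno 28.1.8
    found absolute offsets compress better after zlib than negative,
    distance-back offsets), or ('insert', data), emitting literal
    bytes. The result is a list of chunks, not lines (cf. revno
    28.1.22).
    """
    chunks = []
    for op in delta:
        if op[0] == 'copy':
            _, offset, length = op
            chunks.append(source[offset:offset + length])
        elif op[0] == 'insert':
            chunks.append(op[1])
        else:
            raise ValueError('unknown delta op: %r' % (op[0],))
    return chunks

# Example: rebuild b'hello there\n' from b'hello world\n'
chunks = apply_delta(
    b'hello world\n',
    [('copy', 0, 6), ('insert', b'there'), ('copy', 11, 1)])
# -> [b'hello ', b'there', b'\n']
```

Joining the returned chunks gives the reconstructed text; keeping them as chunks avoids an extra copy, which is the point of the Fulltext => Chunked work in revno 28.1.5.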
------------------------------------------------------------
revno: 30
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: Robert Collins <[email protected]>
branch nick: trunk
timestamp: Tue 2009-03-03 07:55:44 +1100
message:
  Merge trunk
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.1.33
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Mon 2009-03-02 14:33:13 -0600
message:
  Properly name the file XXX.autopack rather than XXXautopack
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.32
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Mon 2009-03-02 14:08:37 -0600
message:
  Fix bug #336373 by adding local keys to locations after the fact,
  rather than before.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.31
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 17:11:30 +1000
message:
  add comment suggesting a simplification in repofmt.py
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.30
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 16:57:05 +1000
message:
  repofmt.py code cleanups
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.29
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 16:35:43 +1000
message:
  groupcompress.py code cleanups
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.28
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-27 13:18:06 -0600
message:
  Fix typo with the recent lines => chunks rename.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.27
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:18:39 -0600
message:
  Update a Note/Todo
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.26
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:15:20 -0600
message:
  Try even harder, now with even *more* streams. The compressed size
  drops by another 4x. Turn the data for each *layer* into a different
  stream. With this change, gc255 has compressed inventory drop to
  1.5MB which is finally *smaller* than the source 'knit' format.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.25
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:09:31 -0600
message:
  As expected, splitting things up into streams of streams gives even
  better compression. (Down to 4.4MB for inventories). Probably the big
  win is that parent_id_basename content doesn't compress well at all
  versus id_to_entry content, and this way you don't get large offsets.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.24
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:57:33 -0600
message:
  Add a general progress indicator for other parts of copy.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.23
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:54:42 -0600
message:
  Add a progress indicator for chk pages. Fix a bug with handling
  signatures, which don't have a parent graph
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.22
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:34:45 -0600
message:
  Make it clear that the bits you get from 'apply_delta' are chunks,
  not lines.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.21
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:28:10 -0600
message:
  Merge the chk sorting code. Restore labels and sha1s in the stored
  data. Leave the 'extra' formats commented out for now.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.3.6
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 21:04:49 -0600
message:
  Clustering chk pages properly makes a big difference.
  By iterating root nodes in the same order as the referencing
  inventory, and then iterating by search prefix, we get compression
  about 2:1 versus not compressing at all, which is probably 50% better
  than random ordering.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.5
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 16:41:52 -0600
message:
  Try a different method of streaming the chk pages. In this method, we
  work out what chk pages are referenced by what inventory pages. And
  then fetch them based on breadth-first references. This should mean
  that pages that will compress well together are sent together, rather
  than in arbitrary ordering. Note that we might want to do even a
  little better, and use a list for the first time we encounter it,
  rather than sets everywhere. (we still want a set to make sure we
  don't add it multiple times to the list) Then again, 'unordered' may
  reorder it anyway, so it may not matter. We should also consider
  using multiple chk streams, because it will likely result in better
  compression, by forcing breaks in the gc groups.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.4
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 16:09:34 -0600
message:
  Bring in the missing update from 'trunk'
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 15:59:37 -0600
message:
  Play with some experimental alternate hashes, comment them out for now.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.2
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 15:57:57 -0600
message:
  experiment with removing the label and sha1 fields. Seems to shrink
  texts by 10-30%.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.20
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 17:04:22 -0600
message:
  Setting _fetch_order='topological' gives sub-optimal ordering for
  gc=>gc fetches. This is because the 'autopack' code will convert to
  'gc-optimal', which means that 'unordered' will then continue the
  'gc-optimal' route.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.19
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:59:58 -0600
message:
  Groupcompress now supports 'autopack' and 'pack'. It does this by
  just creating a new pack file, wrapping a GCVersionedFiles around it,
  and streaming in the data in 'gc-optimal' ordering. This actually
  seems to work fairly well.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Wed 2009-02-25 16:14:29 -0600
message:
  A first-cut at implementing an auto-pack by copying everything.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.18
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:21:23 -0600
message:
  Implement new handling of get_bytes_as(), and
  get_missing_compression_parent_keys() Now works on bzr.dev's new
  streaming code.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.17
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:11:02 -0600
message:
  Fix the test suite now that we don't match short lines
modified:
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.1.16
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Fri 2009-02-20 09:08:31 -0600
message:
  Adding a 'soft' flag, to make the minimum match 200 bytes
  comp time is 9m46s, comp size is improved across the board 11.3MB.
  So max group 8MB, max inter-file-id 4MB, 'soft' matching with a new
  file_id gives good compression at equivalent speed.
------------------------------------------------------------
revno: 28.1.15
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 21:52:05 -0600
message:
  Change so that regions that have lots of copies get converted back
  into an insertion. This does get triggered, but it doesn't help. The
  total compression is 17MB, and the conversion time is 10min. Which is
  equivalent to the original values. Even further, don't match blocks
  that are shorter than XX bytes (currently 10). With a value of 5, we
  still get trivial blocks inserted. With a value of 10, everything
  changes to copies. Dropping the max block size to 8MB decreases the
  total bytes to 14MB (presumably because the copy records now have 1
  fewer byte per record). It also makes it 9m versus 10m.
  Preferentially splitting based on file-id (at >= 4MB) stays at 9min,
  but drops it to 13MB
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.14
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 15:08:03 -0600
message:
  Factor out _get_group_and_delta_lines. The previous change (to ignore
  empty texts, and start new compressors) dropped the conversion time
  to 11m43s at a modest expansion to 13.4MB. The time difference is
  surprising, we should check if it is the no-newlines or the
  new-compressors. (my guess is the latter).
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.13
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:55:17 -0600
message:
  Play around a bit. 1) Empty texts are no-op inserted, to avoid ever
  trying to match against their text. 2) If we find a new file-id and
  the compressor is more than half full, we go ahead and start a new
  compressor.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.12
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:48:34 -0600
message:
  Change the code a little bit. If a given text has not been seen
  before, insert all lines for that text. At present, we are doing
  *worse* than knit compression, because we have so many matching
  groups from various locations. Which causes us to just have huge
  swaths of copies.
  By inserting the full lines, we get more regions that we are able to
  generate a larger match against. This slows down the processing
  (10m => 24m), but improves compression (16MB => 12MB).
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.11
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:45:00 -0600
message:
  start experimenting with gc-optimal ordering.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.10
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-19 12:24:42 -0600
message:
  Change the extraction ordering for 'unordered'. Instead of using a
  random ordering, use the ordering defined by the index memos. This
  should give us the best group-locality. This gives a rather large
  performance improvement. Like 30s versus 10min.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.9
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 16:14:55 -0600
message:
  Revert previous change.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.8
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 16:14:22 -0600
message:
  Allow writing negative offsets. Turns out not to actually compress
  better.
  After zlib compression, negative offsets are a loss. Presumably
  because there is redundancy that zlib can factor out from
  bytes-since-start.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.7
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 14:40:46 -0600
message:
  (ugly hack) autopacking doesn't work, so don't do it. Force the fetch
  order and delta logic to use fulltexts in topological order. It isn't
  great, but it means things work.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.6
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 14:39:05 -0600
message:
  Merge in the dev5 formats.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Tue 2009-02-17 13:35:38 -0600
message:
  Start putting together a GroupCompress format that is built on dev5
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.2
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Fri 2009-02-13 16:06:03 -0600
message:
  Bring in the trunk simplifications.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Fri 2009-02-13 15:57:21 -0600
message:
  Start basing the groupcompress chk formats on the dev5 formats.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.5
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Tue 2009-02-17 16:17:24 -0600
message:
  Finish the Fulltext => Chunked conversions so that we work in the
  more-efficient Chunks.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.4
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 16:04:13 -0600
message:
  Simplify the internals. We've already checked 'chk_support' so we
  don't need to check again.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:55:48 -0600
message:
  Properly add GCPlainCHK to the pack_incompatible list.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.2
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:52:00 -0600
message:
  Teach groupcompress about 'chunked' encoding
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:32:46 -0600
message:
  Import repo_registry earlier.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6

Diff too large for email (1226 lines, the limit is 1000).

--
bazaar-commits mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/bazaar-commits
