At http://people.ubuntu.com/~robertc/baz2.0/plugins/groupcompress/trunk
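Several of the messages below talk about copy/insert delta instructions (revno 28.1.22's `apply_delta`, the offset-encoding experiment in revno 28.1.8). As background, here is a minimal, hypothetical sketch of such a delta applier; the instruction tuples and function shape are illustrative only, not the groupcompress plugin's actual encoding:

```python
# Hypothetical copy/insert delta applier, in the spirit of the
# apply_delta discussed below. The instruction format here is made up
# for illustration; it is NOT the plugin's real wire format.

def apply_delta(source, delta):
    """Rebuild a text from a source buffer plus delta instructions.

    Each instruction is either ('copy', offset, length), copying bytes
    at an absolute offset from the start of ``source`` (revno 28.1.8
    found absolute offsets compress better after zlib than negative,
    distance-back offsets), or ('insert', data), emitting literal
    bytes. The result is a list of chunks, not lines (cf. revno
    28.1.22).
    """
    chunks = []
    for op in delta:
        if op[0] == 'copy':
            _, offset, length = op
            chunks.append(source[offset:offset + length])
        elif op[0] == 'insert':
            chunks.append(op[1])
        else:
            raise ValueError('unknown delta op: %r' % (op[0],))
    return chunks

# Example: rebuild b'hello there\n' from b'hello world\n'
chunks = apply_delta(
    b'hello world\n',
    [('copy', 0, 6), ('insert', b'there'), ('copy', 11, 1)])
# -> [b'hello ', b'there', b'\n']
```

Joining the returned chunks gives the reconstructed text; keeping them as chunks avoids an extra copy, which is the point of the Fulltext => Chunked work in revno 28.1.5.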
------------------------------------------------------------
revno: 30
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: Robert Collins <[email protected]>
branch nick: trunk
timestamp: Tue 2009-03-03 07:55:44 +1100
message:
  Merge trunk
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.1.33
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Mon 2009-03-02 14:33:13 -0600
message:
  Properly name the file XXX.autopack rather than XXXautopack
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.32
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Mon 2009-03-02 14:08:37 -0600
message:
  Fix bug #336373 by adding local keys to locations after the fact,
  rather than before.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.31
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 17:11:30 +1000
message:
  add comment suggesting a simplification in repofmt.py
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.30
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 16:57:05 +1000
message:
  repofmt.py code cleanups
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.29
revision-id: [email protected]
parent: [email protected]
committer: Ian Clatworthy <[email protected]>
branch nick: groupcompress
timestamp: Mon 2009-03-02 16:35:43 +1000
message:
  groupcompress.py code cleanups
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.28
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-27 13:18:06 -0600
message:
  Fix typo with the recent lines => chunks rename.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.27
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:18:39 -0600
message:
  Update a Note/Todo
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.26
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:15:20 -0600
message:
  Try even harder, now with even *more* streams. The compressed size
  drops by another 4x. Turn the data for each *layer* into a different
  stream. With this change, gc255 has compressed inventory drop to
  1.5MB which is finally *smaller* than the source 'knit' format.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.25
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 23:09:31 -0600
message:
  As expected, splitting things up into streams of streams gives even
  better compression. (Down to 4.4MB for inventories). Probably the big
  win is that parent_id_basename content doesn't compress well at all
  versus id_to_entry content, and this way you don't get large offsets.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.24
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:57:33 -0600
message:
  Add a general progress indicator for other parts of copy.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.23
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:54:42 -0600
message:
  Add a progress indicator for chk pages. Fix a bug with handling
  signatures, which don't have a parent graph
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.22
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:34:45 -0600
message:
  Make it clear that the bits you get from 'apply_delta' are chunks,
  not lines.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.21
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-26 21:28:10 -0600
message:
  Merge the chk sorting code. Restore labels and sha1s in the stored
  data. Leave the 'extra' formats commented out for now.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.3.6
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 21:04:49 -0600
message:
  Clustering chk pages properly makes a big difference.
  By iterating root nodes in the same order as the referencing
  inventory, and then iterating by search prefix, we get compression
  about 2:1 versus not compressing at all, which is probably 50% better
  than random ordering.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.5
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 16:41:52 -0600
message:
  Try a different method of streaming the chk pages. In this method, we
  work out what chk pages are referenced by what inventory pages. And
  then fetch them based on breadth-first references. This should mean
  that pages that will compress well together are sent together, rather
  than in arbitrary ordering. Note that we might want to do even a
  little better, and use a list for the first time we encounter it,
  rather than sets everywhere. (we still want a set to make sure we
  don't add it multiple times to the list) Then again, 'unordered' may
  reorder it anyway, so it may not matter. We should also consider
  using multiple chk streams, because it will likely result in better
  compression, by forcing breaks in the gc groups.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.4
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 16:09:34 -0600
message:
  Bring in the missing update from 'trunk'
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 15:59:37 -0600
message:
  Play with some experimental alternate hashes, comment them out for now.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.2
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-26 15:57:57 -0600
message:
  experiment with removing the label and sha1 fields. Seems to shrink
  texts by 10-30%.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.20
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 17:04:22 -0600
message:
  Setting _fetch_order='topological' gives sub-optimal ordering for
  gc=>gc fetches. This is because the 'autopack' code will convert to
  'gc-optimal', which means that 'unordered' will then continue the
  'gc-optimal' route.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.19
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:59:58 -0600
message:
  Groupcompress now supports 'autopack' and 'pack'. It does this by
  just creating a new pack file, wrapping a GCVersionedFiles around it,
  and streaming in the data in 'gc-optimal' ordering. This actually
  seems to work fairly well.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.3.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Wed 2009-02-25 16:14:29 -0600
message:
  A first-cut at implementing an auto-pack by copying everything.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.18
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:21:23 -0600
message:
  Implement new handling of get_bytes_as(), and
  get_missing_compression_parent_keys() Now works on bzr.dev's new
  streaming code.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.17
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-25 16:11:02 -0600
message:
  Fix the test suite now that we don't match short lines
modified:
  tests/test_groupcompress.py  test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.1.16
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Fri 2009-02-20 09:08:31 -0600
message:
  Adding a 'soft' flag, to make the minimum match 200 bytes
  comp time is 9m46s, comp size is improved across the board 11.3MB.
  So max group 8MB, max inter-file-id 4MB, 'soft' matching with a new
  file_id gives good compression at equivalent speed.
------------------------------------------------------------
revno: 28.1.15
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 21:52:05 -0600
message:
  Change so that regions that have lots of copies get converted back
  into an insertion. This does get triggered, but it doesn't help. The
  total compression is 17MB, and the conversion time is 10min. Which is
  equivalent to the original values. Even further, don't match blocks
  that are shorter than XX bytes (currently 10). With a value of 5, we
  still get trivial blocks inserted. With a value of 10, everything
  changes to copies. Dropping the max block size to 8MB decreases the
  total bytes to 14MB (presumably because the copy records now have 1
  fewer byte per record). It also makes it 9m versus 10m.
  Preferentially splitting based on file-id (at >= 4MB) stays at 9min,
  but drops it to 13MB
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.14
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 15:08:03 -0600
message:
  Factor out _get_group_and_delta_lines. The previous change (to ignore
  empty texts, and start new compressors) dropped the conversion time
  to 11m43s at a modest expansion to 13.4MB. The time difference is
  surprising, we should check if it is the no-newlines or the
  new-compressors. (my guess is the latter).
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.13
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:55:17 -0600
message:
  Play around a bit. 1) Empty texts are no-op inserted, to avoid ever
  trying to match against their text. 2) If we find a new file-id and
  the compressor is more than half full, we go ahead and start a new
  compressor.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.12
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:48:34 -0600
message:
  Change the code a little bit. If a given text has not been seen
  before, insert all lines for that text. At present, we are doing
  *worse* than knit compression, because we have so many matching
  groups from various locations. Which causes us to just have huge
  swaths of copies.
  By inserting the full lines, we get more regions that we are able to
  generate a larger match against. This slows down the processing
  (10m => 24m), but improves compression (16MB => 12MB).
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.11
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: experimental
timestamp: Thu 2009-02-19 14:45:00 -0600
message:
  start experimenting with gc-optimal ordering.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.10
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Thu 2009-02-19 12:24:42 -0600
message:
  Change the extraction ordering for 'unordered'. Instead of using a
  random ordering, use the ordering defined by the index memos. This
  should give us the best group-locality. This gives a rather large
  performance improvement. Like 30s versus 10min.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.9
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 16:14:55 -0600
message:
  Revert previous change.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.8
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 16:14:22 -0600
message:
  Allow writing negative offsets. Turns out not to actually compress
  better.
  After zlib compression, negative offsets are a loss. Presumably
  because there is redundancy that zlib can factor out from
  bytes-since-start.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.7
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 14:40:46 -0600
message:
  (ugly hack) autopacking doesn't work, so don't do it. Force the fetch
  order and delta logic to use fulltexts in topological order. It isn't
  great, but it means things work.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.6
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Wed 2009-02-18 14:39:05 -0600
message:
  Merge in the dev5 formats.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Tue 2009-02-17 13:35:38 -0600
message:
  Start putting together a GroupCompress format that is built on dev5
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.2
revision-id: [email protected]
parent: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Fri 2009-02-13 16:06:03 -0600
message:
  Bring in the trunk simplifications.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.2.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: dev5
timestamp: Fri 2009-02-13 15:57:21 -0600
message:
  Start basing the groupcompress chk formats on the dev5 formats.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.5
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Tue 2009-02-17 16:17:24 -0600
message:
  Finish the Fulltext => Chunked conversions so that we work in the
  more-efficient Chunks.
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.4
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 16:04:13 -0600
message:
  Simplify the internals. We've already checked 'chk_support' so we
  don't need to check again.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.3
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:55:48 -0600
message:
  Properly add GCPlainCHK to the pack_incompatible list.
modified:
  repofmt.py  repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.1.2
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:52:00 -0600
message:
  Teach groupcompress about 'chunked' encoding
modified:
  groupcompress.py  groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.1.1
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: trunk
timestamp: Fri 2009-02-13 15:32:46 -0600
message:
  Import repo_registry earlier.
modified:
  __init__.py  __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6

Diff too large for email (1226 lines, the limit is 1000).

--
bazaar-commits mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/bazaar-commits
