At http://bzr.arbash-meinel.com/branches/bzr/brisbane/hack

------------------------------------------------------------
revno: 3832
revision-id: [email protected]
parent: [email protected]
committer: John Arbash Meinel <[email protected]>
branch nick: hack
timestamp: Thu 2008-12-25 09:15:32 -0600
message:
  Change the _check_remap code to request unread pages in batches.
  
  The theory is that if you can probably find out that you won't fit even 
without
  reading all of the child nodes.
  Also, increase the page cache size slightly, so that the size should be able 
to
  hold a single working set.
  
  Initial results seem promising, with less time spent in _check_remap's
  get_record_stream calls.
  
  However, it may have just pushed the index lookups into the 'has_key()' code
  before we add the new lines. Part of the problem is that 'sha' keys are going
  to be randomly distributed, so we will have to page in a lot of the keys.
  
  Further, the chk index is costing a large amount of disk space. At present,
  they are 26MB, versus 9MB for the .tix.
  It makes the total index size 10% of the total repository size, which is 
something
  to be concerned about.
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py 2008-12-24 20:23:12 +0000
+++ b/bzrlib/chk_map.py 2008-12-25 15:15:32 +0000
@@ -51,7 +51,9 @@
     )
 
 # approx 2MB
-_PAGE_CACHE_SIZE = 2*1024*1024
+# If each line is 50 bytes, and you have 255 internal pages, with 255-way fan
+# out, it takes 3.1MB to cache the layer.
+_PAGE_CACHE_SIZE = 4*1024*1024
 # We are caching bytes so len(value) is perfectly accurate
 _page_cache = lru_cache.LRUSizeCache(_PAGE_CACHE_SIZE)
 
@@ -1224,28 +1226,32 @@
             #       The other argument is that unmap() is uncommon, so we don't
             #       need to optimize it. But map() with a slightly shorter
             #       value may happen a lot.
-            stream = store.get_record_stream(keys, 'unordered', True)
-            nodes = []
-            # Fully consume the stream, even if we could determine that we
-            # don't need to continue. We requested the bytes, we may as well
-            # use them
-            for record in stream:
-                bytes = record.get_bytes_as('fulltext')
-                node = _deserialise(bytes, record.key)
-                self._items[keys[record.key]] = node
-                _page_cache[record.key] = bytes
-                nodes.append(node)
-            for node in nodes:
-                if isinstance(node, InternalNode):
-                    # We know we won't fit
-                    def stream_is_internal(): pass
-                    stream_is_internal()
-                    return self
-                for key, value in node._items.iteritems():
-                    if new_leaf._map_no_split(key, value):
-                        def stream_causes_split(): pass
-                        stream_causes_split()
+            batch_size = 10
+            key_order = list(keys)
+            for batch_start in range(0, len(key_order), batch_size):
+                batch = key_order[batch_start:batch_start + batch_size]
+                stream = store.get_record_stream(batch, 'unordered', True)
+                nodes = []
+                # Fully consume the stream, even if we could determine that we
+                # don't need to continue. We requested the bytes, we may as 
well
+                # use them
+                for record in stream:
+                    bytes = record.get_bytes_as('fulltext')
+                    node = _deserialise(bytes, record.key)
+                    self._items[keys[record.key]] = node
+                    _page_cache[record.key] = bytes
+                    nodes.append(node)
+                for node in nodes:
+                    if isinstance(node, InternalNode):
+                        # We know we won't fit
+                        def stream_is_internal(): pass
+                        stream_is_internal()
                         return self
+                    for key, value in node._items.iteritems():
+                        if new_leaf._map_no_split(key, value):
+                            def stream_causes_split(): pass
+                            stream_causes_split()
+                            return self
 
         # We have gone to every child, and everything fits in a single leaf
         # node, we no longer need this internal node

-- 
bazaar-commits mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/bazaar-commits

Reply via email to