Hi,

I made a patch to zc.buildout.download that introduces a very simple "network cache". I tried to keep it as simple and generic as buildout itself. With the patch, downloading a file follows this order:

1. Try Local Cache (no change from original)
2. Try network cache (explained below)
3. Try original URL
4. Post file data to network cache (pure HTTP)
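
The order above can be sketched roughly as follows. This is only an illustration in Python 3, not the code from the patch: fetch, local_cache, network_get, origin_get and network_put are hypothetical names, and filling the local cache inside the same function is a simplification.

```python
import hashlib


def fetch(md5sum, local_cache, network_get, origin_get, network_put):
    """Return (data, source) following the four steps listed above.

    All parameters are hypothetical stand-ins, not buildout API:
    local_cache is a dict, the others are plain callables.
    """
    # 1. Local cache.
    if md5sum in local_cache:
        return local_cache[md5sum], 'local'

    # 2. Network cache; a missing or corrupt entry is simply ignored.
    data = network_get(md5sum)
    if data is not None and hashlib.md5(data).hexdigest() == md5sum:
        local_cache[md5sum] = data
        return data, 'network'

    # 3. Original URL; here a checksum mismatch is a hard error.
    data = origin_get()
    if hashlib.md5(data).hexdigest() != md5sum:
        raise ValueError('MD5 checksum mismatch')

    # 4. Share the verified file with the network cache.
    network_put(md5sum, data)
    local_cache[md5sum] = data
    return data, 'origin'
```

With a plain dict standing in for the shared cache, network_get can be the dict's get method and network_put its __setitem__ method.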

The network cache is just a base URL under which files are stored (any simple HTTP server will do), each identified by its MD5 checksum, like this:

 GET http://my.company.shared.cache/md5_provided_for_the_file

The cache is updated by a simple POST to the same address:

 POST http://my.company.shared.cache/md5_provided_for_the_file < data
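
To make the GET/POST pair concrete, here is a minimal sketch of a server that could sit behind such a URL. It is written for Python 3 with only the standard library, keeps everything in memory, and assumes the POST body is the raw file content. STORE, CacheHandler and start_cache_server are hypothetical names; this is not the ERP5-based implementation.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # md5 hex digest -> file bytes (in memory, illustration only)


class CacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The path after the leading slash is the MD5 key.
        data = STORE.get(self.path.lstrip('/'))
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # Assumption: the request body is the raw file content.
        length = int(self.headers.get('Content-Length', 0))
        STORE[self.path.lstrip('/')] = self.rfile.read(length)
        self.send_response(201)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_cache_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(('127.0.0.1', port), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The server trusts whatever key the client posts to, so the client still has to verify the MD5 after every download from the cache.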

As I'm familiar with ERP5, I implemented a very simple way to handle this cache, but if you think my patch is useful and OK, I can also contribute a simpler solution, like eggproxy does for eggs.

As I don't have access to svn to make a branch, I'm attaching the patch.

With a network cache, I think people can share downloads within private networks and keep their builds from breaking when some source is unavailable.

If you consider this behaviour inappropriate for the core of buildout but appropriate for a buildout extension, let me know.

Regards,
Rafael Monnerat

On 11-01-2011 06:03, Thomas Lotze wrote:
rafael wrote:

We are planning to extend buildout download API to provide a distributed
automatic packaging and caching system so that even if original source
web site is down, the buildout process can still run. This could also be
useful to build software in secured networks.
I'd strongly suggest keeping this logic out of the download API. It sounds
like something that may potentially grow a lot more complex than a simple
"download this URL, with or without using a cache" gesture.

In my opinion, a distributed packaging system is application logic from
the perspective of a generic framework such as zc.buildout. It might be
implemented by a recipe, some library on top of the download API or some
other mechanism altogether, but it should neither complicate the semantics
of the existing download API nor add a new one to the zc.buildout code
base.


Index: src/zc/buildout/download.py
===================================================================
--- src/zc/buildout/download.py	(revision 119589)
+++ src/zc/buildout/download.py	(working copy)
@@ -61,6 +61,7 @@
         self.cache = cache
         if cache == -1:
             self.cache = options.get('download-cache')
+        self.network_cache = options.get('network-cache', None)
         self.namespace = namespace
         self.offline = offline
         if offline == -1:
@@ -139,7 +140,43 @@
             _, is_temp = self.download(url, md5sum, cached_path)
 
         return cached_path, is_temp
+   
+    def download_network_cached(self, path, md5sum):
+        """Download from a network cache provider
 
+        If something fails (provider offline, or md5sum mismatch), we
+        ignore network cached files.
+        
+        return True if download succeeded.
+        """
+        url = '%s/%s' % (self.network_cache.rstrip('/'), md5sum)
+        self.logger.info('Downloading from network cache %s' % url)
+        try:
+            path, headers = urllib.urlretrieve(url, path)
+            if not check_md5sum(path, md5sum):
+                self.logger.info('MD5 checksum mismatch downloading %r' % url)
+                return False
+        except IOError, e:
+            self.logger.info('Failed to download from network cache %s' % url)
+            return False
+
+        return True
+
+    def upload_network_cached(self, path, md5sum):
+        """Upload file to a network cache server"""
+        f = open(path, 'rb')
+        try:
+            data = f.read()
+            url = '%s/%s' % (self.network_cache.rstrip('/'), md5sum)
+            try:
+                result = urllib.urlopen(url, urllib.urlencode({
+                    "data" : data}))
+            except (IOError,EOFError), e:
+                self.logger.info('Failed to upload to network cache %s' % url)
+        finally:
+            f.close()
+        return True
+
     def download(self, url, md5sum=None, path=None):
         """Download a file from a URL to a given or temporary path.
 
@@ -173,10 +210,16 @@
         handle, tmp_path = tempfile.mkstemp(prefix='buildout-')
         try:
             try:
-                tmp_path, headers = urllib.urlretrieve(url, tmp_path)
-                if not check_md5sum(tmp_path, md5sum):
-                    raise ChecksumError(
-                        'MD5 checksum mismatch downloading %r' % url)
+                is_downloaded = False
+                if md5sum is not None and self.network_cache is not None:
+                    is_downloaded = self.download_network_cached(tmp_path, md5sum)
+
+                if not is_downloaded:
+                    tmp_path, headers = urllib.urlretrieve(url, tmp_path)
+                    if not check_md5sum(tmp_path, md5sum):
+                        raise ChecksumError(
+                            'MD5 checksum mismatch downloading %r' % url)
+                    self.upload_network_cached(tmp_path, md5sum)
             finally:
                 os.close(handle)
         except:
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
