Hi Jack, I once created a similar thing, but it requires the "owner" of the file to host the MD5 they think it should be. It then generates a Metalink based on all the MD5/SHA-1/SHA-256 hashes in the database.
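For anyone curious what consuming an owner-hosted MD5SUMS file involves, here is a minimal sketch. It assumes the common coreutils `md5sum` output format ("&lt;hex digest&gt;  &lt;filename&gt;", with an optional `*` for binary mode); the function name and example URL are illustrative, not from dynmirror's actual code.

```python
# Sketch: parse an MD5SUMS file into (md5, absolute URL) pairs,
# the raw material for hash/link records in a database.
# Assumes coreutils-style lines: "<32 hex chars>  <filename>".
from urllib.parse import urljoin

def parse_md5sums(text, base_url):
    """Yield (md5_hex, absolute_url) pairs from an MD5SUMS file."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition("  ")
        if len(digest) == 32 and name:
            # md5sum prefixes binary-mode filenames with "*"
            yield digest.lower(), urljoin(base_url, name.lstrip("*"))

sums = "d41d8cd98f00b204e9800998ecf8427e  empty.tar.gz\n"
print(list(parse_md5sums(sums, "http://example.org/downloads/")))
```

A spider would fetch each mirror's MD5SUMS URL periodically and merge the resulting pairs into the hash database.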
The idea is that anybody can step up and start a mirror by hosting the files and an MD5SUMS file, and have the service spider the MD5SUMS file. You can find the service at: http://www.dynmirror.net/

It might be a good idea to join up the databases or collaborate somewhere. Let's see what we can do. For instance, I could add a mintiply URL collection or something like that? Or maybe I could have dynmirror register the hash/link combinations at mintiply? Let me know what you think.

Currently, I think I'm the only user of dynmirror.net (at http://www.logfish.net/pr/ccbuild/downloads/ ). I'd also be happy to dig up and publish the code somewhere if I haven't already.

Greets,
Bram

On Tue, Aug 14, 2012 at 8:30 AM, Jack Bates <[email protected]> wrote:
> Hi, what do you think about a Google App Engine app that generates Metalinks
> for URLs? Maybe something like this already exists?
>
> The first time you visit, e.g.
> http://mintiply.appspot.com/http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2
> it downloads the content and computes a digest. App Engine has *lots* of
> bandwidth, so this is snappy. Then it sends a response with "Digest:
> SHA-256=..." and "Location: ..." headers, similar to MirrorBrain.
>
> It also records the digest in Google's Datastore, so on subsequent visits
> it doesn't download the content or recompute the digest.
>
> Finally, it also checks the Datastore for other URLs with a matching digest,
> and sends a "Link: <...>; rel=duplicate" header for each of these. So if you
> visit, e.g.
> http://mintiply.appspot.com/http://mirror.nexcess.net/apache/trafficserver/trafficserver-3.2.0.tar.bz2
> it sends "Link:
> <http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2>;
> rel=duplicate"
>
> The idea is that this could be useful for sites that don't yet generate
> Metalinks, like SourceForge. You could always prefix a URL that you pass to
> a Metalink client with "http://mintiply.appspot.com/" to get a Metalink.
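[Editor's note: the "Digest: SHA-256=..." header described above is an RFC 3230 instance digest, i.e. the base64 encoding of the raw hash bytes, which is the form MirrorBrain sends. A minimal sketch of building that header value, with a helper name of my own choosing:]

```python
# Sketch: build an RFC 3230 "Digest" header value from content bytes.
# The value is base64 of the *raw* SHA-256 digest, not the hex form.
import base64
import hashlib

def digest_header(content: bytes) -> str:
    raw = hashlib.sha256(content).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")

print(digest_header(b"hello"))
# → SHA-256=LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=
```

In a response this would be emitted alongside "Location:" and any "Link: &lt;...&gt;; rel=duplicate" headers.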
> Alternatively, if a Metalink client noticed that it was downloading a large
> file without mirror or hash metadata, it could try to get more mirrors from
> this app while it continued downloading the file. As long as someone else
> had previously tried the same URL, or App Engine can download the file
> faster than the client, it should get more mirrors in time to help
> finish the download. Popular downloads should have the most complete lists
> of mirrors, since those URLs should have been tried the most.
>
> Right now it only downloads a URL once and remembers the digest forever,
> which assumes that the content at the URL never changes. This is true for
> many downloads, but in the future it could respect cache control headers.
>
> Also, right now it only generates HTTP Metalinks with a whole-file digest,
> but in the future it could conceivably generate XML Metalinks with partial
> digests.
>
> A major limitation of this proof of concept is that I ran into some App
> Engine errors with downloads of any significant size, like Ubuntu ISOs. The
> App Engine maximum response size is 32 MB. The app overcomes this with byte
> ranges, downloading files in 32 MB segments. This works on my local
> machine with the App Engine dev server, but in production Google apparently
> kills the process after downloading just a few segments, because it uses too
> much memory. This seems wrong, since the app throws away each segment after
> adding it to the digest. So if it has enough memory to download one segment,
> it shouldn't require any more memory for additional segments. Maybe this
> could be worked around by manually calling the Python garbage collector, or
> by shrinking the segment size...
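[Editor's note: the segmented approach described above — fetch the file in fixed-size byte ranges, feed each segment into a running hash, and discard it so memory stays bounded at one segment — can be sketched outside App Engine like this. `urllib` stands in for App Engine's urlfetch, the function names are illustrative, and the sketch assumes the server honors Range requests with 206 responses:]

```python
# Sketch: compute SHA-256 of a remote file in fixed-size Range segments,
# keeping at most one segment in memory at a time.
import hashlib
import urllib.request
from urllib.error import HTTPError

SEGMENT = 32 * 1024 * 1024  # App Engine's 32 MB response limit

def http_range(url, start, end):
    """Fetch bytes [start, end] of url; empty bytes once past EOF.
    Assumes the server honors Range requests."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (start, end)})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except HTTPError as e:
        if e.code == 416:  # requested range starts past end of file
            return b""
        raise

def ranged_sha256(url, fetch=http_range, segment=SEGMENT):
    digest = hashlib.sha256()
    offset = 0
    while True:
        chunk = fetch(url, offset, offset + segment - 1)
        if not chunk:
            break
        digest.update(chunk)  # the segment itself is discarded after this
        offset += len(chunk)
        if len(chunk) < segment:
            break  # short read: end of file
    return digest.hexdigest()
```

Because `chunk` is rebound on every iteration and only the 32-byte hash state persists, peak memory should stay at roughly one segment regardless of file size — which is why the production kills Jack describes look like an App Engine quirk rather than a flaw in the approach.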
>
> Also I ran into a second bug with App Engine URL Fetch and downloads of any
> significant size:
> http://code.google.com/p/googleappengine/issues/detail?id=7732#c6
>
> Another thought is whether any web crawlers already maintain a database of
> digests that an app like this could exploit?
>
> Here is the code:
> https://github.com/jablko/mintiply/blob/master/mintiply.py
>
> What are your thoughts? Maybe something like this already exists, or was
> already tried in the past...
>
> --
> You received this message because you are subscribed to the Google Groups
> "Metalink Discussion" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/metalink-discussion/-/r7cq8sL0LuMJ.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/metalink-discussion?hl=en.
