Hi everybody,

I took the time to look up my code and found out I never published dynmirror.net.
The code is now online at https://github.com/bneijt/dynmirror.net

I'll still have to publish correct licensing information etc., and find a good way to clean up having jinja2 in the git repo as well, but as I have a few other projects going on I don't think I'll get to that any time soon.

If you have any questions regarding the code, feel free to mail me directly.

Greets,

Bram

On Tue, Aug 21, 2012 at 7:45 AM, Jack Bates <[email protected]> wrote:
> On Sunday, August 19, 2012 2:15:46 PM UTC-7, Bram Neijt wrote:
>>
>> A single page export will not work, for sure, but with that in mind I was
>> thinking about moving data out of dynmirror to mintiply.
>>
>> For example, if you don't want to download the complete file before
>> you have a metalink, you could check at
>> http://www.dynmirror.net/metalink/?url=http://example.com
>> to see if dynmirror has any metalink information. You could use
>> dynmirror as a kind of caching backend for downloads.
>>
>> Another thing I could do is have dynmirror redirect to mintiply if
>> there is no hash information available; maybe that would be a good
>> approach...
>>
>> I'm not really sure it would add anything, but technically it should
>> be possible and I think it might be good to get some code commits on
>> dynmirror anyway ;)
>
> That sounds like a good idea. Please let me know if there's anything I can
> do to help with this.
>
> Cheers
>
>> Greets,
>>
>> Bram
>>
>> On Sun, Aug 19, 2012 at 9:58 AM, Jack Bates <[email protected]> wrote:
>> > On Thursday, August 16, 2012 10:44:19 PM UTC-7, Jack Bates wrote:
>> >>
>> >> On Tuesday, August 14, 2012 1:58:22 PM UTC-7, Bram Neijt wrote:
>> >>>
>> >>> Hi Jack,
>> >>>
>> >>> I once created a similar thing, but it required the "owner" of the
>> >>> file to host the MD5 he/she thinks it should be. It then generates a
>> >>> metalink based on all the md5/sha1/sha256 hashes in the database.
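The pre-download check suggested in the quoted message above could look like the following minimal sketch. The `/metalink/?url=...` endpoint is the one quoted in the thread; the helper names, and the assumption that a miss surfaces as an HTTP error, are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# The /metalink/?url=... endpoint comes from the message above;
# everything else here is a hypothetical sketch.
DYNMIRROR_ENDPOINT = "http://www.dynmirror.net/metalink/"

def metalink_lookup_url(download_url):
    """Build the dynmirror query URL for a given download URL."""
    return DYNMIRROR_ENDPOINT + "?" + urllib.parse.urlencode({"url": download_url})

def fetch_metalink(download_url):
    """Ask dynmirror whether it already has metalink information before
    committing to a full download. Assumes (hypothetically) that
    "no metalink known" surfaces as an HTTP error status."""
    try:
        with urllib.request.urlopen(metalink_lookup_url(download_url)) as response:
            return response.read()  # the metalink document, if any
    except urllib.error.HTTPError:
        return None  # fall back to a plain download
```

A client would call `fetch_metalink` first and only start a plain download when it returns `None`, which is the "caching backend for downloads" use suggested above.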
>> >>>
>> >>> The idea is that anybody can step up and start a mirror by hosting the
>> >>> files and the MD5SUMS and have the service spider the MD5SUMS file.
>> >>>
>> >>> You can find the service at: http://www.dynmirror.net/
>> >>
>> >> Cool! The design of this site is impressive. I like how it shows
>> >> analytics, like recent downloads, on the front page.
>> >>
>> >>> It might be a good idea to join up the databases or do some
>> >>> collaboration somewhere. Let's see what we can do. For instance, I
>> >>> could add a mintiply URL collection or something like that? Or maybe I
>> >>> could have dynmirror register the hash/link combinations at mintiply?
>> >>
>> >> Great idea, thanks for suggesting it. The first thing that comes to
>> >> mind is: how would you like to get data out of Mintiply (and into
>> >> Dynmirror)? Is there an API that Mintiply could provide that would
>> >> make this as easy as possible?
>> >
>> > Hi Bram, and thanks again for inviting me to collaborate,
>> >
>> > As an experiment, I just added a page to export all of the data from
>> > Mintiply, in Metalink format. Let me know what you think. Could this be
>> > useful to a project like Dynmirror? Or would you prefer a different
>> > format, or different data?
>> >
>> > There isn't much data in the app yet, so dumping everything in one
>> > Metalink response works fine. If the amount of data ever gets large,
>> > we may need to rethink this.
>> >
>> > Here is the page: http://mintiply.appspot.com/export
>> >
>> >>> Let me know what you think. Currently, I think I'm the only user of
>> >>> dynmirror.net (at http://www.logfish.net/pr/ccbuild/downloads/ ).
>> >>>
>> >>> I'd also be happy to dig up and publish the code somewhere if I haven't
>> >>> already.
>> >>>
>> >>> Greets,
>> >>>
>> >>> Bram
>> >>
>> >> Thanks very much for inviting me to collaborate.
>> >>
>> >>> On Tue, Aug 14, 2012 at 8:30 AM, Jack Bates <[email protected]> wrote:
>> >>> > Hi, what do you think about a Google App Engine app that generates
>> >>> > Metalinks for URLs? Maybe something like this already exists?
>> >>> >
>> >>> > The first time you visit, e.g.
>> >>> > http://mintiply.appspot.com/http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2
>> >>> > it downloads the content and computes a digest. App Engine has *lots*
>> >>> > of bandwidth, so this is snappy. Then it sends a response with
>> >>> > "Digest: SHA-256=..." and "Location: ..." headers, similar to
>> >>> > MirrorBrain.
>> >>> >
>> >>> > It also records the digest with Google's Datastore, so on subsequent
>> >>> > visits, it doesn't download or recompute the digest.
>> >>> >
>> >>> > Finally, it also checks the Datastore for other URLs with a matching
>> >>> > digest, and sends "Link: <...>; rel=duplicate" headers for each of
>> >>> > these. So if you visit, e.g.
>> >>> > http://mintiply.appspot.com/http://mirror.nexcess.net/apache/trafficserver/trafficserver-3.2.0.tar.bz2
>> >>> > it sends "Link:
>> >>> > <http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2>;
>> >>> > rel=duplicate"
>> >>> >
>> >>> > The idea is that this could be useful for sites that don't yet
>> >>> > generate Metalinks, like SourceForge. You could always prefix a URL
>> >>> > that you pass to a Metalink client with "http://mintiply.appspot.com/"
>> >>> > to get a Metalink. Alternatively, if a Metalink client noticed that
>> >>> > it was downloading a large file without mirror or hash metadata, it
>> >>> > could try to get more mirrors from this app, while it continued
>> >>> > downloading the file.
>> >>> > As long as someone else had previously tried the same URL, or App
>> >>> > Engine can download the file faster than the client, then it should
>> >>> > get more mirrors in time to help finish the download. Popular
>> >>> > downloads should have the most complete list of mirrors, since these
>> >>> > URLs should have been tried the most.
>> >>> >
>> >>> > Right now it only downloads a URL once, and remembers the digest
>> >>> > forever, which assumes that the content at the URL never changes.
>> >>> > This is true for many downloads, but in future it could respect
>> >>> > cache control headers.
>> >>> >
>> >>> > Also right now it only generates HTTP Metalinks with a whole file
>> >>> > digest. But in future it could conceivably generate XML Metalinks
>> >>> > with partial digests.
>> >>> >
>> >>> > A major limitation with this proof of concept is that I ran into
>> >>> > some App Engine errors with downloads of any significant size, like
>> >>> > Ubuntu ISOs. The App Engine maximum response size is 32 MB. The app
>> >>> > overcomes this with byte ranges and downloading files in 32 MB
>> >>> > segments. This works on my local machine with the App Engine dev
>> >>> > server, but in production Google apparently kills the process after
>> >>> > downloading just a few segments, because it uses too much memory.
>> >>> > This seems wrong, since the app throws away each segment after
>> >>> > adding it to the digest. So if it has enough memory to download one
>> >>> > segment, it shouldn't require any more memory for additional
>> >>> > segments. Maybe this could be worked around by manually calling the
>> >>> > Python garbage collector, or by shrinking the segment size...
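The segment-by-segment digest described in the quoted message above can be sketched as follows. This is a minimal illustration, not the actual mintiply code: the helper names are hypothetical, the sketch assumes the server honours Range requests, and the header value follows the RFC 3230 base64 form that MirrorBrain-style "Digest: SHA-256=..." headers use.

```python
import base64
import hashlib
import urllib.error
import urllib.request

SEGMENT_SIZE = 32 * 1024 * 1024  # App Engine's 32 MB response limit

def range_segments(url, segment_size=SEGMENT_SIZE):
    """Yield a remote file as byte-range segments of at most
    segment_size bytes, so no whole copy is ever held in memory.
    Assumes the server honours Range requests."""
    offset = 0
    while True:
        request = urllib.request.Request(url, headers={
            "Range": "bytes=%d-%d" % (offset, offset + segment_size - 1)})
        try:
            with urllib.request.urlopen(request) as response:
                segment = response.read()
        except urllib.error.HTTPError as error:
            if error.code == 416:  # requested a range past end of file
                return
            raise
        if not segment:
            return
        yield segment
        if len(segment) < segment_size:
            return  # short segment: end of file
        offset += segment_size

def sha256_of_segments(segments):
    """Fold segments into one whole-file SHA-256; each segment can be
    discarded (and garbage collected) as soon as it has been added to
    the digest, so memory use stays at one segment."""
    digest = hashlib.sha256()
    for segment in segments:
        digest.update(segment)
    return digest

def digest_header(digest):
    """Format the digest as an RFC 3230 style header value: the base64
    of the binary digest, not the hex form."""
    return "SHA-256=" + base64.b64encode(digest.digest()).decode("ascii")
```

A response would then carry `"Digest: " + digest_header(sha256_of_segments(range_segments(url)))`, and the design keeps peak memory at one segment regardless of file size, which is exactly the property the production kills described above seem to violate.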
>> >>> >
>> >>> > Also I ran into a second bug with App Engine URL Fetch and
>> >>> > downloads of any significant size:
>> >>> > http://code.google.com/p/googleappengine/issues/detail?id=7732#c6
>> >>> >
>> >>> > Another thought is whether any web crawlers already maintain a
>> >>> > database of digests that an app like this could exploit?
>> >>> >
>> >>> > Here is the code:
>> >>> > https://github.com/jablko/mintiply/blob/master/mintiply.py
>> >>> >
>> >>> > What are your thoughts? Maybe something like this already exists,
>> >>> > or was already tried in the past...
>> >>> >
>> >>> > --
>> >>> > You received this message because you are subscribed to the Google
>> >>> > Groups "Metalink Discussion" group.
>> >>> > To view this discussion on the web visit
>> >>> > https://groups.google.com/d/msg/metalink-discussion/-/r7cq8sL0LuMJ.
>> >>> > To post to this group, send email to [email protected].
>> >>> > To unsubscribe from this group, send email to
>> >>> > [email protected].
>> >>> > For more options, visit this group at
>> >>> > http://groups.google.com/group/metalink-discussion?hl=en.
