A single page export will not work, for sure, but in any case I was thinking about moving data out of dynmirror and into mintiply.
For example, if you don't want to download the complete file before you
have a metalink, you could check
http://www.dynmirror.net/metalink/?url=http://example.com to see if
dynmirror has any metalink information. You could use dynmirror as a
kind of caching backend for downloads.

Another thing I could do is have dynmirror redirect to mintiply if there
is no hash information available; maybe that would be a good approach...
I'm not really sure it would add anything, but technically it should be
possible, and I think it might be good to get some code commits on
dynmirror anyway ;)

Greets,

Bram

On Sun, Aug 19, 2012 at 9:58 AM, Jack Bates <[email protected]> wrote:
> On Thursday, August 16, 2012 10:44:19 PM UTC-7, Jack Bates wrote:
>>
>> On Tuesday, August 14, 2012 1:58:22 PM UTC-7, Bram Neijt wrote:
>>>
>>> Hi Jack,
>>>
>>> I once created a similar thing, but it required the "owner" of the
>>> file to host the MD5 he/she thinks it should be. It then generates a
>>> metalink based on all the MD5/SHA-1/SHA-256 hashes in the database.
>>>
>>> The idea is that anybody can step up and start a mirror by hosting
>>> the files and the MD5SUMS and having the service spider the MD5SUMS
>>> file.
>>>
>>> You can find the service at: http://www.dynmirror.net/
>>
>> Cool! The design of this site is impressive. I like how it shows
>> analytics, like recent downloads, on the front page.
>>
>>> It might be a good idea to join up the databases or do some
>>> collaboration somewhere. Let's see what we can do. For instance, I
>>> could add a mintiply URL collection or something like that? Or maybe
>>> I could have dynmirror register the hash/link combinations at
>>> mintiply?
>>
>> Great idea, thanks for suggesting it. The first thing that comes to
>> mind is: how would you like to get data out of Mintiply (and into
>> Dynmirror)? Is there an API that Mintiply could provide that would
>> make this as easy as possible?
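Bram's idea of checking a metalink service before (or while) downloading a file comes down to fetching a service URL and reading the response headers described in this thread ("Digest: SHA-256=..." and "Link: <...>; rel=duplicate"). Here is a minimal, hypothetical sketch of the client-side header parsing; the helper function is an illustration, not part of mintiply or dynmirror:

```python
MINTIPLY_PREFIX = "http://mintiply.appspot.com/"

def parse_metalink_headers(headers):
    """Extract the whole-file digest and any duplicate-URL mirrors from
    a list of (name, value) response header pairs.

    Returns (algorithm, base64_digest, [mirror_url, ...]).
    """
    algorithm = digest = None
    mirrors = []
    for name, value in headers:
        if name.lower() == "digest":
            # "Digest: SHA-256=<base64>": partition on the first "="
            # only, because the base64 value may itself end in "="
            # padding characters.
            algorithm, _, digest = value.partition("=")
        elif name.lower() == "link" and "rel=duplicate" in value:
            # Header form: Link: <http://mirror.example/file>; rel=duplicate
            mirrors.append(value.split(">", 1)[0].lstrip("<"))
    return algorithm, digest, mirrors
```

A client could prefix the original URL with MINTIPLY_PREFIX, issue a HEAD request, and pass the response's header pairs to this helper; if a digest comes back, the download can be verified, and the mirror list used, without waiting for a separate Metalink file.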
>
> Hi Bram, and thanks again for inviting me to collaborate,
>
> As an experiment, I just added a page to export all of the data from
> Mintiply, in Metalink format. Let me know what you think. Could this
> be useful to a project like Dynmirror? Or would you prefer a different
> format, or different data?
>
> There isn't much data in the app yet, so dumping everything in one
> Metalink response works fine. If the amount of data ever gets large,
> we may need to rethink this.
>
> Here is the page: http://mintiply.appspot.com/export
>
>>> Let me know what you think. Currently, I think I'm the only user of
>>> dynmirror.net (at http://www.logfish.net/pr/ccbuild/downloads/ ).
>>>
>>> I'd also be happy to dig up and publish the code somewhere if I
>>> haven't already.
>>>
>>> Greets,
>>>
>>> Bram
>>
>> Thanks very much for inviting me to collaborate.
>>
>>> On Tue, Aug 14, 2012 at 8:30 AM, Jack Bates <[email protected]> wrote:
>>> > Hi, what do you think about a Google App Engine app that generates
>>> > Metalinks for URLs? Maybe something like this already exists?
>>> >
>>> > The first time you visit, e.g.
>>> >
>>> > http://mintiply.appspot.com/http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2
>>> >
>>> > it downloads the content and computes a digest. App Engine has
>>> > *lots* of bandwidth, so this is snappy. Then it sends a response
>>> > with "Digest: SHA-256=..." and "Location: ..." headers, similar to
>>> > MirrorBrain.
>>> >
>>> > It also records the digest with Google's Datastore, so on
>>> > subsequent visits it doesn't download or recompute the digest.
>>> >
>>> > Finally, it also checks the Datastore for other URLs with a
>>> > matching digest, and sends "Link: <...>; rel=duplicate" headers
>>> > for each of these. So if you visit, e.g.
>>> >
>>> > http://mintiply.appspot.com/http://mirror.nexcess.net/apache/trafficserver/trafficserver-3.2.0.tar.bz2
>>> >
>>> > it sends "Link:
>>> > <http://apache.osuosl.org/trafficserver/trafficserver-3.2.0.tar.bz2>;
>>> > rel=duplicate".
>>> >
>>> > The idea is that this could be useful for sites that don't yet
>>> > generate Metalinks, like SourceForge. You could always prefix a
>>> > URL that you pass to a Metalink client with
>>> > "http://mintiply.appspot.com/" to get a Metalink. Alternatively,
>>> > if a Metalink client noticed that it was downloading a large file
>>> > without mirror or hash metadata, it could try to get more mirrors
>>> > from this app while it continued downloading the file. As long as
>>> > someone else had previously tried the same URL, or App Engine can
>>> > download the file faster than the client, it should get more
>>> > mirrors in time to help finish the download. Popular downloads
>>> > should have the most complete list of mirrors, since these URLs
>>> > should have been tried the most.
>>> >
>>> > Right now it only downloads a URL once, and remembers the digest
>>> > forever, which assumes that the content at the URL never changes.
>>> > This is true for many downloads, but in future it could respect
>>> > cache control headers.
>>> >
>>> > Also, right now it only generates HTTP Metalinks with a whole-file
>>> > digest, but in future it could conceivably generate XML Metalinks
>>> > with partial digests.
>>> >
>>> > A major limitation of this proof of concept is that I ran into
>>> > some App Engine errors with downloads of any significant size,
>>> > like Ubuntu ISOs. The App Engine maximum response size is 32 MB.
>>> > The app overcomes this with byte ranges, downloading files in
>>> > 32 MB segments.
>>> > This works on my local machine with the App Engine dev server,
>>> > but in production Google apparently kills the process after
>>> > downloading just a few segments, because it uses too much memory.
>>> > This seems wrong, since the app throws away each segment after
>>> > adding it to the digest. So if it has enough memory to download
>>> > one segment, it shouldn't require any more memory for additional
>>> > segments. Maybe this could be worked around by manually calling
>>> > the Python garbage collector, or by shrinking the segment size...
>>> >
>>> > I also ran into a second bug with App Engine URL Fetch and
>>> > downloads of any significant size:
>>> > http://code.google.com/p/googleappengine/issues/detail?id=7732#c6
>>> >
>>> > Another thought: do any web crawlers already maintain a database
>>> > of digests that an app like this could exploit?
>>> >
>>> > Here is the code:
>>> > https://github.com/jablko/mintiply/blob/master/mintiply.py
>>> >
>>> > What are your thoughts? Maybe something like this already exists,
>>> > or was already tried in the past...
>>> >
>>> > --
>>> > You received this message because you are subscribed to the
>>> > Google Groups "Metalink Discussion" group.
>>> > To view this discussion on the web visit
>>> > https://groups.google.com/d/msg/metalink-discussion/-/r7cq8sL0LuMJ.
>>> > To post to this group, send email to
>>> > [email protected].
>>> > To unsubscribe from this group, send email to
>>> > [email protected].
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/metalink-discussion?hl=en.
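The segmented-download approach Jack describes (fetch 32 MB byte ranges, fold each segment into a running SHA-256, then throw the segment away, so peak memory stays at roughly one segment regardless of file size) can be sketched as follows. This is an illustration, not the actual mintiply code: plain urllib stands in for App Engine's URL Fetch, and the pluggable `opener` parameter is an addition for testing.

```python
import hashlib
import urllib.request

SEGMENT_SIZE = 32 * 1024 * 1024  # App Engine's maximum response size

def segmented_sha256(url, segment_size=SEGMENT_SIZE,
                     opener=urllib.request.urlopen):
    """Compute a whole-file SHA-256 one byte range at a time."""
    digest = hashlib.sha256()
    offset = 0
    while True:
        request = urllib.request.Request(url, headers={
            "Range": "bytes=%d-%d" % (offset, offset + segment_size - 1)})
        with opener(request) as response:
            segment = response.read()
        if not segment:
            break                        # server returned an empty range
        digest.update(segment)           # fold the segment into the hash
        offset += len(segment)           # ...then drop the reference
        if len(segment) < segment_size:  # short read means end of file
            break
    # Note: a production version would also handle a 416 response, which
    # a strict server sends when the file size is an exact multiple of
    # the segment size and the final range starts past the end.
    return digest.hexdigest()
```

Because only one segment is referenced at a time, memory use should stay flat, which is exactly why the production kills Jack observed seem wrong.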
