On Wednesday, August 15, 2012 10:07:58 AM UTC-7, Neil M. wrote:
> > Another thought is whether any web crawlers already maintain a database
> > of digests that an app like this could exploit?
> >
> > Here is the code:
> > https://github.com/jablko/mintiply/blob/master/mintiply.py
> >
> > What are your thoughts? Maybe something like this already exists, or was
> > already tried in the past...
> I've written a metalink crawler for .metalink files. It's pretty dumb, but
> it gets the job done. The code is available here:
>
> http://metalinks.svn.sourceforge.net/viewvc/metalinks/crawler/
>
> You can see the results here:
>
> http://www.nabber.org/projects/metalink/crawler/list.php
>
> I imagine it wouldn't be hard to modify it so that, instead of grabbing
> the .metalink files, it parses them and dumps them into your database.
> One advantage of this method is that any URLs that are now dead are still
> captured in the .metalink files, so your AppEngine code could detect this
> and redirect a "dumb" browser to a working download location instead.
>

Interesting idea, and thanks for writing this Metalink crawler.
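For anyone curious what dumping parsed .metalink files into a database would involve, here is a minimal sketch. It assumes the files use the old Metalink 3 schema (the `http://www.metalinker.org/` namespace); the element names come from that schema, but the sample document and the function name are just illustration, not anything from the crawler or mintiply.

```python
import xml.etree.ElementTree as ET

# Metalink 3 documents live in this namespace.
NS = {"m": "http://www.metalinker.org/"}

def parse_metalink(xml_text):
    """Extract (name, hashes, urls) records from a Metalink 3 document."""
    root = ET.fromstring(xml_text)
    records = []
    for f in root.findall(".//m:file", NS):
        hashes = {h.get("type"): h.text.strip()
                  for h in f.findall("m:verification/m:hash", NS)}
        urls = [u.text.strip() for u in f.findall("m:resources/m:url", NS)]
        records.append({"name": f.get("name"), "hashes": hashes, "urls": urls})
    return records

# Invented sample document, for illustration only.
sample = """<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/">
  <files>
    <file name="example.iso">
      <verification>
        <hash type="sha1">da39a3ee5e6b4b0d3255bfef95601890afd80709</hash>
      </verification>
      <resources>
        <url type="http">http://example.com/example.iso</url>
      </resources>
    </file>
  </files>
</metalink>"""

for rec in parse_metalink(sample):
    print(rec["name"], rec["hashes"], rec["urls"])
```

Each record maps straight onto a database row keyed by hash, with the URL list alongside, so the dead-URL fallback you describe comes almost for free.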

> As for a hash database, I've been researching options for my Appupdater
> project. There are some hash-search sites out there, but I don't think
> they will be useful in this case, since I haven't seen any that track
> URLs; it's usually just file size, version, product name, etc. There seem
> to be plenty of datasets out there for installers from the various
> download websites, like sourceforge.net, softpedia, oldapps.com, etc.
> However, from what I can tell, there is no way to download a database
> from any of these; you'd have to parse the individual web pages. While
> possible, that doesn't seem to be a very efficient way of doing things,
> since you'd need to customize it for each website. Actually, probably the
> better and easier way is to build a .exe, .msi, etc. crawler, download
> the files, and compute your own hashes. It will take a lot of time and
> bandwidth, but you'd get a really good dataset that way. In other words,
> have a crawler that feeds your AppEngine code URLs to process.
>

I agree. Thanks a lot for sharing your experience researching options for 
Appupdater.

Neil 

-- 
You received this message because you are subscribed to the Google Groups 
"Metalink Discussion" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/metalink-discussion/-/zZPp5NxfB9EJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/metalink-discussion?hl=en.
