On 2/8/16 11:44 AM, Wyatt wrote:
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu
wrote:
Dpaste currently does not expire pastes by default. I was thinking
it would be nice if it saved them in the Wayback Machine such that
they are archived redundantly.

I'm not sure what's the way to do it - probably linking the
newly-generated paste URLs from a page that the Wayback Machine
already knows of.

I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when
the WM does not see a link that is searched for, it offers the
option to archive it), obtaining
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.

Thoughts?

You want it in Wayback?  Sounds like you need some WARC [0]. Since
anyone can upload to IA (using a nice S3-like API, even [1]), this
should be pretty uncomplicated.  If you can get a list of all the
paste URLs, you can use wget [2] to build the WARC fairly trivially.
[3]  Then I'd suggest getting a dlang account and making an item [4]
out of it. Just make sure it's set to mediatype:web and it should get
ingested by Wayback.
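
For illustration, the wget step could look roughly like the sketch below. It only builds the command line and does not run it. The file name paste-urls.txt and the WARC name are made up for the example; --input-file and --warc-file are standard wget options, and the list of paste URLs is assumed to already exist with one URL per line.

```python
import shlex


def wget_warc_command(url_list_path, warc_name):
    """Build a wget invocation that fetches every listed URL into one WARC."""
    return [
        "wget",
        "--input-file=" + url_list_path,  # hypothetical file, one paste URL per line
        "--warc-file=" + warc_name,       # wget appends the .warc.gz extension itself
        "--no-verbose",
    ]


# Hypothetical names, chosen only for this example.
cmd = wget_warc_command("paste-urls.txt", "dpaste-2016-02-08")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would perform the actual crawl.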

After that?  Generate a WARC when a paste is made and use the dlang
S3 keys to add it to the previous item (or maybe just do it daily or
weekly so as to not stress the derive queue too much). I'm pretty
sure that's all that's needed.
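
A rough sketch of the upload step follows, assuming the IA S3-like endpoint and headers work the way I understand them from the IA documentation (s3.us.archive.org, "LOW key:secret" authorization, x-archive-meta-* headers for item metadata). The item name, file name and keys are placeholders, and the request is only constructed here, not sent.

```python
import urllib.request


def build_ia_upload(item, filename, data, access_key, secret_key):
    """Prepare (but do not send) a PUT request adding one file to an IA item."""
    url = "https://s3.us.archive.org/%s/%s" % (item, filename)
    headers = {
        # Assumed header names, per my reading of the IA S3-like API docs.
        "authorization": "LOW %s:%s" % (access_key, secret_key),
        "x-archive-auto-make-bucket": "1",   # create the item if it is missing
        "x-archive-meta-mediatype": "web",   # so the item is eligible for Wayback
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Placeholder item name, file name and credentials.
req = build_ia_upload("dpaste-archive", "dpaste-2016-02-08.warc.gz",
                      b"", "ACCESS", "SECRET")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of urllib.request.urlopen(req) with the real WARC bytes as data. Batching daily or weekly, as suggested above, means one such request per batch rather than one per paste.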

That's intense. I think a simple page (or chained linked collection of
pages) containing links to all pastes defined would suffice. For example
consider defining dpaste.dzfl.pl containing a link to
dpaste.dzfl.pl/today.html. That would contain e.g. the links generated
today and a button "More" linked to dpaste.dzfl.pl/2016-02-08.html
(which would be yesterday). That in turn would contain links to
yesterday's pastes and a link to the day before etc.

My understanding is this is enough to have wayback archive all pastes.
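
To make the chained-pages idea concrete, here is a minimal sketch of how one day's page might be rendered. The function name, the markup and the /YYYY-MM-DD.html naming are assumptions for illustration only, not how dpaste is actually built.

```python
import datetime
import html


def render_day_page(day, paste_urls):
    """Render an index page for one day, chained to the previous day's page."""
    prev = day - datetime.timedelta(days=1)
    lines = ["<html><body>",
             "<h1>Pastes for %s</h1>" % day.isoformat(),
             "<ul>"]
    for url in paste_urls:
        safe = html.escape(url, quote=True)
        lines.append('<li><a href="%s">%s</a></li>' % (safe, safe))
    lines.append("</ul>")
    # The "More" link lets a crawler walk backwards one day at a time.
    lines.append('<a href="/%s.html">More</a>' % prev.isoformat())
    lines.append("</body></html>")
    return "\n".join(lines)


page = render_day_page(datetime.date(2016, 2, 8),
                       ["http://dpaste.dzfl.pl/2012caf872ec"])
print(page)
```

Since every page links to the one before it, a crawler that starts at today's page can reach every paste by following ordinary links.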

I'm pretty sure that's Andrei's thought, too. It's a pastebin; people
use it to make web links to pasted things. If it were to disappear, a
lot of links would break very permanently because Heritrix has no way
to index and crawl the site.

Yah.


Andrei

