Thanks so much for looking into that, Robert.
I was trying to download this information with LWP::Simple:
use LWP::Simple;
my $retcode1 = getstore( $second, "$dir/$first" );
on links like this:
https://web.archive.org/web/20131005142948/http://freepages.genealogy.rootsweb.ancestry.com:80/~caulleyfamilyinfo/MissouriMarriages/Franklin18451864BookBConsolidatedIndex.txt
Which gives me a text file similar to the attached HTM file.
That file has a bunch of HTML in it that produces the data in a
text scroll if you open it in a browser. I am embarrassed to
say that all my efforts to obtain the data straight away were
unsuccessful. I expected
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # pretend we are a very capable browser
or
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
to work, but they download the same HTML file
as the attached one.
I'll probably figure this out someday, but for the moment I am
trying to limit the time I spend on this embarrassing situation :-)
But I just can't help myself. I am still working on it a little.
This is not a lot of data, so I can certainly get it, but I am
more interested in fixing the problem I have obtaining the data
than actually getting the data. I don't really care too much
about the data, but others do.
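One thing that might be worth trying, assuming the HTML comes from the
Wayback Machine's replay view rather than from any User-Agent check:
appending id_ to the 14-digit timestamp in the URL asks the Wayback
Machine for the capture as it was originally archived. Below is a
minimal sketch of that idea. The helper names raw_wayback_url and
fetch_raw are made up for illustration, and fetch_raw assumes
LWP::Protocol::https is installed for https links.

```perl
use strict;
use warnings;

# Insert the "id_" modifier after the 14-digit timestamp of a Wayback
# Machine capture URL. Returns nothing if the URL does not look like one.
sub raw_wayback_url {
    my ($url) = @_;
    return $url if $url =~ m{/web/\d{14}id_/};    # already a raw URL
    return unless $url =~ s{(/web/\d{14})/}{${1}id_/};
    return $url;
}

# Hypothetical helper: download the raw capture straight into a file.
# LWP::UserAgent is loaded lazily so the URL helper works without it.
sub fetch_raw {
    my ( $url, $file ) = @_;
    require LWP::UserAgent;
    my $raw = raw_wayback_url($url)
        or die "Not a Wayback capture URL: $url\n";
    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $res = $ua->get( $raw, ':content_file' => $file );
    die 'Download failed: ' . $res->status_line . "\n"
        unless $res->is_success;
    return $file;
}

# Usage with the variables from the getstore() call above:
#   fetch_raw( $second, "$dir/$first" );
```

If the raw URL still returns HTML, then the capture itself is HTML and
the problem lies elsewhere.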
Mike
On 1/17/2018 1:21 PM, Robert Stone wrote:
(resending this without images due to mailing list size limit)
Greetings,
tl;dr - While the form likely connected to a database/datastore and
there is no way to retrieve that, the wayback machine archived a lot
(but not all) of the data in another format.
*The Bad News*
So for funsies I took a look at this form and the HTML for it. Turns
out that the information entered is POST'ed back to the server at
yearlastwild.asp to handle the request. Just to be absolutely
certain, I went ahead and submitted a request monitoring the network
traffic and confirmed the POST request. That ASP script was likely
connecting to some sort of database to retrieve and then format the
data for presentation.
Just to be SUPER certain there wasn't a whole huge blob of javascript
representing the dataset (which would be incredibly unlikely, but you
never know...), I also checked the size of each request: the largest
is 27.7 KB, and it's for a font.
*The Good News*
Well, then, let's see if the marriage data is presented in any other
format on the site, like a big huge list. Crazier things have happened...
https://web.archive.org/web/20030208012802/http://vienici.com:80/abmomarr.html
If we scroll down we can see Washington County and if we select the
He- we can see the same entry for Henry S:
https://web.archive.org/web/20030219131906/http://vienici.com:80/moabs/xmarrwash/xhe-j.html
Which actually matches the data from yearlastwild.asp (although, only
the name and date are contained here and not the description).
So it seems for Washington County there is some data and possibly more
from the Washington County GenWeb. I do see for other counties there is
much more data, such as Franklin County:
https://web.archive.org/web/20030407195843/http://www.vienici.com:80/mofran/vB/p201225.html
With some work and a whole bunch of parsing you could recreate a good
chunk! Of course, I'd probably hunt high and low to see if someone
else had this dataset I could use (or buy), but it's nice to know at
least parts of it live on.
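To see which pages were actually captured, the Wayback Machine's CDX
server can list captures under a URL prefix. Here is a rough sketch,
assuming the public endpoint at web.archive.org/cdx/search/cdx and its
url, output, fl, filter and collapse parameters. The helper names are
made up for illustration, and the example row in the usage comment is
the xhe-j.html capture linked above.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Percent-encode everything outside the RFC 3986 unreserved set.
sub escape_param {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Build a CDX query listing one successful capture per distinct URL
# under the given prefix, e.g. "vienici.com/moabs/".
sub cdx_query_url {
    my ($prefix) = @_;
    my @params = (
        url      => "$prefix*",
        output   => 'json',
        fl       => 'timestamp,original',
        filter   => 'statuscode:200',
        collapse => 'urlkey',
    );
    my @pairs;
    while ( my ( $k, $v ) = splice @params, 0, 2 ) {
        push @pairs, "$k=" . escape_param($v);
    }
    return 'https://web.archive.org/cdx/search/cdx?' . join '&', @pairs;
}

# Turn the JSON response (an array of rows, header row first) into
# ordinary Wayback Machine replay URLs.
sub replay_urls {
    my ($json) = @_;
    my $rows = decode_json($json);
    shift @$rows;    # drop the field-name header row
    return map { "https://web.archive.org/web/$_->[0]/$_->[1]" } @$rows;
}

# Usage: fetch cdx_query_url('vienici.com/moabs/') with LWP, then pass
# the response body to replay_urls(). A body such as
#   [["timestamp","original"],
#    ["20030219131906","http://vienici.com:80/moabs/xmarrwash/xhe-j.html"]]
# yields the xhe-j.html link above.
```

The resulting list would show how much of each county was archived
before anyone starts parsing.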
Hopefully you find the above helpful.
Best Regards,
Robert Stone
On Tue, Jan 16, 2018 at 8:54 PM, Mike Flannigan <[email protected]> wrote:
This is an archive of a website that went dead in 2011:
https://web.archive.org/web/20090609191130/http://www.vienici.com:80/moabs/lookups.html
The 3rd search box (link) takes you to:
https://web.archive.org/web/20090306211924/http://www.vienici.com:80/moabs/yearlastwild.asp
The search does not work on that page, for obvious reasons. I have
looked at the page source and decided the search was run by
javascript, but I could be wrong about that. If you are snowed in and
have some time to devote to this, what I want to know is what format
the marriage license data was in on this guy's server. I don't think
that can be told from the page source, but I thought I would ask you
guys. Perhaps you would need the ASP file to tell that??
It was not a huge amount of data, so it could have been in almost any
format.
The reason I am asking is because we are trying to find that data
6 years after the guy died.
I'm pretty sure he had an account at the Wayback Machine, and he may
have stored the data there, in addition to other places.
Mike
_______________________________________________
Houston mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/houston
Website: http://houston.pm.org/
Title: Wayback Machine
COLLECTED BY
Organization: Alexa Crawls
Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.