Dear Robert, Tom, and Marcus,
I am not sure how I would survive in this complicated world without this ability to ask a quick question of friam and get a quick answer. The problem I so often face is WHAT QUESTION to ask the web, when I plunge into it. I had gotten seduced by the dramatic metaphor of "scrape"; indeed, "migration" is a lot closer to what I am looking for. These tips will help a lot and I will investigate them.

Your mention of a web archive brought to mind another thought. Years ago, I did up a website for the "City University of Santa Fe" which I thought was pretty nifty. However, I was the only one who thought it was nifty, so in time even I lost interest. And then I forgot to pay my fee to the hosting service, and they forgot to remind me, and I lost the site's URL to some outfit in Indiana. I assumed I had lost the data too, but your email suggests the possibility that it still lives somewhere.

Many, many thanks.

Nick

Nicholas S. Thompson
Emeritus Professor of Psychology and Biology
Clark University
http://home.earthlink.net/~nickthompson/naturaldesigns/

From: Friam [mailto:friam-boun...@redfish.com] On Behalf Of Robert J. Cordingley
Sent: Wednesday, January 04, 2017 12:00 AM
To: The Friday Morning Applied Complexity Coffee Group <friam@redfish.com>
Subject: Re: [FRIAM] scraping a web site

Hi Nick

Your old Earthlink site seems to comprise just about ten 'pages' of content, with many of those pages (Published Works) listing many bibliographic citations, each with a link to an image and a further link to a pdf document. Grabbing all the content manually is perhaps tedious but doable. Saving all the pages as HTML is also doable, but I don't see a lot of point in that. Populating your Research Gate website should be possible too with in-browser Copy and Paste - but I'm not familiar with RG - as should any other website builder (Wix, Squarespace, WordPress) as well as hosting company website builders.
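As a rough illustration of how the tedious part of the manual route might be shortened, here is a minimal Python sketch that takes the HTML of one page and lists the linked documents and images it refers to. It uses only the standard library. The names `LinkCollector`, `collect_links`, and `pdf_links`, and the sample HTML with its file names, are made up for illustration; nothing here reflects how the Earthlink pages are actually built.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of link targets and images found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # targets of <a href="...">
        self.images = []  # sources of <img src="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def collect_links(html, base_url):
    """Return (links, images) as lists of absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images


def pdf_links(links):
    """Keep only the links that point at pdf documents."""
    return [url for url in links if url.lower().endswith(".pdf")]


# Hypothetical sample page: the file names below are invented.
BASE_URL = "http://home.earthlink.net/~nickthompson/naturaldesigns/"
SAMPLE_HTML = (
    '<a href="papers/example.pdf"><img src="images/example.jpg"></a> '
    '<a href="index.html">Home</a>'
)

if __name__ == "__main__":
    links, images = collect_links(SAMPLE_HTML, BASE_URL)
    for url in pdf_links(links):
        print("pdf:", url)
    for url in images:
        print("image:", url)
```

To try it on a real page, the HTML could be read with `urllib.request.urlopen(url).read()` and decoded to text before being passed to `collect_links`; the resulting list would then serve as a checklist of files to save and re-upload elsewhere.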
I don't know of an automated system but the Internet Archive must have something and already has multiple captures of past versions of your site - see https://web.archive.org/web/20151206005021/http://home.earthlink.net/~nickthompson/naturaldesigns/ .

I think what you're really looking for is a web/content migration tool more so than web scraping tools, which tend to be focused on capturing specific data, say contact information. Vamosa seems to offer a service that should do exactly what you want, see http://www.vamosa.com/vamosa-content-migrator-c124 but I suspect that's aimed at large corporate clients. I have no experience with them. Googling 'website migration tools' produces lots of results - some questionable.

Hope this helps.

Thanks,
Robert

On 1/3/17 9:49 PM, Nick Thompson wrote:

Dear Phellow Phriammers,

I am in the uncomfortable position of being bound by threads of steel to Earthlink. Many, MANY, years ago I started a website on Earthlink, http://home.earthlink.net/~nickthompson/naturaldesigns/ , and put a lot of my writing, and some commentary, up on it. The website creation and editing medium (trellix) was pretty good for its time, and there are many ways that I find the site quite satisfying. But gradually Earthlink has withdrawn its support, and now I am not sure I could get in to edit or change it. Meantime, Research Gate has gotten started, and provides a somewhat better place to meet the world and archive my stuff. And also, having the site on Earthlink binds me to them and their 22 dollar a month fee.

So, I am wondering if there is a way to (or a service that would) scrape the website and, possibly, dump it into a new and more reliable, more website creation medium?

Please, ambulatory knowledge only. I don't want people doing deep searches to answer this question.

Thanks, as always.
Nick

Nicholas S. Thompson
Emeritus Professor of Psychology and Biology
Clark University
http://home.earthlink.net/~nickthompson/naturaldesigns/

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/ by Dr. Strangelove

--
Cirrillian
Web Design & Development
Santa Fe, NM
http://cirrillian.com
281-989-6272 (cell)
Member Design Corps of Santa Fe