On 4/20/17 4:43 PM, Matt Morgan wrote:
> I guess what I'm asking, is there an easy path from mail-archive.com
> search results into a spreadsheet (I guess mySQL or postgres would be
> OK too) or some other kind of analysis tool?

I thought about doing this in Node.js, but that would require a bit more
machinery that isn't available "out of the box," and I didn't want you to
get hung up on any dependencies. So here it is in PHP, which should work
with just a stock PHP install (on most platforms, anyway):

$ php -r '
// Keep libxml warnings about sloppy HTML out of the CSV output.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML(file_get_contents("https://www.mail-archive.com/search?l=mcn-l%40mcn.edu&q=%28%2Bjob+OR+%2Bposition%29&f=1"));
$doc = simplexml_import_dom($dom);

$out = fopen("php://output", "w");
fputcsv($out, array("link", "subject", "date", "name", "message"));

// The script expects each search result as an h3 (subject and link),
// a div (date and sender) and a blockquote (message text), followed
// by a br that ends the result, so the CSV row is written there.
$msg = array();
foreach ($doc->body->div[0]->children() as $node) {
  switch ($node->getName()) {
    case "h3":
      $msg["subj"] = (string) $node->span->a;
      $msg["link"] = "https://www.mail-archive.com" . (string) $node->span->a["href"];
      break;
    case "div":
      $msg["date"] = (string) $node->span[0]->span->a;
      $msg["name"] = (string) $node->span[2]->a;
      break;
    case "blockquote":
      $msg["body"] = (string) $node->span->pre;
      break;
    case "br":
      fputcsv($out, array($msg["link"], $msg["subj"], $msg["date"], $msg["name"], $msg["body"]));
      $msg = array();
      break;
    default:
      break;
  }
}' | tee msgs.csv

HTH, HAND,

Dossy

-- 
Dossy Shiobara         |  "He realized the fastest way to change
do...@panoptic.com     |   is to laugh at your own folly -- then you
http://panoptic.com/   |   can let go and quickly move on." (p. 70)
  * WordPress * jQuery * MySQL * Security * Business Continuity *

_______________________________________________
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip