On 4/20/17 4:43 PM, Matt Morgan wrote:
> I guess what I'm asking, is there an easy path from mail-archive.com
> search results into a spreadsheet (I guess mySQL or postgres would be
> OK too) or some other kind of analysis tool?

I thought about doing this in Node.js, but that would need extra
machinery that isn't available "out of the box", and I didn't want you
to get hung up on any dependencies. So here it is in PHP, which should
run on a stock install (on most platforms, anyway):

$ php -r '
    $dom = new DOMDocument;
    $dom->loadHTML(file_get_contents("https://www.mail-archive.com/search?l=mcn-l%40mcn.edu&q=%28%2Bjob+OR+%2Bposition%29&f=1"));
    $doc = simplexml_import_dom($dom);
    $out = fopen("php://output", "w");
    fputcsv($out, array("link", "subject", "date", "name", "message"));
    $msg = array();
    foreach ($doc->body->div[0]->children() as $node) {
        switch ($node->getName()) {
            case "h3":
                $msg["subj"] = (string) $node->span->a;
                $msg["link"] = "https://www.mail-archive.com" . (string) $node->span->a["href"];
                break;
            case "div":
                $msg["date"] = (string) $node->span[0]->span->a;
                $msg["name"] = (string) $node->span[2]->a;
                break;
            case "blockquote":
                $msg["body"] = (string) $node->span->pre;
                break;
            case "br":
                fputcsv($out, array($msg["link"], $msg["subj"],
                    $msg["date"], $msg["name"], $msg["body"]));
                $msg = array();
                break;
            default: break;
        }
    }' | tee msgs.csv
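
Once msgs.csv exists, any spreadsheet should open it directly. If a
quick tally is all that's needed, a rough sketch like the one below
might do. Note that count_by_sender() is a hypothetical helper, not part
of the one-liner above, and it assumes the five-column layout (link,
subject, date, name, message) written by that script:

```php
<?php
// Hypothetical helper (an illustration, not part of the original script):
// counts how many rows in the CSV belong to each sender.
// Assumes the column order link, subject, date, name, message.
function count_by_sender(string $path): array
{
    $counts = array();
    $in = fopen($path, "r");
    if ($in === false) {
        return $counts; // unreadable file: report nothing
    }
    // Separator, enclosure and escape are spelled out to match the
    // defaults fputcsv() used when writing the file.
    fgetcsv($in, 0, ",", "\"", "\\"); // skip the header row
    while (($row = fgetcsv($in, 0, ",", "\"", "\\")) !== false) {
        if (count($row) < 5) {
            continue; // blank or short line
        }
        $name = $row[3];
        $counts[$name] = ($counts[$name] ?? 0) + 1;
    }
    fclose($in);
    arsort($counts); // largest count first, keys preserved
    return $counts;
}
```

Calling count_by_sender("msgs.csv") and looping over the result with
foreach would give a count for each poster, busiest first.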


HTH, HAND,

Dossy

-- 
Dossy Shiobara         |      "He realized the fastest way to change
do...@panoptic.com     |   is to laugh at your own folly -- then you
http://panoptic.com/   |   can let go and quickly move on." (p. 70) 
  * WordPress * jQuery * MySQL * Security * Business Continuity *


_______________________________________________
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip
