So I post a lot on http://del.icio.us/kragen --- pretty much any web
page I think I might want to show somebody in the future.  (Maybe I
should hook up del.icio.us to kragen-fw.)  But a lot of the time
when I'm reading web pages, I'm not connected to the internet ---
that was true in Silicon Valley when I was on the train, and it's
even more true now in Venezuela.

So I started keeping a text file with new del.icio.us postings in it,
which look like this:

url: http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
description: Object-Relational Mapping is the Vietnam War of Computer
  Science, from Ted Neward
extended: Why ORM is so hard. 9500 words.
tags: programming

url: http://www.uwgb.edu/dutchs/PSEUDOSC/WhyAntiInt.htm
description: Why is there Anti-Intellectualism?, from Steven Dutch
extended: The curiosity and creativity of children is very
  superficial. Our own culture supports inquiry atypically well, but
  some people are still very hostile to it --- it threatens their
  values, it seems a waste of time, or they resent its power.
tags: discourse psychology curiosity anti-intellectualism

I edit this big text file in Emacs, and the top of the file looks like
this:

emacs: -*- mode: text; fill-prefix: "  "; coding: utf-8; -*-  
note: You can put any fields at all in this first record; the poster
  only looks for 'user' and 'password'.  Elsewhere it will complain
  about extra fields but otherwise ignore them.
user: kragen
password: (omitted from kragen-hacks posting)

The fill-prefix causes M-q to do the right thing in these posting
fields --- wrap them with a whitespace prefix.

This is a little bit tricky, because historically del.icio.us gives
you 255 bytes for each field, and no more, and truncates without
mercy.  This has led to most people not using the "extended" field at
all, or at best pasting a fragment of the web page into it, because
del.icio.us punishes any careful thought and summarization by
arbitrarily throwing away some of your hard work.

I think it's gotten better lately, since the Yahoo acquisition, but I
haven't experimented to see what the current limit is.  After a while
I noticed I was writing much better page summaries than the ones
actually on the pages, due to working within the 255-byte limit.

Still, I thought it would be nice to know whether I'm at 200 bytes or
300 bytes when I'm writing the "extended" field, so I hacked up some
elisp to bind to M-q:

(defun words-in-field ()
  "Run wc on the field around point, showing its line, word, and
byte counts in the echo area."
  (interactive)
  (save-excursion
    (forward-paragraph)
    ;; Back up to the "fieldname:" line that starts this field: a
    ;; line beginning with something other than whitespace or a colon.
    (search-backward-regexp "^[^ \t:][^:]*:")
    (forward-char)
    (let ((start (point)))
      (forward-paragraph)
      (shell-command-on-region start (point) "wc"))))

(defun words-in-field-newline ()
  "Count the field as in `words-in-field', then insert a newline."
  (interactive)
  (words-in-field)
  (newline))

(defun words-in-field-fill-paragraph ()
  "Count the field as in `words-in-field', then refill the paragraph."
  (interactive)
  (words-in-field)
  (fill-paragraph nil))

So I type M-x local-set-key RET M-q words-in-field-fill-paragraph RET
when I start editing that buffer, and then whenever I refill one of
those paragraphs, wc's output in the echo area tells me how many
lines, words, and bytes are in the field.
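
If I ever get tired of typing that, a couple of lines of elisp would
set up the binding for me.  This is just a sketch I haven't bothered
to install, and "delicious-postings-setup" is a name I made up:

(defun delicious-postings-setup ()
  "Bind M-q to `words-in-field-fill-paragraph' in the current buffer."
  (interactive)
  (local-set-key (kbd "M-q") 'words-in-field-fill-paragraph))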

Lately I've been using a little bookmarklet to set the entry
initially, to save me the work of copying and pasting the URL and the
title, and as a bonus, it tells me how many words I have selected:

javascript:(function(){
    var d = document;
    var newp = d.createElement('pre');
    var url_line = 'url: ' + window.location + '\n';
    var description_line = 'description: ' + document.title + '\n';
    var words = '' + ('' + getSelection()).split(/\s+/).length + ' words.';
    var extended_line = 'extended: ' + words + '\n';
    newp.appendChild(d.createTextNode(url_line + description_line +
        extended_line + 'tags: \n'));
    d.body.insertBefore(newp, d.body.firstChild);
})()

Generally I shun ";" when writing JavaScript as much as possible (I
don't think it aids readability, but then, I program in Python, so I
would think that), but it's essential in bookmarklets: the whole
program gets crammed onto a single line in the URL, and without
newlines JavaScript has nothing to infer the missing semicolons from.

One of these days I'll make either the bookmarklet or the elisp
reformat descriptions/titles to my standards automatically, but for
now I just edit by hand.  The person the information came from
should be cited by their real name, if possible, after the
description of the page itself, preceded by ", from ", and any
publication that apparently played a role in conveying the
information goes after that person's name, with a ", via " --- no
publication names up front.  So,
for example, "ChristianLindholm.com: The Nokia N95 could be a mobile
rocket" becomes "The Nokia N95 could be a mobile rocket, from
Christian Lindholm" --- if I don't rephrase it to be more literal and
less metaphorical, such as "Nokia N95 is an excellent cell phone".
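
When I do get around to it, the elisp half might start out as simple
as this --- an untested sketch, run with point on the description
line, that only handles the easiest case: a one-line description with
the publication name up front and no ", via ".  It just moves the
publication name to the back; I'd still have to turn a domain name
into a person's real name by hand.

(defun fix-description-line ()
  "Rewrite \"description: Site: Title\" as \"description: Title, from Site\"."
  (interactive)
  (beginning-of-line)
  (when (looking-at "description: \\([^:]+\\): \\(.*\\)")
    (replace-match "description: \\2, from \\1" t)))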

So that gives me a big text file, which I keep in CVS; it's currently
about 8000 lines long and contains 1153 of the 2663 posts I've made
to del.icio.us so far.  It has some advantages over the standard
approach --- it's fairly easy to revise the last few things I've
posted, e.g. to add tags.

Then I use this primitive, imperfect Perl program to post it to
del.icio.us using the V1 API, one URL every ten seconds.  Note that
this program includes yet another RFC-822 parser.

#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common qw(GET POST);
use LWP::UserAgent;
use Data::Dumper;
use Carp;

# I had to do this to get this to run on Ubuntu Hoary:
#     sudo apt-get install libwww-perl
#     sudo apt-get install libxml-perl
#     sudo apt-get install libxml-sax-perl

sub open_or_die {
  my ($mode, $file) = @_;
  croak "no file" unless $file;
  open my $foo, $mode, $file or die "Can't open $file: $!";
  return $foo;
}

# Parse one RFC-822-style record from $fh: "field: value" lines, with
# continuation lines indented, terminated by a blank line.  Returns a
# hashref of field values, or undef at end of file.
sub read_rec {
  my ($fh) = @_;
  local $_;
  my %rv;
  my $cf;                       # name of the current field
  while (<$fh>) {
    last if /^\s*$/ and %rv;
    next if /^\s*$/;
    if (/^\s+(.*)/) {
      $rv{$cf} .= " $1";
    } elsif (/^(\S+):\s*(.*)$/) {
      $cf = $1;
      $rv{$cf} = $2;
    } else {
      warn "Couldn't grok line '$_'\n";
    }
  }
  return undef unless %rv;
  return \%rv;
}

my $infile = shift;
die "Usage: $0 file" unless $infile;
my $infh = open_or_die "<", $infile;

my $client = LWP::UserAgent->new(agent => 'local-delicious-sync/1',
                                 from => '[EMAIL PROTECTED]');

my ($user, $pass);
my $post = read_rec($infh);
if ($post->{user} and $post->{password}) {
  $user = $post->{user};
  $pass = $post->{password};
} else {
  die "First item in $infile must contain user and password, like:\n" .
    "user: joshua\n" .
    "password: ch23s4\n";
}

my $all = $client->request(GET "https://$user:$pass\@api.del.icio.us/v1/posts/all");
die "Couldn't get current status: " . $all->as_string if not $all->is_success;
my $posts = $all->content;

{
  # SAX handler that collects the href attribute of every <post>
  # element in the posts/all response.
  package HrefExtractor;
  use base qw(XML::SAX::Base);

  use XML::SAX::ParserFactory;
  use Data::Dumper;

  sub start_element {
    my ($self, $el) = @_;
    push @{$self->{hrefs}}, $el->{Attributes}{'{}href'}{Value}
      if $el->{LocalName} eq 'post';
  }

  sub parse_string {
    my ($class, $string) = @_;
    my $self = $class->new();
    XML::SAX::ParserFactory->parser(Handler => $self)->parse_string($string);
    return @{$self->{hrefs}};
  }
}

# Debugging: dump the raw posts/all XML somewhere I can look at it later.
open CRAP, '>/home/kragen/tmp/crap' and print CRAP $posts; close CRAP;
my %already_posted_urls = map { ($_ => 1) } HrefExtractor->parse_string($posts);

# function: http://del.icio.us/api/posts/add?
#       &url= url for post
#       &description= description for post
#       &extended= extended for post
#       &tags= space-delimited list of tags
#       &dt= datestamp for post, format "CCYY-MM-DDThh:mm:ssZ"
# makes a post to delicious. 
# the datestamp requires a LITERAL "T" and "Z" like in ISO8601 at
# http://www.cl.cam.ac.uk/~mgk25/iso-time.html. for example:
# "1984-09-01T14:21:31Z"

sub post {
  my ($post) = @_;
  my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;
  my $now = sprintf "%04d-%02d-%02dT%02d:%02d:%02dZ",
    ($year + 1900, $mon + 1, $mday, $hour, $min, $sec);
#  my $req = POST "http://$user:$pass\@del.icio.us/api/posts/add", [
  my $req = POST "https://$user:$pass\@api.del.icio.us/v1/posts/add", [
    url => $post->{url},
    description => $post->{description} || '',
    extended => $post->{extended} || '',
    tags => $post->{tags} || '',
    dt => $now,
  ];
  sleep 10;                     # no more than one URL every ten seconds
  my $resp = $client->request($req);
  die $req->as_string . "\n\n" . $resp->as_string if not $resp->is_success;
}

while (my $post = read_rec($infh)) {
  if ($post->{url}) {
    next if $already_posted_urls{$post->{url}};
    post($post);
    print "Posted $post->{description}\n";
  } else {
    warn "Hmm, unknown record: " . Dumper($post);
  }
}

I think CommerceNet might own the copyright on the Perl code.

So that's how you can use JavaScript, Lisp, RFC-822, XML, Perl, and
HTTPS all to post stuff to del.icio.us.  One of these days I'll
probably write a little web server that runs on localhost to make all
of that stuff much simpler, but that's how I do it now.
