So I post a lot on http://del.icio.us/kragen --- pretty much any web
page I think I might want to show somebody in the future.  (Maybe I
should hook up del.icio.us to kragen-fw.)  But a lot of the time when
I'm reading web pages, I'm not connected to the internet --- that was
true in Silicon Valley when I was on the train, and it's even more
true now in Venezuela.
So I started keeping a text file with new del.icio.us postings in it,
which look like this:

    url: http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
    description: Object-Relational Mapping is the Vietnam War of
      Computer Science, from Ted Neward
    extended: Why ORM is so hard.  9500 words.
    tags: programming

    url: http://www.uwgb.edu/dutchs/PSEUDOSC/WhyAntiInt.htm
    description: Why is there Anti-Intellectualism?, from Steven Dutch
    extended: The curiosity and creativity of children is very
      superficial.  Our own culture supports inquiry atypically well,
      but some people are still very hostile to it --- it threatens
      their values, it seems a waste of time, or they resent its
      power.
    tags: discourse psychology curiosity anti-intellectualism

I edit this big text file in Emacs, and the top of the file looks
like this:

    emacs: -*- mode: text; fill-prefix: " "; coding: utf-8; -*-
    note: You can put any fields at all in this first record; the
      poster only looks for 'user' and 'password'.  Elsewhere it will
      complain about extra fields but otherwise ignore them.
    user: kragen
    password: (omitted from kragen-hacks posting)

The fill-prefix causes M-q to do the right thing in these posting
fields --- wrap them with a whitespace prefix.

This is a little bit tricky, because historically del.icio.us gives
you 255 bytes for each field, and no more, and truncates without
mercy.  This has led to most people not using the "extended" field at
all, or at best pasting a fragment of the web page into it, because
del.icio.us punishes any careful thought and summarization by
arbitrarily throwing away some of your hard work.  I think it's
gotten better lately, since the Yahoo acquisition, but I haven't
experimented to see what the current limit is.  After a while I
noticed I was writing much better page summaries than the ones
actually on the pages, due to working within the 255-byte limit.
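The record format is simple enough that a parser fits in a few lines.
Here's a sketch in Python (the name read_recs is mine), following the
same rules as the Perl read_rec later in this post: blank lines
separate records, "field: value" lines start a field, and indented
lines continue the previous field:

```python
def read_recs(lines):
    """Yield one dict per record from an iterable of text lines.

    A sketch only: no error handling for malformed lines, unlike the
    Perl version's warn."""
    rec, field = {}, None
    for line in lines:
        if not line.strip():
            if rec:                    # blank line ends a record
                yield rec
                rec, field = {}, None
        elif line[0].isspace():        # continuation of previous field
            rec[field] += " " + line.strip()
        else:                          # "field: value" starts a field
            field, _, value = line.partition(":")
            rec[field] = value.strip()
    if rec:                            # last record may lack a blank line
        yield rec
```

Note that partition splits on the first colon only, so values
containing colons (like URLs) come through intact.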
Still, I thought it would be nice to know whether I'm at 200 bytes or
300 bytes when I'm writing the "extended" field, so I hacked up some
elisp to bind to M-q:

    (defun words-in-field ()
      (interactive)
      (save-excursion
        (forward-paragraph)
        (search-backward-regexp "^[^ \\t:][^:]*:")
        (forward-char)
        (let ((start (point)))
          (forward-paragraph)
          (shell-command-on-region start (point) "wc"))))

    (defun words-in-field-newline ()
      (interactive)
      (words-in-field)
      (newline))

    (defun words-in-field-fill-paragraph ()
      (interactive)
      (words-in-field)
      (fill-paragraph nil))

So I M-x local-set-key RET M-q words-in-field-fill-paragraph when I
start editing that buffer, and then whenever I refill one of those
paragraphs, I see how many bytes are in the field.

Lately I've been using a little bookmarklet to set the entry
initially, to save me the work of copying and pasting the URL and the
title, and as a bonus, it tells me how many words I have selected:

    javascript:(function(){
        var d = document;
        var newp = d.createElement('pre');
        var url_line = 'url: ' + window.location + '\n';
        var description_line = 'description: ' + document.title + '\n';
        var words = '' + ('' + getSelection()).split(/\s+/).length + ' words.';
        var extended_line = 'extended: ' + words + '\n';
        newp.appendChild(d.createTextNode(url_line + description_line +
                                          extended_line + 'tags: \n'));
        d.body.insertBefore(newp, d.body.firstChild);
    })()

Generally I shun ";" when writing JavaScript as much as possible (I
don't think it aids readability, but then, I program in Python, so I
would think that) but it's essential in bookmarklets.  One of these
days I'll make either the bookmarklet or the elisp reformat
descriptions/titles to my standards automatically, but for now I just
edit by hand.
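Since the limit is 255 bytes, not characters, what matters is the
length of the field after UTF-8 encoding --- which is what wc reports
here, since the buffer's coding is utf-8.  In Python terms, the check
M-q is doing amounts to:

```python
def field_bytes(text):
    """Byte length of a field as del.icio.us would see it: the length
    of its UTF-8 encoding, not its character count."""
    return len(text.encode("utf-8"))
```

Anything non-ASCII in a summary eats more than one byte of the 255.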
The source of the information should be cited by their real name, if
possible, after the description of the page itself, preceded by
", from ", and any publication that apparently played a role in
conveying the information goes after that person's name, with a
", via " --- no publication names up front.  So, for example,
"ChristianLindholm.com: The Nokia N95 could be a mobile rocket"
becomes "The Nokia N95 could be a mobile rocket, from Christian
Lindholm" --- if I don't rephrase it to be more literal and less
metaphorical, such as "Nokia N95 is an excellent cell phone".

So that gives me a big text file, which I keep in CVS; it's currently
about 8000 lines long, containing 1153 of the 2663 posts I've made to
del.icio.us so far.  It has some advantages over the standard
approach --- it's fairly easy to revise the last few things I've
posted, e.g. to add tags.  Then I use this primitive, imperfect Perl
program to post it to del.icio.us using the V1 API, one URL every ten
seconds.  Note that this program includes yet another RFC-822 parser.
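The mechanical half of that title rewriting --- turning
"SiteName: Title" into "Title, from SiteName" --- is easy to
automate; here's a rough Python sketch (the function name is mine,
and mapping a site name like "ChristianLindholm.com" to a person's
real name would still be a manual step):

```python
import re

def reformat_title(title):
    """Rewrite 'Source: Title' as 'Title, from Source'; leave titles
    without a colon alone.  Splits on the first colon only."""
    m = re.match(r"([^:]+):\s*(.*)", title)
    if m:
        source, rest = m.groups()
        return "%s, from %s" % (rest, source)
    return title
```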
#!/usr/bin/perl -w
use strict;
use HTTP::Request::Common qw(GET POST);
use LWP::UserAgent;
use Data::Dumper;
use Carp;

# I had to do this to get this to run on Ubuntu Hoary:
# sudo apt-get install libwww-perl
# sudo apt-get install libxml-perl
# sudo apt-get install libxml-sax-perl

sub open_or_die {
    my ($mode, $file) = @_;
    croak "no file" unless $file;
    open my $foo, $mode, $file or die "Can't open $file: $!";
    return $foo;
}

sub read_rec {
    my ($fh) = @_;
    local $_;
    my %rv;
    my $cf;
    while (<$fh>) {
        last if /^\s*$/ and %rv;
        next if /^\s*$/;
        if (/^\s+(.*)/) {
            $rv{$cf} .= " $1";
        } elsif (/^(\S+):\s*(.*)$/) {
            $cf = $1;
            $rv{$cf} = $2;
        } else {
            warn "Couldn't grok line '$_'\n";
        }
    }
    return undef unless %rv;
    return \%rv;
}

my $infile = shift;
die "Usage: $0 file" unless $infile;
my $infh = open_or_die "<", $infile;
my $client = LWP::UserAgent->new(agent => 'local-delicious-sync/1',
                                 from => '[EMAIL PROTECTED]');

my ($user, $pass);
my $post = read_rec($infh);
if ($post->{user} and $post->{password}) {
    $user = $post->{user};
    $pass = $post->{password};
} else {
    die "First item in $infile must contain user and password, like:\n" .
        "user: joshua\n" .
        "password: ch23s4\n";
}

my $all = $client->request(GET "https://$user:$pass\@api.del.icio.us/v1/posts/all");
die "Couldn't get current status: " . $all->as_string
    if not $all->is_success;
my $posts = $all->content;

{
    package HrefExtractor;
    use base qw(XML::SAX::Base);
    use XML::SAX::ParserFactory;
    use Data::Dumper;

    sub start_element {
        my ($self, $el) = @_;
        push @{$self->{hrefs}}, $el->{Attributes}{'{}href'}{Value}
            if $el->{LocalName} eq 'post';
    }

    sub parse_string {
        my ($class, $string) = @_;
        my $self = $class->new();
        XML::SAX::ParserFactory->parser(Handler => $self)->parse_string($string);
        return @{$self->{hrefs}};
    }
}

open CRAP, '>/home/kragen/tmp/crap' and print CRAP $posts;
close CRAP;

my %already_posted_urls = map { ($_ => 1) } HrefExtractor->parse_string($posts);

# function: http://del.icio.us/api/posts/add?
# &url=          url for post
# &description=  description for post
# &extended=     extended for post
# &tags=         space-delimited list of tags
# &dt=           datestamp for post, format "CCYY-MM-DDThh:mm:ssZ"
# makes a post to delicious.
# the datestamp requires a LITERAL "T" and "Z" like in ISO8601 at
# http://www.cl.cam.ac.uk/~mgk25/iso-time.html.  for example:
# "1984-09-01T14:21:31Z"
sub post {
    my ($post) = @_;
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;
    my $now = sprintf "%04d-%02d-%02dT%02d:%02d:%02dZ",
        ($year + 1900, $mon + 1, $mday, $hour, $min, $sec);
    # my $req = POST "http://$user:$pass\@del.icio.us/api/posts/add", [
    my $req = POST "https://$user:$pass\@api.del.icio.us/v1/posts/add", [
        url => $post->{url},
        description => $post->{description} || '',
        extended => $post->{extended} || '',
        tags => $post->{tags} || '',
        dt => $now,
    ];
    sleep 10;
    my $resp = $client->request($req);
    die $req->as_string . "\n\n" . $resp->as_string
        if not $resp->is_success;
}

while (my $post = read_rec($infh)) {
    if ($post->{url}) {
        next if $already_posted_urls{$post->{url}};
        post($post);
        print "Posted $post->{description}\n";
    } else {
        warn "Hmm, unknown record: " . Dumper($post);
    }
}

I think CommerceNet might own the copyright on the Perl code.

So that's how you can use JavaScript, Lisp, RFC-822, XML, Perl, and
HTTPS all to post stuff to del.icio.us.  One of these days I'll
probably write a little web server that runs on localhost to make all
of that stuff much simpler, but that's how I do it now.
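P.S. If I ever do rewrite this, the two fiddly parts of the Perl ---
extracting the already-posted hrefs from the posts/all XML, and
formatting the dt datestamp --- might look like this in Python.  A
sketch only: it assumes the same V1 API response format, and the
network calls themselves are omitted:

```python
import time
import xml.etree.ElementTree as ET

def extract_hrefs(posts_xml):
    """Collect the href attribute of every <post> element, doing the
    job of the SAX-based HrefExtractor above."""
    return {post.get("href")
            for post in ET.fromstring(posts_xml).iter("post")}

def iso8601_now():
    """Current UTC time in the CCYY-MM-DDThh:mm:ssZ format the API
    wants for the dt parameter, with its literal T and Z."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
```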