Socialtext provides a relatively sane REST API to their Wiki service,
and so I implemented a CVS-style interface that lets you edit Wiki pages
at your leisure and merge them back up to the Wiki server when you feel
like it.

Aside from the usual Web stuff (LWP, URI) this depends on the JSON
module.  I've tested it with version 1.00 of the JSON module from
Debian.

I'll put this up under a free-software license on
http://pobox.com/~kragen/sw/stclient shortly, but I have to sleep first.
For the time being, just notice that it has a notice that it is not in
the public domain.

#!/usr/bin/perl -w
# -*- mode: cperl; encoding: utf-8 -*-
use strict;
use LWP::UserAgent;
use HTTP::Request;
use URI;

# XXX there are a couple of inline modules here that probably should
# be split out into separate files for maintainability, but having
# them inline here makes this program easier to distribute.

=head1 NAME

st - CVS-style interaction with a Socialtext WikiWikiWeb

=head1 SYNOPSIS

 st checkout http://www.socialtext.net/your-wiki-name
 cd your-wiki-name
 vi homepage
 st diff -u
 st update
 st diff -u
 st commit

=head1 DESCRIPTION

C<st> is an example application of Socialtext's REST API.  It checks
out a copy of your Socialtext Wiki into your local filesystem and lets
you edit the text of your Wiki pages with your text editor of choice,
and then upload them again later, possibly merging in changes.

C<st checkout> or C<st co> prompts you for your email address and
Socialtext password, creates the work directory, and downloads all of
the pages in your Wiki.  This takes a while, and won't be practical on
a sufficiently large Wiki, but you only need to do it once.

C<st diff> or C<st di> shows the edits you currently have waiting to
be committed.  It produces unified diff output by default;
C<st diff -c> will give you context diffs instead.  This command works
even when you have no network access, such as when you're on a
commercial flight.

C<st update> or C<st up> updates the pages in your filesystem to the
current version on the Wiki server, trying to automatically merge in
any changes you have made locally.

C<st commit> or C<st ci> uploads any edits you have made to the Wiki
server.

=head1 FILESYSTEM LAYOUT

C<st> creates a directory named after your Socialtext workspace as a
child of the current directory.  Inside that directory, it creates a
.st directory containing the following files:

=over 4

=item C<user>

Your email address.

=item C<pass>

Your Socialtext password.

=item C<url>

The URL of the Wiki.

=item C<pristine>

A directory containing the last versions of all of the pages
downloaded from the server, one per file.

=item C<etags>

A directory containing the entity tags ("ETags") of all of the pages
downloaded from the server, one per file.

=back

=head1 BUGS

=over 4

=item *

Asks you for your username and password even for unparsable URLs.

=item *

Has a lost-update race condition when saving.
L<http://www.socialtext.net/st-rest-docs/index.cgi?http_status_codes>
doesn't list C<412 Precondition Failed> (see
L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.13>)
so there probably isn't any way to avoid that.  Should use If-Match
(L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.24>)
or If-Unmodified-Since
(L<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.28>)
in order to try to avoid that.

=item *

Will fail if someone puts a newline in the user or password file using
C<vi>.

=item *

Won't be able to merge if C<diff3> isn't installed.  C<diff3> is an
"essential" package on Debian and has been part of Unix since at least
the 1980s, so this probably won't be a problem on any Unix system.

=item *

Won't be able to show diffs if C<diff> isn't installed.

=item *

No support for tags.

=item *

No support for attachments.

=item *

No support for comments.

=item *

No support for users, breadcrumbs, backlinks.

=item *

Very inefficient at figuring out whether files have been locally
modified.

=item *

Emits useless message apparently when there's a conflict:
C<diff3 exited with 256. at ../st line 442.>

=item *

Doesn't keep you from committing with conflicts.

=back

=head1 COPYRIGHT

Copyright 2008 Kragen Javier Sitaker.

=cut

my $sample_url = 'http://www.socialtext.net/st-rest-docs';

{
  package Dir;
  # "Dir" package to simplify reading and writing a bit.  Does this
  # maybe exist in the standard library yet?
  sub new {
    my ($class, $dir) = @_;
    return bless { dir => $dir }, $class;
  }

  sub create {
    my ($self) = @_;
    mkdir $self->{dir} or die "Can't create directory $self->{dir}: $!";
  }

  # XXX surely this is in the standard Perl library, right?  This
  # version won't work on old MacOS or VMS --- does anyone still care?
  sub _join {
    my ($dir, $filename) = @_;
    return "$dir/$filename";
  }

  sub child {
    my ($self, $filename) = @_;
    return _join($self->{dir}, $filename);
  }

  sub _read_file {
    my ($fn) = @_;
    open my $fh, '<', $fn or die "Couldn't open $fn: $!";
    return do { local $/; <$fh> }
  }

  sub _write_file {
    my ($fn, $content) = @_;
    my $fn_new = "$fn.new.$$";
    eval {
      open my $fh, '>', $fn_new or die "Couldn't open $fn_new: $!";
      print $fh $content or die "Couldn't write to $fn_new: $!";
      close $fh or die "Couldn't write to $fn_new: $!";
      rename $fn_new, $fn or die "Couldn't rename $fn_new to $fn: $!";
    };
    unlink $fn_new;
    die $@ if $@;
  }

  sub get {
    my ($self, $filename) = @_;
    return _read_file($self->child($filename));
  }

  sub set {
    my ($self, $filename, $content) = @_;
    return _write_file($self->child($filename), $content);
  }

  sub has {
    my ($self, $filename) = @_;
    return -e $self->child($filename);
  }

  sub subdir {
    my ($self, $subdirname) = @_;
    return Dir->new($self->child($subdirname));
  }

  sub filenames {
    my ($self) = @_;
    opendir my $dh, $self->{dir};
    return grep { $_ ne '.' and $_ ne '..' } readdir $dh;
  }

  sub pathnames {
    my ($self) = @_;
    return map { $self->child($_) } $self->filenames;
  }
}


{
  # An HTTP response for an x.socialtext-wiki page in the workspace.
  package Page;
  sub new {
    my ($class, $response) = @_;
    return bless { resp => $response }, $class;
  }

  sub content {
    my ($self) = @_;
    return $self->{resp}->content;
  }

  sub etag {
    my ($self) = @_;
    my $etag = $self->{resp}->header('ETag');
    die "No ETag for page: " . $self->content if not defined $etag;
    return $etag;
  }
}


{
  # Encapsulates both the filesystem directory and the HTTP user-agent
  # to talk to the server.
  package Client;
  use JSON qw();

  sub new {
    my ($class, $url, $user, $pass) = @_;
    my $uo = URI->new($url);
    my $local = $uo->path;
    unless ($local =~ [EMAIL PROTECTED]/([^/]+)\Z@) {
      die "Couldn't parse URL $local --- should be something like 
$sample_url\n";
    }
    my $workspace_name = $1;

    my $ua = LWP::UserAgent->new;
    $ua->credentials($uo->host_port, "Socialtext", $user, $pass);

    return bless { 
        url_obj => $uo,
        workspace => $workspace_name,
        user => $user,
        pass => $pass,
        ua => $ua,
    }, $class;
  }

  sub new_from_dir {
    my ($class, $dirname) = @_;
    my $dir = Dir->new($dirname);
    my $conf = $dir->subdir('.st'); # XXX duplication
    my $self = $class->new($conf->get('url'),
                           $conf->get('user'),
                           $conf->get('pass'));
    $self->{dir} = $dir;
    return $self;
  }

  sub conf {
    my ($self) = @_;
    return $self->{dir}->subdir('.st');
  }

  sub pristine {
    my ($self) = @_;
    return $self->conf->subdir('pristine');
  }

  sub etags {
    my ($self) = @_;
    return $self->conf->subdir('etags');
  }

  sub data_url {
    my ($self, $trailing_path) = @_;
    my $uo = $self->{url_obj}->clone;
    $uo->path("/data/workspaces/$self->{workspace}/$trailing_path");
    return $uo;
  }

  sub http_get {
    my ($self, $url, $content_type) = @_;
    my $req = HTTP::Request->new(GET => $url->as_string);
    $req->header(Accept => $content_type);
    my $resp = $self->{ua}->request($req);
    die "HTTP request failure: " . $resp->as_string if $resp->is_error;
    return $resp;
  }

  sub get_json_data {
    my ($self, $trailing_path) = @_;
    my $resp = $self->http_get($self->data_url($trailing_path),
                               'application/json');
    return JSON::jsonToObj($resp->content);
  }

  sub get_page {
    my ($self, $page_id) = @_;
    return Page->new($self->http_get($self->data_url("pages/$page_id"),
                                     'text/x.socialtext-wiki'));
  }


  # Sample page datum from JSON index:
  #           {
  #             'page_uri' => 
'https://www.socialtext.net/kregan-test-space/index.cgi?meeting_agendas',
  #             'page_id' => 'meeting_agendas',
  #             'name' => 'Meeting agendas',
  #             'modified_time' => '1205267435',
  #             'tags' => [
  #                         'Welcome',
  #                         'Recent Changes'
  #                       ],
  #             'uri' => 'meeting_agendas',
  #             'revision_id' => '20080311203035',
  #             'workspace_name' => 'kregan-test-space',
  # # (yeah, somebody misspelled my name)
  #             'last_edit_time' => '2008-03-11 20:30:35 GMT',
  #             'last_editor' => '[EMAIL PROTECTED]',
  #             'revision_count' => '1'
  #           },

  sub list_pages {
    my ($self) = @_;

    return @{$self->get_json_data('pages')};
  }

  sub outdated_files {
    my ($self) = @_;
    my @filenames = $self->etags->filenames;
    my %revision_ids = ();
    foreach my $page ($self->list_pages) {
      $revision_ids{$page->{page_id}} = $page->{revision_id};
    }
    return grep { not $self->etags->has($_) or
                    $self->etags->get($_) ne $revision_ids{$_} }
      keys %revision_ids;
  }

  # Called when a page has been updated from the server.
  sub updated {
    my ($self, $id, $resp) = @_;
    $self->pristine->set($id => $resp->content);
    $self->etags->set($id => $resp->etag);
  }

  # Download and save a page from the server; assumes no local changes
  sub update_page {
    my ($self, $id) = @_;
    print "U $id\n"; # XXX shouldn't this be in "command-line processing"?

    my $resp = $self->get_page($id);
    $self->{dir}->set($id => $resp->content);
    $self->updated($id => $resp);
    return $resp;
  }

  # Sets up a new workspace directory.
  sub create_dir {
    my ($self) = @_;

    $self->{dir} = Dir->new($self->{workspace});
    if (-e $self->{workspace}) {
      die "The subdirectory '$self->{workspace}' already exists here.\n";
    }

    $self->{dir}->create();
    $self->conf->create();
    $self->conf->set(user => $self->{user});
    $self->conf->set(pass => $self->{pass});
    $self->conf->set(url => $self->{url_obj}->as_string);
    $self->pristine->create();
    $self->etags->create();

    for my $page ($self->list_pages) {
      my $resp = $self->update_page($page->{page_id});

      # We insist on the following because it allows us to find out
      # which pages have been updated with a single collection GET
      # request, rather than a HEAD request for each page:
      my $etag = $resp->etag;
      die "ETag for page $page->{page_id} is $etag " .
        "but should be $page->{revision_id}"
        if $etag ne $page->{revision_id};
    }
  }

  # Merges updates from a single file.
  sub merge_updates {
    my ($self, $filename) = @_;
    my $tmpname = ".tmp.$$";
    my $newfile = ".tmp2.$$";
    eval {
      my $page = $self->get_page($filename);
      $self->{dir}->set($tmpname, $page->content); # XXX assumes {dir} means '.'
      open(my $pipe, '-|', 
        'diff3', '-m', $filename,
                       $self->pristine->child($filename),
                       $tmpname) or die "Can't open pipe: $!";
      open my $newfh, '>', $newfile or die "Can't open $newfile: $!";
      while (<$pipe>) {
        print $newfh $_;
      }
      if (not close $pipe) {
        die "popen of diff3 failed: $!" if $! != 0;
        warn "diff3 exited with $?."
      }
      close $newfh or die "Couldn't write to $newfile: $!";
      rename $newfile, $filename;
      $self->updated($filename => $page);
    };
    unlink $tmpname;
    unlink $newfile;
    die $@ if $@;
  }

  # Return a list of locally modified files.
  sub modified_files {
    my ($self) = @_;
    return grep { $self->{dir}->get($_) ne $self->pristine->get($_) } 
      $self->pristine->filenames;
  }

  # Upload a (hopefully-up-to-date) file to the server.
  sub commit_file {
    my ($self, $filename) = @_;

    my $data = $self->{dir}->get($filename); # XXX hope this is a byte string

    my $req = HTTP::Request->new(PUT => $self->data_url("pages/$filename"));
    # XXX the documentation doesn't say the server supports If-Match,
    # but if it gets correct support for it, then we'll get
    # concurrency-safe updates without any further changes to this
    # code.
    $req->header("If-Match" => $self->etags->get($filename));
    $req->header("Content-Type" => "text/x.socialtext-wiki");
    $req->header("Content-Length" => length $data);
    $req->content($data);

    my $response = $self->{ua}->request($req);

    if ($response->is_error) {
      die "Error uploading new version of $filename: " . $response->as_string;
    }

    $self->update_page($filename);
  }
}

# Ask the user for a line of input.
sub ask {
  my ($string) = @_;
  print $string;
  my $result = <STDIN>;
  chomp $result;
  return $result;
}

### User command-line processing.

sub checkout {
  my ($url) = @_;

  die_with_usage() if not $url;

  my $user = ask('Email address you use for your Socialtext account: ');
  my $pass = ask('Socialtext password (will be echoed): ');

  my $client = Client->new($url, $user, $pass);
  $client->create_dir();
}

sub diff {
  my (@flags) = @_;
  @flags = qw(-u) if not @flags; # specify --normal if you're a masochist
  foreach my $filename (Client->new_from_dir('.')->pristine->pathnames) {
    system 'diff', @flags, $filename, '.';
  }
}

sub update {
  my $client = Client->new_from_dir('.');
  foreach my $filename ($client->outdated_files) {
    $client->merge_updates($filename);
    print "M $filename\n";
  }
}

sub commit {
  my $client = Client->new_from_dir('.');
  # XXX need to check for unresolved conflicts
  my @files = $client->outdated_files;
  if (@files) {
    for my $filename (@files) {
      print "M $filename\n";
    }
    die "Bring the above files up-to-date with $0 update before committing.\n";
  }
  foreach my $filename ($client->modified_files) {
    $client->commit_file($filename);
    print "! $filename\n";
  }
}

sub die_with_usage {
  die <<EOF;
$0: usage: one of the following:
  $0 checkout $sample_url
  $0 diff
  $0 update
  $0 commit
EOF
}

my %cmds = (
  co => \&checkout, checkout => \&checkout,
  di => \&diff, diff => \&diff,
  up => \&update, update => \&update,
  ci => \&commit, commit => \&commit,
);

my $cmd = shift @ARGV;
die_with_usage if not $cmd or not $cmds{$cmd};
$cmds{$cmd}->(@ARGV);

Reply via email to