On Wed, Jun 9, 2010 at 10:04 AM, Aaron Sherman <a...@ajs.com> wrote:
> Has anyone begun to consider what kind of filesystem interface we want
> for things like sftp, Amazon S3, Google Storage and other remote
> storage possibilities? Is there any extant work out there, or should I
> just start spit-balling?

In the absence of anything forthcoming and a totally arbitrary sense
of urgency ;-) here's what I think I should do:

IO::FileSystems (S32) gives us some basics and the Path role also
provides some useful features.

I will start there and build an IO::FileSystems::VFS roughly like:

class IO::VFS is IO::FileSystems {
  ...
  # Session data if applicable
  has IO::VFS::Session $.session;

 # Many methods take a $context which, if supplied
 # will contain back-end specific data such as restart markers
 # or payment model information. I'll probably define
 # a role for the context parameter, but otherwise
 # leave it pretty loose as a back-end specific structure.

  # A simple operation that guarantees a round-trip to the filesystem
  method nop($context?) { ... }

  # list of sub-IO::VFS partitions/buckets/etc
  method targets($context?) { ... }
  method find_target($locator, $context?) { ... }

  # Means of acquiring file-level access through a VFS
  method find($locator, $enc = $.session.encoding, $context?) { ... }
  method glob(Str $matcher, $enc = $.session.encoding, $context?) { ... }

  # Like opening and writing to filehandle, but the operation is totally
  # opaque and might be a single call, senfile or anything else.
  # Note that this doesn't replace $obj.find($path).write(...)
  method put($locator, $data, $enc = $.session.encoding, $context?) { ... }

  # Atomic copy/rename, etc. are logically filesystem operations, even though
  # they might have counterparts at the file level. The distinction being that
  # at the filesystem level I never know nor care what the contents of the
  # file are, I just ask for an operation to be performed on a given path.
  method copy($from, $to, $enc = $.session.encoding, $context?) { ... }
  method rename($from, $to, $enc = $.session.encoding, $context?) { ... }
  method delete($locator, $enc = $.session.encoding, $context?) { ... }

  # service-level ACLs if any
  method acl($locator, $context?) { ... }
}

The general model I imagine would be something like:

  my IO::VFS::S3 $s3 .= new();
  $s3.session.connect($amazonlogininfo);
  my $bucket = $s3.find_target($bucket_name);
  $bucket.put("quote.txt", "Now is the time for all good men...\n");
  say "URI: ", $bucket.find("quote.txt").uri;

or

 my IO::VFS::GoogleStorage $goog .= new();
 $goog.session.connect($googlelogininfo);
 my $bucket = $goog.find_target($bucket_name);
 $bucket.put("quote.txt", "Now is the time for all good men...\n");
 say "URI: ", $bucket.find("quote.txt").uri;

or

 my IO::VFS::SFTP $sftp .= new();
 $sftp.session.connect(:host<storage>, :user<ajs>, :password<iforgotit>);
 my $filesystem = $sftp.find_target("/tmp");
 $filesystem.put("quote.txt", "Now is the time for all good men...\n");
 say "URI: ", $filesystem.find("quote.txt").uri; # using sftp:...

Notice that everything after $obj.session.connect is identical except
for my choice of variable names. In fact, you shouldn't have to worry
about what storage back-end you're using as long as you have a valid
VFS handle. Really path names are the only thing that might trip you
up.

Thoughts?

I think that in order to do this, I'll need the following support
libraries which may or may not exist (I'll be looking into these):

IO::FileSystems
Path
HTTP (requires socket IO, MIME, base64, etc.)
Various crypto libs

I don't intend to provide a finished implementation of any of these
where they don't already exist (I may not even end up with a final
implementation of the VFS layer), but at least I'll get far enough
along that others who want to work on this will have a starting point,
and I'll want to at least have a test that fakes its way all the way
down to creating a remote file on all three services, even if most of
the operations involve passing on blobs of data generated by
equivalent calls in other languages.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs

Reply via email to