Of possible interest...

DataPortability.Public.General
Overview - The WRFS (aka WebFS) Inode
http://groups.google.com/group/dataportability-public/web/WRFS%20-%20Web%20Inode%20Overview

http://dataportability.org/

Web Inode Overview
Web Inode Quick Notes

     * Permission "flags": visible, read, last access
     * Indexes only minimal data: that a service exists for the user, and 
possibly a URI for the type of service
     * Able to be queried
     * Is pointed to by Attribute Exchange? Resides inside AX?
     * Data can be public or private
     * All features are "opt in", from the start
     * Can point to other web inodes (indirect blocks)
     * Each service / data container has access to change its own key-value 
pair entry, i.e., "exists/doesn't exist" { Add, Update, Remove }
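
The notes above can be made a little more concrete with a Python sketch of 
what a web inode record might hold. Everything here is hypothetical --- the 
WebInode and ServiceEntry names, the field names, and the URIs are invented 
for illustration, not part of any published spec:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    """One key-value entry that a service/data container manages itself."""
    service_uri: str      # possibly a URI for the type of service
    exists: bool = True   # the "exists / doesn't exist" flag

@dataclass
class WebInode:
    """Minimal index of where a user has data (hypothetical sketch)."""
    owner: str                   # e.g. an OpenID identifier
    visible: bool = True         # permission "flags"
    readable: bool = True
    last_access: float = 0.0
    entries: dict = field(default_factory=dict)   # service name -> ServiceEntry
    children: list = field(default_factory=list)  # other web inodes (indirect blocks)

    # Each service may Add / Update / Remove only its own entry.
    def add(self, name, entry):
        self.entries[name] = entry

    def remove(self, name):
        self.entries.pop(name, None)

inode = WebInode(owner="https://example.com/user/alice")
inode.add("flickr", ServiceEntry(service_uri="urn:service:photos"))
inode.remove("flickr")
```

Note that the inode indexes only the fact that data exists somewhere; the 
bytes themselves never leave the container.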

The case for an inode structure for data on the web

From a user's perspective, the system should:

     * Allow them to use their data in a web or desktop application regardless 
of where the data is stored (in one place, in multiple locations, etc.)
     * Not force the user to configure each data store, each application, or 
identity provider for each application. The system simply manages permissions 
for the user, securely, and aggregates data { images, video, email, etc. } 
seamlessly behind the scenes.
     * Keep things simple. Let the user do what the user wants to do, as 
opposed to trying to keep the user in a "walled garden". Let the market decide 
how data should be used.

From a developer's perspective, the system should:

     * Make all of a user's data accessible from any application on the 
internet, even though it is stored in a variety of containers across many 
different sites.
     * Make a user's data accessible via a query language or a proxy API.
     * Implicitly know how to "discover" where a user has data stored.
     * Know how to aggregate a user's data across multiple containers and 
identities.
     * Allow the user to control who can see and query what on their behalf.
     * Present a universal data API for web data, allowing data to be queried 
as if it were in a single filesystem or database.

Hey, while we're aiming big, let's go for it -- we might also want it to:

     * Allow an application to view the data from the perspective of a 
filesystem and from the perspective of a database
     * Potentially act as a "Data Cloud" drive on a portable device (example: 
an iPhone thinks its I: drive is a local disk, but it's actually the system we 
are proposing)

So exactly what are you saying here?

Basically, the WRFS Inode allows us to view, query, and aggregate a user's 
data regardless of location (with the restriction that the data must be 
accessible through a webserver), in much the same way that we do with a local 
disk-based filesystem (at least in most ways). Say, how does a filesystem 
work, anyway? That sounds like a good place to start building our model!
Goals:

     * Store
     * Aggregate
     * Protect
     * Relate
     * Query

Our data? Huh. What other systems do these same things?

     * A Filesystem
     * A Database
     * DNS

So really we want to do some things that have already been done quite well in 
computer science. Let's take a look at how these systems do it, and build a 
roadmap/model of how we might create an abstraction.
The Filesystem as a Metaphor

What are some interesting properties of a filesystem that are very applicable 
to our situation?

     * A filesystem abstracts away the details of storing bits on a storage 
system, so the user or programmer can focus on working with files as, well, 
"files". The programmer simply says "I want a stream to read from a file 
called 'foo.txt' in my '/usr' directory; go get it and return me a data 
structure or stream". The programmer doesn't worry about inodes, bmap, or the 
size of a disk block, because at the application level of the abstraction 
model those underlying details should simply be "taken care of". Just think 
about this --- if you had to manage free disk blocks in a linked list every 
time you wanted to open a text file, you might start thinking about changing 
professions. Abstractions are your friend.
     * So exactly what happens when we open a file to read? A file is stored 
all in one spot, right? Not quite. A hard drive's platter is divided into a 
series of "disk blocks", which are all the same size (commonly 4KB) --- 
considerably smaller than your average mp3 file. So how does a file get read 
from disk if it's stored in all those 4KB chunks? It goes roughly like:
           1. --- Translation of filename and directory to inode ---
           2. --- Translation of inode and offset into disk block using the 
"bmap" routine ---
           3. --- file reader/writer is returned at offset in block ---
        Where:
            o Inode - a data structure on disk that represents a file and 
holds pointers to all of the disk blocks that contain the actual bytes of 
the file.
     * So now you say "great, you've told us how a basic filesystem works, but 
that doesn't help flesh out this grand distributed file system..." --- and to 
that I say "hold on, lemme finish, I'm going somewhere with this". What 
properties of the filesystem can we apply to our design goal abstraction?

       Store and Aggregate.

       We have to be able to store our data in whatever container we want on 
the internet, and we need to be able to aggregate that data back together 
again, right? Well, what if we said:
           o The concept of a disk block could be related to a web server 
itself, satisfying the storage requirement. The bmap() routine maps to a 
standard web API { soap, rest, json } on the webserver for exposing data.
           o The concept of an inode could be related to a data index store, 
satisfying the aggregation requirement. This online store would tell a 
program/agent "hey, userID X has images in flickr, smugmug, and myspace", 
which would then take those URIs and make the proper data queries via the 
standard web data APIs on the respective data containers. This eliminates the 
need to actually cache someone's data on a third-party site; the data should 
be "discovered" and aggregated at runtime. The data aggregation layer might do 
something like take an openID identifier, query a data index store (possibly 
referenced by the openID provider itself) to get the list of data containers, 
query each one, and then return a data structure (or recordset) of relevant 
data stubs+uris for specific user files to the application layer.
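
The filesystem-to-web mapping above can be sketched in a few lines of Python. 
Everything here is hypothetical --- the index layout, the container URIs, and 
the helper names (resolve_index, fetch_stubs, aggregate) stand in for whatever 
real discovery protocol and container web APIs an implementation would use:

```python
# Hypothetical index store: maps an identifier to the containers holding each
# data type, playing the role an inode's block pointers play on a local disk.
DATA_INDEX = {
    "https://example.com/user/alice": {
        "images": ["https://flickr.example/api", "https://smugmug.example/api"],
        "videos": ["https://myspace.example/api"],
    }
}

def resolve_index(identifier):
    """Step 1: identifier -> index record (like filename -> inode)."""
    return DATA_INDEX.get(identifier, {})

def fetch_stubs(container_uri, data_type):
    """Step 2: container -> data stubs (like bmap: inode -> disk blocks).
    A real implementation would call the container's rest/soap/json API;
    here we just fabricate one stub record per container."""
    return [{"container": container_uri,
             "type": data_type,
             "uri": container_uri + "/item/1"}]

def aggregate(identifier, data_type):
    """Step 3: gather stubs from every container at runtime (no caching)."""
    record = resolve_index(identifier)
    stubs = []
    for uri in record.get(data_type, []):
        stubs.extend(fetch_stubs(uri, data_type))
    return stubs

stubs = aggregate("https://example.com/user/alice", "images")
```

Each returned stub carries enough information (container + uri) for the 
application layer to go fetch the actual file.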

[table elided]

The Database as a Metaphor

I think the aspects of a database that are interesting in this context are 
Protect, Relate, and Query.

     * Protect - The data stubs returned from the data containers might not be 
accessible from all applications or third parties. A database can set who can 
view and update a table down to the record level. Our system might want that 
level of granularity.
     * Relate - The data stubs returned from the data containers might have 
relational properties and have transformations performed against them before 
being returned to the application layer.
     * Query - Just as a SELECT statement is parsed into a query tree before 
being executed against a database table/view, we might want to execute 
similar functions/filters against the returned aggregated data.
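
To make the Query point concrete, here is a hypothetical SELECT-like filter 
applied to a list of aggregated data stubs; the stub fields and the select() 
helper are invented for illustration, not part of any existing API:

```python
def select(stubs, where=lambda s: True, order_by=None):
    """A SELECT-with-WHERE-style filter over aggregated data stubs."""
    rows = [s for s in stubs if where(s)]
    if order_by:
        rows.sort(key=order_by)
    return rows

stubs = [
    {"type": "image", "container": "flickr",  "size": 120},
    {"type": "image", "container": "smugmug", "size": 80},
    {"type": "video", "container": "myspace", "size": 900},
]

# Roughly: SELECT * FROM stubs WHERE type = 'image' ORDER BY size
images = select(stubs,
                where=lambda s: s["type"] == "image",
                order_by=lambda s: s["size"])
```

The point is only that the aggregation layer could expose query semantics 
without caring where each stub originally lived.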

DNS as a Metaphor

DNS allows us to take a domain name and translate it into an IP address. This 
is interesting from the standpoint of our need to resolve an openID-like token 
into a set of data container URIs for a given data type { social graph, 
images, videos, ? }.
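
That DNS-style lookup reduces to a table keyed on (identity token, data type). 
A minimal sketch, with all tokens and URIs invented for illustration:

```python
# Hypothetical "data DNS": (identity token, data type) -> container URIs,
# much as DNS maps a hostname to one or more IP addresses.
RECORDS = {
    ("https://alice.example/openid", "images"): [
        "https://flickr.example/api",
        "https://smugmug.example/api",
    ],
    ("https://alice.example/openid", "social-graph"): [
        "https://myspace.example/api",
    ],
}

def resolve(token, data_type):
    """Resolve an openID-like token to the containers for one data type."""
    return RECORDS.get((token, data_type), [])
```

As with DNS, a miss simply returns an empty answer rather than an error.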

---
end

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
