Re: Why use Atom?

pkeane Fri, 12 Oct 2007 21:17:17 -0700


On Sat, 13 Oct 2007, Asbjørn Ulsberg wrote:

On Sat, 06 Oct 2007 03:37:16 +0200, pkeane <[EMAIL PROTECTED]> wrote:
  <dase:admin_mime_type>image/jpeg</dase:admin_mime_type>
  <dase:admin_filename>PICA17902.JPG</dase:admin_filename>
  <dase:admin_checksum>c837f0abd05c8b7126b8dac15d510f30</dase:admin_checksum>
  <dase:admin_file_size>787705</dase:admin_file_size>
  <dase:admin_image_width>1408</dase:admin_image_width>
  <dase:admin_upload_date_time>2007-07-18T15:59:19</dase:admin_upload_date_time>
  <dase:admin_serial_number>000435205</dase:admin_serial_number>
  <dase:admin_image_height>1209</dase:admin_image_height>
  <texpol:keyword>Congress Avenue</texpol:keyword>
  <texpol:keyword>buildings</texpol:keyword>
  <texpol:scratch_pad>/PICA17902.JPG</texpol:scratch_pad>
  <texpol:rights_owner>Austin History Center</texpol:rights_owner>
  <texpol:rights_status>Use in Texas Politics content</texpol:rights_status>
  <texpol:credit>Photographer: Unknown</texpol:credit>
  <texpol:dase_rights>Restricted</texpol:dase_rights>
  <texpol:original_filename>PICA17902.JPG</texpol:original_filename>
  <texpol:used_in_chapter>executive</texpol:used_in_chapter>
If you could document these fields and their usage in the consuming application, it would be easier to devise a way of encoding them in a more Atom-friendly way.
As to your current implementation, why is 'dase:admin_mime_type' needed when you have 'atom:content/@type'? Can't 'texpol:rights_owner', 'texpol:rights_status' and 'texpol:dase_rights' be implemented with 'atom:rights' or '[EMAIL PROTECTED]'copyright']' (or both)?
Can't 'texpol:original_filename' be an 'atom:link' with an appropriate '@rel' and a working (dereferenceable) '@href'? The same goes for 'texpol:used_in_chapter', 'dase:admin_filename' and 'texpol:scratch_pad'. I'm sure Dublin Core can help out with proper relationships here.

I am not sure that I described my situation very effectively in my original post. The DASe system, as we have it running at UT Austin, is what I'd call a "Data First" application (see Stefano Mazzocchi's "Data First vs. Structure First" http://www.betaversion.org/~stefano/linotype/news/93/). What the RDF folks are thinking is more akin to this system: it is infinitely extensible, simply by the fact that any collection manager (we have 88 collections, comprising 1000+ 'fields' -- we call them attributes) can create a new attribute. All of the above elements prefixed 'texpol' are attributes that the folks managing the "Texas Politics Image Collection" created themselves -- they possess semantics *internal* to that collection and in almost every use case I have seen in 3+ years, that works fine. The application itself has a namespace as well (dase: in the example). This holds immutable administrative metadata captured/created when a new image (or other digital asset) is created. This is handy for all kinds of application-specific housecleaning.

Applying externally recognized semantics here serves no useful purpose until/unless the data will be used outside of this system. One such case is RSS feeds. I have given collection managers the ability to create "mappings" between their attributes and a set of Atom elements (title, summary, rights, etc.), and the system itself is 'smart' enough to map the obvious ones from the set of administrative attributes. Thus we can easily offer RSS/Atom feeds for collections or subsets thereof. Other mappings are available as well -- I have built a proof-of-concept OAI-PMH publisher that uses an xml-described mapping to make the data available using Dublin Core, for instance.
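
To make the "mappings" idea concrete, here is a minimal sketch in Python of how key-value attributes might be projected onto Atom elements. It is not DASe's actual code: the mapping table, the function name attributes_to_entry, and the choice of which attribute maps to which Atom element are all assumptions made for illustration. The result is also not a complete Atom entry (atom:id and atom:updated are left out).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical mapping a collection manager might define:
# collection attribute name -> Atom element name.
TEXPOL_TO_ATOM = {
    "texpol:credit": "summary",
    "texpol:rights_owner": "rights",
}


def attributes_to_entry(pairs, mapping, title):
    """Build an atom:entry from (attribute, value) pairs.

    Attributes that have no entry in `mapping` are skipped, so they stay
    internal to the collection.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    for name, value in pairs:
        atom_name = mapping.get(name)
        if atom_name is None:
            continue  # no externally recognized meaning for this attribute
        ET.SubElement(entry, f"{{{ATOM_NS}}}{atom_name}").text = value
    return entry


# Key-value pairs taken from the example item quoted above.
pairs = [
    ("texpol:keyword", "Congress Avenue"),
    ("texpol:credit", "Photographer: Unknown"),
    ("texpol:rights_owner", "Austin History Center"),
]
entry = attributes_to_entry(pairs, TEXPOL_TO_ATOM, title="PICA17902.JPG")
print(ET.tostring(entry, encoding="unicode"))
```

The point of the design is that the stored data stays as plain key-value pairs; the mapping is applied only at the moment the data leaves the system.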

The metadata that we hold in what are essentially key-value pairs is the "raw" semantic stuff -- this is what the users bring us: typically a faculty member has a bunch of images "cataloged" in a FileMaker database or Excel spreadsheet designed to fit their own specific need. Our goal was to get this stuff up on the web, and the only way to make that happen was to keep the original attributes they had created and use a very generic approach -- the system ONLY thinks in terms of key->value pairs. *Predictions of Doom* from all corners -- librarians and developers alike -- were, thankfully, not borne out. It has been a resounding success and folks are adding new collections and assets daily.

Some subset of the collections is searchable/viewable and some remain private, but they now ALL live in this system (rather than on a computer in a departmental office somewhere) and can be managed, preserved, repurposed, etc. quite easily.

All that said, I had (have) a hunch that deeper use of Atom and esp. AtomPub could make the system even more open and flexible. Uploading happens in one of two ways: 1) the user uploads items one at a time through a web interface or 2) the user FTPs the assets to our servers and I do a batch upload that creates all of the proper admin metadata on the fly. I am also experimenting with a system whereby the user puts the images on a web doc root somewhere and the system grabs them over HTTP (thus my previous question about how to create an Atom document describing the contents of a filesystem directory).
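
One way an Atom document describing a filesystem directory could be produced is sketched below in Python. This is only an illustration under stated assumptions: the function name directory_to_feed, the use of rel="enclosure" links, and the example.org base URL are not from the original discussion, and the feed omits elements (such as atom:author) that a fully valid feed would need.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM_NS = "http://www.w3.org/2005/Atom"


def directory_to_feed(directory, base_url):
    """Return an atom:feed element with one entry per regular file.

    `base_url` is assumed to end with "/" and to be the URL under which
    the directory is served over HTTP.
    """
    def atom(tag):
        return f"{{{ATOM_NS}}}{tag}"

    def timestamp(seconds):
        moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")

    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = os.path.basename(os.path.abspath(directory))
    ET.SubElement(feed, atom("id")).text = base_url
    ET.SubElement(feed, atom("updated")).text = timestamp(os.path.getmtime(directory))

    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # subdirectories are ignored in this sketch
        url = base_url + quote(name)
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = name
        ET.SubElement(entry, atom("id")).text = url
        ET.SubElement(entry, atom("updated")).text = timestamp(os.path.getmtime(path))
        # The link lets a harvester fetch the asset itself over HTTP.
        ET.SubElement(entry, atom("link"), rel="enclosure", href=url,
                      length=str(os.path.getsize(path)))
    return feed


# Demonstration against a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for demo_name in ("a.jpg", "b.jpg"):
        with open(os.path.join(demo_dir, demo_name), "wb") as handle:
            handle.write(b"placeholder")
    demo_feed = directory_to_feed(demo_dir, "http://example.org/images/")
    print(ET.tostring(demo_feed, encoding="unicode"))
```

A harvester could then read the feed, follow each enclosure link, and create the administrative metadata as it ingests each file.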

I have had an interesting few days reading the archives for this list and was particularly interested in the debates about RDF (whether Atom should be RDF like RSS 1.0) and whether Atom should be an abstract data model that Atom XML was simply one representation of. For my part, I think the Atom folks hit the sweet spot on all counts. But Atom cannot be everything to everyone and perhaps it is unrealistic of me to hope that it is the ONE TRUE data format that can contain my data and provide me with all of the tools, not to mention the constraints, that I need to make a flexible and extensible system. I am looking with great interest at systems like Dynamo (Amazon) and CouchDB since I think on some level the problem domain is quite similar (well, not so much the massive scalability, but rather the flexible 'data first' approach to metadata).


-Peter Keane
daseproject.org

If you can use the Dublin Core elements directly, do that. I'm thinking that 'dase:admin_upload_date_time' has the same semantics as 'DC: Created' <http://dublincore.org/documents/dcmi-terms/#created> and thus can safely be replaced with that element.
'texpol:keyword' - can't 'atom:category' be used here?
When exercising all of the changes I propose above (assuming all of them can be done successfully), you end up with far fewer extension elements and a much more interoperable format. The stuff left is really just metadata about the physical properties of the image (height, width, size) that can be stuffed into one extension element.
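
As a rough illustration of where these suggestions might lead, the Python sketch below builds an entry that uses atom:category for keywords, a Dublin Core Terms 'created' element for the upload date, and one extension element for the physical image properties. The dase namespace URI, the 'dase:image' element name, and its attribute names are hypothetical; only the values come from the example item.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"
DASE_NS = "http://example.org/dase"  # hypothetical namespace URI

# Use readable prefixes when serializing.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)
ET.register_namespace("dase", DASE_NS)


def build_entry(keywords, created, width, height, size):
    """Build an entry fragment with few extension elements."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Each keyword becomes an atom:category instead of texpol:keyword.
    for keyword in keywords:
        ET.SubElement(entry, f"{{{ATOM_NS}}}category", term=keyword)
    # The upload timestamp is carried by dcterms:created.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}created").text = created
    # Physical properties are grouped in one (hypothetical) extension element.
    ET.SubElement(entry, f"{{{DASE_NS}}}image",
                  width=str(width), height=str(height), size=str(size))
    return entry


entry = build_entry(
    keywords=["Congress Avenue", "buildings"],
    created="2007-07-18T15:59:19",
    width=1408,
    height=1209,
    size=787705,
)
print(ET.tostring(entry, encoding="unicode"))
```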
--
Asbjørn Ulsberg          -=|=-         [EMAIL PROTECTED]
«He's a loathsome offensive brute, yet I can't look away»
