On Sat, 13 Oct 2007, Asbjørn Ulsberg wrote:

On Sat, 06 Oct 2007 03:37:16 +0200, pkeane <[EMAIL PROTECTED]> wrote:

  <dase:admin_mime_type>image/jpeg</dase:admin_mime_type>
  <dase:admin_filename>PICA17902.JPG</dase:admin_filename>
  <dase:admin_checksum>c837f0abd05c8b7126b8dac15d510f30</dase:admin_checksum>
  <dase:admin_file_size>787705</dase:admin_file_size>
  <dase:admin_image_width>1408</dase:admin_image_width>
  <dase:admin_upload_date_time>2007-07-18T15:59:19</dase:admin_upload_date_time>
  <dase:admin_serial_number>000435205</dase:admin_serial_number>
  <dase:admin_image_height>1209</dase:admin_image_height>
  <texpol:keyword>Congress Avenue</texpol:keyword>
  <texpol:keyword>buildings</texpol:keyword>
  <texpol:scratch_pad>/PICA17902.JPG</texpol:scratch_pad>
  <texpol:rights_owner>Austin History Center</texpol:rights_owner>
  <texpol:rights_status>Use in Texas Politics content</texpol:rights_status>
  <texpol:credit>Photographer: Unknown</texpol:credit>
  <texpol:dase_rights>Restricted</texpol:dase_rights>
  <texpol:original_filename>PICA17902.JPG</texpol:original_filename>
  <texpol:used_in_chapter>executive</texpol:used_in_chapter>

If you could document these fields and their usage in the consuming application, it would be easier to devise a way of encoding them in a more Atom-friendly way.

As to your current implementation, why is 'dase:admin_mime_type' needed when you have 'atom:content/@type'? Can't 'texpol:rights_owner', 'texpol:rights_status' and 'texpol:dase_rights' be implemented with 'atom:rights' or '[EMAIL PROTECTED]'copyright']' (or both)?

Can't 'texpol:original_filename' be an 'atom:link' with an appropriate '@rel' and a working (dereferenceable) '@href'? The same goes for 'texpol:used_in_chapter', 'dase:admin_filename' and 'texpol:scratch_pad'. I'm sure Dublin Core can help out with proper relationships here.


I am not sure that I described my situation very effectively in my original post. The DASe system, as we have it running at UT Austin, is what I'd call a "Data First" application (see Stefano Mazzocchi's "Data First vs. Structure First" http://www.betaversion.org/~stefano/linotype/news/93/). It is more akin to what the RDF folks have in mind: it is infinitely extensible, simply because any collection manager (we have 88 collections, comprising 1000+ 'fields' -- we call them attributes) can create a new attribute. All of the elements above prefixed 'texpol:' are attributes that the folks managing the "Texas Politics Image Collection" created themselves -- they possess semantics *internal* to that collection, and in almost every use case I have seen in 3+ years, that works fine. The application itself has a namespace as well ('dase:' in the example). It holds immutable administrative metadata captured/created when a new image (or other digital asset) is created. This is handy for all kinds of application-specific housecleaning.

Applying externally recognized semantics here serves no useful purpose until/unless the data will be used outside of this system. One example is RSS feeds. I have given collection managers the ability to create "mappings" between their attributes and a set of Atom attributes (title, summary, rights, etc.), and the system itself is 'smart' enough to map the obvious ones from the set of administrative attributes. Thus we can easily offer RSS/Atom feeds for collections or subsets thereof. Other mappings are available as well -- I have built a proof-of-concept OAI-PMH publisher that uses an XML-described mapping to make the data available using Dublin Core, for instance.
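
To make the mapping idea concrete, here is a minimal Python sketch. It is not DASe code: the names ATOM_MAPPING and map_to_atom, and the particular choice of which attribute maps to which Atom element, are assumptions made purely for illustration. Only the attribute names and values come from the example entry above.

```python
# Hypothetical sketch of an attribute-to-Atom mapping; not the DASe implementation.

# An item is held as (attribute, value) pairs; an attribute may repeat.
item = [
    ("dase:admin_filename", "PICA17902.JPG"),
    ("dase:admin_upload_date_time", "2007-07-18T15:59:19"),
    ("texpol:keyword", "Congress Avenue"),
    ("texpol:keyword", "buildings"),
    ("texpol:rights_owner", "Austin History Center"),
]

# Assumed mapping a collection manager might define (local attribute -> Atom element).
ATOM_MAPPING = {
    "dase:admin_filename": "title",
    "dase:admin_upload_date_time": "updated",
    "texpol:rights_owner": "rights",
}


def map_to_atom(pairs, mapping):
    """Collect the values of every mapped attribute under its Atom element name.

    Attributes without a mapping are left out of the result; they stay
    internal to the collection.
    """
    mapped = {}
    for attribute, value in pairs:
        target = mapping.get(attribute)
        if target is not None:
            mapped.setdefault(target, []).append(value)
    return mapped


atom_fields = map_to_atom(item, ATOM_MAPPING)
```

Here 'texpol:keyword' has no mapping, so it does not appear in atom_fields; the raw key-value pairs are left untouched and the mapping is applied only when a feed is produced.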

The metadata that we hold in what are essentially key-value pairs is the "raw" semantic stuff -- this is what the users came up with themselves, typically faculty members with a bunch of images that they have "cataloged" in a FileMaker database or Excel spreadsheet designed to fit their own specific needs. Our goal was to get this stuff up on the web, and the only way to make that happen was to keep the original attributes they had created and use a very generic approach -- the system ONLY thinks in terms of key->value pairs. *Predictions of Doom* from all corners -- librarians and developers alike -- were, thankfully, not borne out. It has been a resounding success and folks are adding new collections and assets daily.

Some subset of the collections are searchable/viewable and some remain private but they now ALL live in this system (rather than on a computer in a departmental office somewhere) and can be managed, preserved, repurposed, etc. quite easily.

All that said, I had (have) a hunch that deeper use of Atom and esp. AtomPub could make the system even more open and flexible. Uploading happens in one of two ways: 1) the user uploads items one at a time through a web interface or 2) the user FTPs the assets to our servers and I do a batch upload that creates all of the proper admin metadata on the fly. I am also experimenting with a system whereby the user puts the images in a web doc root somewhere and the system grabs them over HTTP (thus my previous question about how to create an Atom document describing the contents of a filesystem directory).

I have had an interesting few days reading the archives for this list and was particularly interested in the debates about RDF (whether Atom should be RDF like RSS 1.0) and whether Atom should be an abstract data model of which Atom XML was simply one representation. For my part, I think the Atom folks hit the sweet spot on all counts. But Atom cannot be everything to everyone, and perhaps it is unrealistic of me to hope that it is the ONE TRUE data format that can contain my data and provide me with all of the tools, not to mention the constraints, that I need to build a flexible and extensible system. I am looking with great interest at systems like Dynamo (Amazon) and CouchDB since I think on some level the problem domain is quite similar (well, not so much the massive scalability, but rather the flexible 'data first' approach to metadata).

-Peter Keane
daseproject.org


If you can use the Dublin Core elements directly, do that. I'm thinking that 'dase:admin_upload_date_time' has the same semantics as 'DC: Created' <http://dublincore.org/documents/dcmi-terms/#created> and thus can safely be replaced with that element.

'texpol:keyword' - can't 'atom:category' be used here?

After exercising all of the changes I propose above (assuming all of them can be done successfully), you end up with far fewer extension elements and a much more interoperable format. The stuff left is really just metadata about the physical properties of the image (height, width, size) that can be stuffed into one extension element.
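
As a rough illustration of where these suggestions lead, the following Python sketch builds a partial entry with 'atom:category', 'atom:rights', the Dublin Core 'created' term and a single extension element for the image properties. The 'dase:image' element, its attribute names and the DASE namespace URI are placeholders invented for the example, and the entry is deliberately incomplete (a valid Atom entry also needs 'atom:id', 'atom:title' and 'atom:updated').

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DCTERMS = "http://purl.org/dc/terms/"
DASE = "http://example.org/dase"  # placeholder URI, not the real DASe namespace

# Register prefixes so the serialized XML is readable.
ET.register_namespace("", ATOM)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("dase", DASE)

entry = ET.Element(f"{{{ATOM}}}entry")

# 'texpol:keyword' values become atom:category elements.
for term in ("Congress Avenue", "buildings"):
    ET.SubElement(entry, f"{{{ATOM}}}category", {"term": term})

# 'texpol:rights_owner' folded into atom:rights.
rights = ET.SubElement(entry, f"{{{ATOM}}}rights")
rights.text = "Austin History Center"

# 'dase:admin_upload_date_time' replaced by the Dublin Core 'created' term.
created = ET.SubElement(entry, f"{{{DCTERMS}}}created")
created.text = "2007-07-18T15:59:19"

# One hypothetical extension element holding the physical image properties.
ET.SubElement(
    entry,
    f"{{{DASE}}}image",
    {"width": "1408", "height": "1209", "size": "787705"},
)

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Whether the rights status and the 'Restricted' flag also fit into 'atom:rights', or need a link or an extension of their own, depends on how the consuming application uses them.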

--
Asbjørn Ulsberg          -=|=-         [EMAIL PROTECTED]
«He's a loathsome offensive brute, yet I can't look away»
