On Sat, 13 Oct 2007, Asbjørn Ulsberg wrote:
On Sat, 06 Oct 2007 03:37:16 +0200, pkeane <[EMAIL PROTECTED]> wrote:
<dase:admin_mime_type>image/jpeg</dase:admin_mime_type>
<dase:admin_filename>PICA17902.JPG</dase:admin_filename>
<dase:admin_checksum>c837f0abd05c8b7126b8dac15d510f30</dase:admin_checksum>
<dase:admin_file_size>787705</dase:admin_file_size>
<dase:admin_image_width>1408</dase:admin_image_width>
<dase:admin_upload_date_time>2007-07-18T15:59:19</dase:admin_upload_date_time>
<dase:admin_serial_number>000435205</dase:admin_serial_number>
<dase:admin_image_height>1209</dase:admin_image_height>
<texpol:keyword>Congress Avenue</texpol:keyword>
<texpol:keyword>buildings</texpol:keyword>
<texpol:scratch_pad>/PICA17902.JPG</texpol:scratch_pad>
<texpol:rights_owner>Austin History Center</texpol:rights_owner>
<texpol:rights_status>Use in Texas Politics
content</texpol:rights_status>
<texpol:credit>Photographer: Unknown</texpol:credit>
<texpol:dase_rights>Restricted</texpol:dase_rights>
<texpol:original_filename>PICA17902.JPG</texpol:original_filename>
<texpol:used_in_chapter>executive</texpol:used_in_chapter>
If you could document these fields and their usage in the consuming
application, it would be easier to devise a way of encoding them in a more
Atom-friendly way.
As to your current implementation, why is 'dase:admin_mime_type' needed when
you have 'atom:content/@type'? Can't 'texpol:rights_owner',
'texpol:rights_status' and 'texpol:dase_rights' be implemented with
'atom:rights' or '[EMAIL PROTECTED]'copyright']' (or both)?
Can't 'texpol:original_filename' be an 'atom:link' with an appropriate '@rel'
and a working (dereferencable) '@href'? The same goes for
'texpol:used_in_chapter', 'dase:admin_filename' and 'texpol:scratch_pad'. I'm
sure Dublin Core can help out with proper relationships here.
I am not sure that I described my situation very effectively in my
original post. The DASe system, as we have it running at UT Austin, is
what I'd call a "Data First" application (see Stefano Mazzocchi's
"Data First vs. Structure First"
http://www.betaversion.org/~stefano/linotype/news/93/). What the RDF
folks are thinking is more akin to this system: it is infinitely
extensible, simply by the fact that any collection manager (we have 88
collections, comprising 1000+ 'fields' -- we call them attributes) can
create a new attribute. All of the above prefixed 'texpol' are
attributes that the folks managing the "Texas Politics Image Collection"
created themselves -- they possess semantics *internal* to that collection
and in almost every use case I have seen in 3+ years, that works fine.
The application itself has a namespace as well (dase: in the example).
This holds immutable administrative metadata captured/created when a new
image (or other digital asset) is created. This is handy for all kinds of
application-specific housecleaning.
Applying externally recognized semantics here serves no useful purpose
until/unless the data will be used outside of this system. One example is
in the case of RSS feeds. I have given collection managers the ability to
create "mappings" between their attributes and a set of Atom attributes
(title, summary, rights, etc.), and the system itself is 'smart' enough
to map the obvious ones from the set of administrative attributes. Thus
we can easily offer RSS/Atom feeds for collections or subsets thereof.
Other mappings are available as well -- I have built a proof-of-concept
OAI-PMH publisher that uses an xml-described mapping to make the data
available using Dublin Core, for instance.
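
A minimal sketch of what such an attribute-to-Atom mapping might look
like, assuming the metadata is held as (attribute, value) pairs. The
names TEXPOL_MAPPING and to_atom_entry are hypothetical, and the choice
of which attribute feeds which Atom element is an assumption made for
illustration, not the mapping DASe actually uses. The result is only a
fragment: a valid entry would also need atom:id and atom:updated.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical per-collection mapping: collection attribute -> Atom element.
# The attribute names come from the example entry in this thread; the
# targets are assumptions chosen for illustration.
TEXPOL_MAPPING = {
    "original_filename": "title",
    "credit": "summary",
    "rights_status": "rights",
}


def to_atom_entry(pairs, mapping):
    """Build an atom:entry from (attribute, value) pairs.

    Attributes that have no entry in `mapping` are skipped.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for attribute, value in pairs:
        target = mapping.get(attribute)
        if target is None:
            continue  # no Atom equivalent configured for this attribute
        ET.SubElement(entry, f"{{{ATOM_NS}}}{target}").text = value
    return entry


pairs = [
    ("keyword", "Congress Avenue"),
    ("keyword", "buildings"),
    ("rights_status", "Use in Texas Politics content"),
    ("credit", "Photographer: Unknown"),
    ("original_filename", "PICA17902.JPG"),
]
print(ET.tostring(to_atom_entry(pairs, TEXPOL_MAPPING), encoding="unicode"))
```

Keeping the mapping as plain data, separate from the stored pairs, means
a collection manager can change it without touching the original
attributes.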
The metadata that we hold in what are essentially key-value pairs is the
"raw" semantic stuff -- this is what the users bring to us: typically a
faculty member with a bunch of images that they have "cataloged" in a
FileMaker database or Excel spreadsheet that they designed to fit their
own specific need. Our goal was to get this stuff up on the web and the
only way to make that happen was to keep the original attributes they had
created and use a very generic approach -- the system ONLY thinks in
terms of key->value
pairs. *Predictions of Doom* from all corners -- librarians and
developers alike -- were, thankfully, not borne out. It has been a
resounding success and folks are adding new collections and assets daily.
Some subset of the collections are searchable/viewable and some remain
private but they now ALL live in this system (rather than on a computer
in a departmental office somewhere) and can be managed, preserved,
repurposed, etc. quite easily.
All that said, I had (have) a hunch that deeper use of Atom and esp.
AtomPub could make the system even more open and flexible. Uploading
happens in one of two ways: 1) the user uploads items one at a time
through a web interface or 2) the user FTPs the assets to our servers and
I do a batch upload that creates all of the proper admin metadata on the
fly. I am also experimenting with a system whereby the user puts the
images on a web doc root somewhere and the system grabs them over HTTP
(thus a previous question about how to create an Atom document describing
the contents of a filesystem directory).
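
One way an Atom document describing a directory could be produced,
sketched under assumptions: the function name directory_feed and the
base_url parameter are hypothetical, each file becomes one entry whose
atom:link uses rel="enclosure", and atom:author is omitted even though a
valid feed needs one. This is not how DASe does it.

```python
import mimetypes
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return `tag` qualified with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def directory_feed(directory, base_url):
    """Return an atom:feed element with one entry per file in `directory`.

    `base_url` is assumed to be the URL under which the directory is
    served over HTTP. Subdirectories are ignored.
    """
    directory = Path(directory)
    base_url = base_url.rstrip("/")

    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = f"Files in {directory.name}"
    ET.SubElement(feed, atom("id")).text = base_url + "/"
    now = datetime.now(timezone.utc)
    ET.SubElement(feed, atom("updated")).text = now.isoformat()

    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        stat = path.stat()
        url = f"{base_url}/{quote(path.name)}"
        # Guess the media type from the extension; fall back to a generic one.
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)

        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = path.name
        ET.SubElement(entry, atom("id")).text = url
        ET.SubElement(entry, atom("updated")).text = modified.isoformat()
        ET.SubElement(entry, atom("link"), {
            "rel": "enclosure",
            "href": url,
            "type": mime,
            "length": str(stat.st_size),
        })
    return feed
```

A harvester could then fetch each enclosure href over HTTP and create the
administrative metadata (size, checksum, dimensions) on its own side.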
I have had an interesting few days reading the archives for this list and
was particularly interested in the debates about RDF (whether Atom should
be RDF like RSS 1.0) and whether Atom should be an abstract data model
that Atom XML was simply one representation of. For my part, I think the
Atom folks hit the sweet spot on all counts. But Atom cannot be
everything to everyone and perhaps it is unrealistic of me to hope that it
is the ONE TRUE data format that can contain my data and provide me with
all of those tools, not to mention constraints that I need, to be able to
make a flexible and extensible system. I am looking with great interest
at systems like Dynamo (Amazon) and CouchDB since I think on some level
the problem domain is quite similar (well, not so much the massive
scalability, but rather the flexible 'data first' approach to metadata).
-Peter Keane
daseproject.org
If you can use the Dublin Core elements directly, do that. I'm thinking that
'dase:admin_upload_date_time' has the same semantics as 'DC: Created';
<http://dublincore.org/documents/dcmi-terms/#created> and thus can safely be
replaced with that element.
'texpol:keyword' - can't 'atom:category' be used here?
When applying all of the changes I propose above (assuming all of them can
be done successfully), you end up with far fewer extension elements and a
much more interoperable format. The stuff left is really just metadata about
the physical properties of the image (height, width, size) that can be
stuffed into one extension element.
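
A rough sketch of what the proposed changes could amount to, assuming
the metadata is available as (name, value) pairs. The function name
simplify, the namespace URI bound to DASE_NS and the element name
'image' are all hypothetical, since the thread does not give the real
ones. Only 'rights_status' is mapped to 'atom:rights' here; how the three
rights attributes should be combined is left open.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"
# Hypothetical namespace URI: the real one for 'dase:' is not in the thread.
DASE_NS = "http://example.org/dase#"

# Physical properties that get folded into one extension element.
PHYSICAL = {
    "admin_image_width": "width",
    "admin_image_height": "height",
    "admin_file_size": "size",
}


def simplify(pairs):
    """Map (name, value) pairs to an entry using standard elements where possible."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    image = None  # the single extension element, created on first use
    for name, value in pairs:
        if name == "keyword":
            ET.SubElement(entry, f"{{{ATOM_NS}}}category", {"term": value})
        elif name == "admin_upload_date_time":
            ET.SubElement(entry, f"{{{DCTERMS_NS}}}created").text = value
        elif name == "rights_status":
            ET.SubElement(entry, f"{{{ATOM_NS}}}rights").text = value
        elif name in PHYSICAL:
            if image is None:
                image = ET.SubElement(entry, f"{{{DASE_NS}}}image")
            image.set(PHYSICAL[name], value)
        # Anything else is left out of this sketch.
    return entry


entry = simplify([
    ("keyword", "Congress Avenue"),
    ("keyword", "buildings"),
    ("admin_upload_date_time", "2007-07-18T15:59:19"),
    ("admin_image_width", "1408"),
    ("admin_image_height", "1209"),
    ("admin_file_size", "787705"),
])
print(ET.tostring(entry, encoding="unicode"))
```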
--
Asbjørn Ulsberg -=|=- [EMAIL PROTECTED]
«He's a loathsome offensive brute, yet I can't look away»