On Fri, 5 Oct 2007, James M Snell wrote:
If you rely on a whole bunch of application specific extension elements,
you will not realize any significant benefit from using Atom. If you go
with Atom, find a more general way of encoding your data (e.g. use
RDF).
Regarding the question about tool support, that depends entirely on who
will be using the application. If you're building a tool for a very
specific and limited audience, it's probably not worth the effort. If
you're building a tool for an open audience, and the data from your
application might be used for purposes you hadn't originally intended,
use Atom and eliminate all the app-specific extensions.
That strikes me as sound advice. I saw a quote from Bill de hÓra cited on
Aristotle Pagaltzis' blog (http://plasmasturm.org/log/463/) a while back:
"Any kind of data garden is fair game for AtomPub to rationalize."
In higher education we (faculty, librarians, etc.) are drowning in digital
collections ("one off's") that eventually need to be shared, ported to the
web (e.g. currently on some departmental Filemaker server), preserved,
repurposed, etc. Current library tools (e.g. DSpace) generally hew to the
Dublin Core way of looking at the world, which simply does not fit with
the way faculty want to think of their stuff. The way I see what we've
done in DASe is a slightly more structured form of "tagging", but the tags
here are allowed to have a type (i.e., not just all keywords). I wouldn't
call it a 'specific and limited audience' by any means, but for these
purposes, perhaps so. I am told that ArtStor (the largest vendor
currently for web-based Art & Art History image collections) has now opted
to go with key-value pairs (they too are "harvesting" collections
that have originated in a wide variety of places) rather than "top-down"
metadata schemas. It'll be interesting to see how this all shakes out.
I don't really need/want the complexity of RDF and I certainly do
not want to try to explain such a thing to a
not-particularly-technology-savvy faculty member that I am trying to
persuade NOT to simply build another Filemaker database! The constraints
provided by a flat key-value system have proven quite useful, actually. I
suspect that I will end up establishing some subset of Atom Elements
(title, summary, updated, id) that every collection has as "common"
attributes and simply throw the rest of the collection-specific key-value
pairs into the <content> element as xml (or perhaps xhtml).
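
A minimal sketch of that plan, in Python rather than the language DASe is written in: the four common keys become Atom elements and everything else goes into <content type="application/xml"> as a single child element. The KV_NS namespace, the "pairs"/"pair" element names and the function name are hypothetical, made up for illustration only.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace for the key-value payload (an assumption, not DASe's).
KV_NS = "http://example.org/kv"

# The subset of Atom elements every collection is assumed to share.
COMMON = ("title", "summary", "updated", "id")


def entry_from_pairs(pairs):
    """Build an Atom entry from a list of (key, value) tuples; keys may repeat."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    seen = set()
    rest = []
    for key, value in pairs:
        if key in COMMON and key not in seen:
            # First occurrence of a common key becomes a real Atom element.
            ET.SubElement(entry, f"{{{ATOM_NS}}}{key}").text = value
            seen.add(key)
        else:
            rest.append((key, value))
    # Everything else is carried as XML inside <content>, under one wrapper.
    content = ET.SubElement(
        entry, f"{{{ATOM_NS}}}content", {"type": "application/xml"}
    )
    wrapper = ET.SubElement(content, f"{{{KV_NS}}}pairs")
    for key, value in rest:
        ET.SubElement(wrapper, f"{{{KV_NS}}}pair", {"key": key}).text = value
    return entry
```

Calling ET.tostring(entry_from_pairs(...), encoding="unicode") would give the serialized entry; a generic feed reader would see the title and summary, and only a DASe-aware client would need to look inside the content.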
-peter keane
daseproject.org
- James
pkeane wrote:
Yup, I am trying to decide if the tool support is enough to justify the
effort. And still I wonder if there is some other potential side
benefit I am not seeing. Here, by the way, is a collection as Atom Feed
(with only one item shown). Note that collection owners can declare
their own custom attributes to "map" to Atom Elements if they wish, in
which case they appear in the default Atom namespace, otherwise they are
in the "dase" namespace (standard administrative metadata common to all
collections) or in the collection's own namespace.
Note that there is no hand coding here, just a method on a collection
object, e.g. "print $collection->asAtom()".
-pk
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:dase="http://quickdraw.laits.utexas.edu/dase"
xmlns:texpol="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/1.0"
xml:base="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/">
<title>Texas Politics Image Collection</title>
<id>http://quickdraw.laits.utexas.edu/dase/texpol_image_collection</id>
<author>
<name/>
</author>
<updated>1969-12-31T18:00:00-06:00</updated>
<entry
xml:base="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/">
<id>http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/000435205</id>
<updated>1969-12-31T18:00:00-06:00</updated>
<title>Congress Avenue</title>
<summary>photo of Congress Avenue</summary>
<dase:admin_mime_type>image/jpeg</dase:admin_mime_type>
<dase:admin_filename>PICA17902.JPG</dase:admin_filename>
<dase:admin_checksum>c837f0abd05c8b7126b8dac15d510f30</dase:admin_checksum>
<dase:admin_file_size>787705</dase:admin_file_size>
<dase:admin_image_width>1408</dase:admin_image_width>
<dase:admin_upload_date_time>2007-07-18T15:59:19</dase:admin_upload_date_time>
<dase:admin_serial_number>000435205</dase:admin_serial_number>
<dase:admin_image_height>1209</dase:admin_image_height>
<texpol:keyword>Congress Avenue</texpol:keyword>
<texpol:keyword>buildings</texpol:keyword>
<texpol:scratch_pad>/PICA17902.JPG</texpol:scratch_pad>
<texpol:rights_owner>Austin History Center</texpol:rights_owner>
<texpol:rights_status>Use in Texas Politics
content</texpol:rights_status>
<texpol:credit>Photographer: Unknown</texpol:credit>
<texpol:dase_rights>Restricted</texpol:dase_rights>
<texpol:original_filename>PICA17902.JPG</texpol:original_filename>
<texpol:used_in_chapter>executive</texpol:used_in_chapter>
<link length="9701" type="image/jpeg"
rel="http://quickdraw.laits.utexas.edu/dase/media/thumbnail"
href="/media/thumbnail/000435205_100.jpg"/>
<link length="78505" type="image/jpeg"
rel="http://quickdraw.laits.utexas.edu/dase/media/viewitem"
href="/media/viewitem/000435205_400.jpg"/>
<link length="783340" type="image/jpeg"
rel="http://quickdraw.laits.utexas.edu/dase/media/full"
href="/media/full/000435205_3600.jpg"/>
<content
src="http://quickdraw.laits.utexas.edu/dase/texpol_image_collection/media/thumbnail/000435205_100.jpg"
type="image/jpeg"/>
</entry>
</feed>
On Fri, 5 Oct 2007, James M Snell wrote:
Basically, if it's a closed system with specific clients, there likely
will not be any benefit to using Atom. If you wish to enable
interchange and interop with other applications, there will be benefits
to using Atom, if only to leverage the existing tool support.
- James
pkeane wrote:
Yes, it is really nothing more than key-value pairs. I am more
wondering about the possible benefits of Atom than whether this system
works -- I use it for data import/export of the collections and it is
quite easy to create parsers and generators for this format that let me
move it in and out of the relational database that the application uses.
The database itself is also quite generic: a "collections" table, an
"items" table, a "values" table and an "attributes" table (each value
has an item_id and an attribute_id). It is important that the data
model be able to grow organically -- as a user adds a new "field" (aka
key or attribute) to describe the items they have, they'll have no
knowledge at all of Atom or Dublin Core or any of that. And it's fine
-- every collection has a unique set of attributes (aka fields or
keys). The composite primary key for attribute is "ascii_id" plus
"collection_id".
The system has been in production and heavily used for a couple years,
and includes 88 collections comprising 300,000 items. There are currently
1358 rows in the attribute table (those are the keys in the key->value
pairs) and 4.5 million rows in the value table. We've had no problems
at all with this current architecture. And yet I wonder what Atom could
do for me as a more standard XML format for data serialization...
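
For readers who want to picture the four-table layout described above, here is a rough sketch using Python's sqlite3 module. It is not the actual DASe schema: the column names, the surrogate "id" on attributes (with the ascii_id plus collection_id pair kept unique), and the table name item_values (chosen because VALUES is an SQL keyword) are all assumptions.

```python
import sqlite3

# Assumed schema; only the four-table shape comes from the description above.
SCHEMA = """
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    ascii_id TEXT NOT NULL UNIQUE
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    serial_number TEXT NOT NULL
);
CREATE TABLE attributes (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    ascii_id TEXT NOT NULL,
    UNIQUE (collection_id, ascii_id)
);
CREATE TABLE item_values (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    value_text TEXT NOT NULL
);
"""


def add_value(db, item_id, collection_id, key, text):
    """Attach a key-value pair to an item, creating the attribute on first use."""
    db.execute(
        "INSERT OR IGNORE INTO attributes (collection_id, ascii_id) VALUES (?, ?)",
        (collection_id, key),
    )
    (attribute_id,) = db.execute(
        "SELECT id FROM attributes WHERE collection_id = ? AND ascii_id = ?",
        (collection_id, key),
    ).fetchone()
    db.execute(
        "INSERT INTO item_values (item_id, attribute_id, value_text) VALUES (?, ?, ?)",
        (item_id, attribute_id, text),
    )


def item_pairs(db, item_id):
    """Return an item's (key, value) pairs in insertion order."""
    rows = db.execute(
        "SELECT a.ascii_id, v.value_text FROM item_values v "
        "JOIN attributes a ON a.id = v.attribute_id "
        "WHERE v.item_id = ? ORDER BY v.id",
        (item_id,),
    )
    return rows.fetchall()
```

The point of the sketch is that a new "field" is just a new row in attributes, so the data model grows without any schema change.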
thanks!
Peter Keane
daseproject.org
On Sat, 6 Oct 2007, A. Pagaltzis wrote:
* pkeane <[EMAIL PROTECTED]> [2007-10-05 07:00]:
<item serial_number="000435213">
<metadata
ascii_id="admin_checksum">630230b057c511cbee87447960fff02e</metadata>
<metadata ascii_id="admin_filename">62-GT-06.jpg</metadata>
<metadata ascii_id="admin_file_size">318642</metadata>
<metadata ascii_id="admin_image_height">576</metadata>
<metadata ascii_id="admin_image_width">720</metadata>
<metadata ascii_id="admin_mime_type">image/jpeg</metadata>
<metadata ascii_id="admin_serial_number">000435213</metadata>
<metadata
ascii_id="admin_upload_date_time">2007-07-18T15:59:28</metadata>
<metadata ascii_id="credit">Photographer: Unknown</metadata>
<metadata ascii_id="dase_rights">Restricted</metadata>
<metadata ascii_id="description">photo of Ben Barnes while Speaker,
black
and white</metadata>
<metadata ascii_id="keyword">Ben Barnes</metadata>
<metadata ascii_id="keyword">Capitol Building interior</metadata>
<metadata ascii_id="keyword">Lieutenant Governor</metadata>
<metadata ascii_id="keyword">Speaker of the House</metadata>
<metadata ascii_id="original_filename">62-GT-06.jpg</metadata>
<metadata ascii_id="rights_owner">Senate Media Services</metadata>
<metadata ascii_id="rights_status">Use in Texas Politics
content</metadata>
<metadata ascii_id="scratch_pad">/62-GT-06.jpg</metadata>
<metadata ascii_id="title">Ben Barnes</metadata>
<metadata ascii_id="used_in_chapter">none</metadata>
<media_file filename="000435213_800.jpg" size="medium" height="576"
width="720" mime_type="image/jpeg" />
<media_file filename="000435213_100.jpg" size="thumbnail" height="80"
width="100" mime_type="image/jpeg" />
<media_file filename="000435213_640.jpg" size="small" height="480"
width="600" mime_type="image/jpeg" />
</item>
Any thoughts on the benefits of using atom here?
I don't see the problem. Atom gives you an Entry where you can
put the metadata for a media resource. You have a bunch of
attributes that should be mapped to Atom elements; the rest you
stick into the content, possibly as RDF since your ad-hoc vocab
is more or less along those lines anyway.
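
As a rough illustration of that split (a sketch only, in Python): read an <item> in the format quoted above, pull out the keys that have an obvious Atom counterpart, and keep the remainder for the content. The ATOM_MAP table is an assumed mapping picked for the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from ad-hoc keys to Atom element names; illustrative only.
ATOM_MAP = {"title": "title", "description": "summary"}


def split_item(item_xml):
    """Return (mapped, rest): Atom-bound values and the leftover key-value pairs."""
    item = ET.fromstring(item_xml)
    mapped = {}
    rest = []
    for meta in item.findall("metadata"):
        key = meta.get("ascii_id")
        # Collapse whitespace introduced by line wrapping inside values.
        value = " ".join((meta.text or "").split())
        atom_name = ATOM_MAP.get(key)
        if atom_name is not None and atom_name not in mapped:
            mapped[atom_name] = value
        else:
            rest.append((key, value))
    return mapped, rest
```

Repeated keys such as "keyword" stay in the leftover list as separate pairs, so nothing is lost by the split.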
I cannot get past the fact that my ultra-generic xml schema is
REALLY easy to deal with
It's not actually very generic. It's a very limited vocabulary
that expresses barely any more than a map of key-value pairs. Of
course such a simple data structure is easy to deal with. The
only genericity there is that the keys are arbitrary strings. It
looks easy now because you have to do almost no work up front:
the structure is rigid and the semantics are completely ad-hoc.
It won't look very easy at all once you have a large dataset with
an inconsistent mess of key names.
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>