Yes, it is really nothing more than key-value pairs. I am wondering more about the possible benefits of Atom than about whether this system works -- I use it for data import/export of the collections, and it is quite easy to create parsers and generators for this format that let me move the data in and out of the relational database that the application uses.
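
To illustrate how little a parser and generator for this format need, here is a rough Python sketch rather than the actual DASe code. It assumes the <item>/<metadata>/<media_file> layout quoted further down; the function names parse_item and generate_item and the shortened SAMPLE are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the item format quoted below (illustrative data only).
SAMPLE = """<item serial_number="000435213">
  <metadata ascii_id="title">Ben Barnes</metadata>
  <metadata ascii_id="keyword">Ben Barnes</metadata>
  <metadata ascii_id="keyword">Lieutenant Governor</metadata>
  <media_file filename="000435213_100.jpg" size="thumbnail"
              height="80" width="100" mime_type="image/jpeg" />
</item>"""


def parse_item(xml_text):
    """Return (serial_number, metadata, media_files) for one <item>."""
    item = ET.fromstring(xml_text)
    metadata = defaultdict(list)  # a key such as "keyword" may repeat
    for node in item.findall("metadata"):
        metadata[node.get("ascii_id")].append((node.text or "").strip())
    media_files = [dict(node.attrib) for node in item.findall("media_file")]
    return item.get("serial_number"), dict(metadata), media_files


def generate_item(serial_number, metadata, media_files):
    """Serialize the parsed structure back into the same XML vocabulary."""
    item = ET.Element("item", serial_number=serial_number)
    for key, values in metadata.items():
        for value in values:
            ET.SubElement(item, "metadata", ascii_id=key).text = value
    for attrs in media_files:
        ET.SubElement(item, "media_file", attrs)
    return ET.tostring(item, encoding="unicode")
```

Because the metadata is a plain mapping from key to a list of values, parsing the generated XML again yields the same structure.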

The database itself is also quite generic: a "collections" table, an "items" table, a "values" table and an "attributes" table (each value has an item_id and an attribute_id). It is important that the data model be able to grow organically -- as a user adds a new "field" (aka key or attribute) to describe the items they have, they'll have no knowledge at all of Atom or Dublin Core or any of that. And it's fine -- every collection has a unique set of attributes (aka fields or keys). The composite primary key for attribute is "ascii_id" plus "collection_id".
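
A minimal sketch of that four-table layout, using SQLite from Python purely for illustration. Column names other than item_id, attribute_id, ascii_id and collection_id are guesses, and the surrogate id on attributes (with the composite key expressed as a UNIQUE constraint) is an assumption made so that values.attribute_id has something to point at. The helpers add_value and item_metadata are hypothetical.

```python
import sqlite3

# Assumed schema: only the four table names and the item_id, attribute_id,
# ascii_id and collection_id columns come from the description above.
SCHEMA = """
CREATE TABLE collections (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attributes (
    id            INTEGER PRIMARY KEY,  -- assumed surrogate key
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    ascii_id      TEXT NOT NULL,
    UNIQUE (collection_id, ascii_id)    -- the composite key
);
CREATE TABLE items (
    id            INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    serial_number TEXT NOT NULL
);
CREATE TABLE "values" (
    id           INTEGER PRIMARY KEY,
    item_id      INTEGER NOT NULL REFERENCES items(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    value_text   TEXT NOT NULL
);
"""


def add_value(conn, item_id, collection_id, ascii_id, text):
    """Attach a value to an item, creating the attribute on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO attributes (collection_id, ascii_id) VALUES (?, ?)",
        (collection_id, ascii_id),
    )
    attribute_id = conn.execute(
        "SELECT id FROM attributes WHERE collection_id = ? AND ascii_id = ?",
        (collection_id, ascii_id),
    ).fetchone()[0]
    conn.execute(
        'INSERT INTO "values" (item_id, attribute_id, value_text) VALUES (?, ?, ?)',
        (item_id, attribute_id, text),
    )


def item_metadata(conn, item_id):
    """Return the item's key-value pairs as a list of (ascii_id, value)."""
    return conn.execute(
        'SELECT a.ascii_id, v.value_text FROM "values" v '
        "JOIN attributes a ON a.id = v.attribute_id "
        "WHERE v.item_id = ? ORDER BY a.ascii_id, v.id",
        (item_id,),
    ).fetchall()
```

Adding a new "field" is then just a new row in attributes, with no schema change, which is what lets each collection grow its own set of keys.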

The system has been in production and heavily used for a couple of years, and includes 88 collections comprising 300,000 items. There are currently 1358 rows in the attribute table (those are the keys in the key->value pairs) and 4.5 million rows in the value table. We've had no problems at all with this current architecture. And yet I wonder what Atom could do for me as a more standard XML format for data serialization...

thanks!
Peter Keane
daseproject.org




On Sat, 6 Oct 2007, A. Pagaltzis wrote:


* pkeane <[EMAIL PROTECTED]> [2007-10-05 07:00]:
<item serial_number="000435213">
<metadata
ascii_id="admin_checksum">630230b057c511cbee87447960fff02e</metadata>
<metadata ascii_id="admin_filename">62-GT-06.jpg</metadata>
<metadata ascii_id="admin_file_size">318642</metadata>
<metadata ascii_id="admin_image_height">576</metadata>
<metadata ascii_id="admin_image_width">720</metadata>
<metadata ascii_id="admin_mime_type">image/jpeg</metadata>
<metadata ascii_id="admin_serial_number">000435213</metadata>
<metadata ascii_id="admin_upload_date_time">2007-07-18T15:59:28</metadata>
<metadata ascii_id="credit">Photographer: Unknown</metadata>
<metadata ascii_id="dase_rights">Restricted</metadata>
<metadata ascii_id="description">photo of Ben Barnes while Speaker, black
and white</metadata>
<metadata ascii_id="keyword">Ben Barnes</metadata>
<metadata ascii_id="keyword">Capitol Building interior</metadata>
<metadata ascii_id="keyword">Lieutenant Governor</metadata>
<metadata ascii_id="keyword">Speaker of the House</metadata>
<metadata ascii_id="original_filename">62-GT-06.jpg</metadata>
<metadata ascii_id="rights_owner">Senate Media Services</metadata>
<metadata ascii_id="rights_status">Use in Texas Politics content</metadata>
<metadata ascii_id="scratch_pad">/62-GT-06.jpg</metadata>
<metadata ascii_id="title">Ben Barnes</metadata>
<metadata ascii_id="used_in_chapter">none</metadata>
<media_file filename="000435213_800.jpg" size="medium" height="576"
width="720" mime_type="image/jpeg" />
<media_file filename="000435213_100.jpg" size="thumbnail" height="80"
width="100" mime_type="image/jpeg" />
<media_file filename="000435213_640.jpg" size="small" height="480"
width="600" mime_type="image/jpeg" />
</item>

Any thoughts on the benefits of using atom here?

I don't see the problem. Atom gives you an Entry where you can
put the metadata for a media resource. You have a bunch of
attributes that should be mapped to Atom elements; the rest you
stick into the content, possibly as RDF since your ad-hoc vocab
is more or less along those lines anyway.
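
One possible reading of that mapping, as a rough Python sketch and not a recommendation from either poster. Which keys map to which Atom elements is a guess; the urn:example:dase namespace and entry id scheme are hypothetical; and the leftover pairs are kept as plain XML inside atom:content instead of RDF, only to keep the example short.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DASE = "urn:example:dase"  # hypothetical namespace for the leftover pairs

ET.register_namespace("atom", ATOM)
ET.register_namespace("d", DASE)


def to_atom_entry(serial_number, metadata):
    """Build an atom:entry from a dict mapping ascii_id -> list of values."""
    rest = {key: list(values) for key, values in metadata.items()}  # copy
    entry = ET.Element(f"{{{ATOM}}}entry")
    # Hypothetical id scheme; a real system would mint its own stable IRIs.
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:example:dase:item:{serial_number}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = rest.pop("title", ["(untitled)"])[0]
    # atom:updated needs an RFC 3339 timestamp with an offset; appending "Z"
    # assumes the stored time is UTC. Raises KeyError if the key is missing.
    uploaded = rest.pop("admin_upload_date_time")[0]
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = uploaded + "Z"
    if "description" in rest:
        # An entry may carry only one summary, so take the first description.
        ET.SubElement(entry, f"{{{ATOM}}}summary").text = rest.pop("description")[0]
    for term in rest.pop("keyword", []):
        ET.SubElement(entry, f"{{{ATOM}}}category", term=term)
    # Everything without an obvious Atom home stays as key-value pairs.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="application/xml")
    item = ET.SubElement(content, f"{{{DASE}}}item")
    for key, values in rest.items():
        for value in values:
            ET.SubElement(item, f"{{{DASE}}}metadata", ascii_id=key).text = value
    return entry
```

Calling ET.tostring(to_atom_entry(...), encoding="unicode") serializes the entry; an atom:author would still be needed unless the enclosing feed supplies one.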

I cannot get past the fact that my ultra-generic xml schema is
REALLY easy to deal with

It's not actually very generic. It's a very limited vocabulary
that expresses barely any more than a map of key-value pairs. Of
course such a simple data structure is easy to deal with. The
only genericity there is that the keys are arbitrary strings. It
looks easy now because you have to do almost no work up front:
the structure is rigid and the semantics are completely ad-hoc.

It won't look very easy at all once you have a large dataset with
an inconsistent mess of key names.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

