[MCN-L] metadata for dummies

Tim Au Yeung Sun, 04 Jan 1970 20:22:05 -0000

Hi Perian,

Here's my two bits about metadata: I'm not as well informed as I ought
to be, but hopefully that means the explanation will be simpler as a result.

First off, metadata isn't a term that originated in the cultural heritage
community, so its use within the library, museum and archives
communities is something of a graft, and misunderstandings abound. It
actually comes from the database community (I think Bo Sundgren coined
the term, but I'm not entirely certain at this point), where it makes the
most sense. Simply put, metadata is data about data, which, as
explanations go, does not explain a lot. Within the context of
databases, however, it makes a lot of sense. Since databases are really
just big tables of information, what you can do with a given row or
column is determined by what you know about that row or column. Consider
a column in a database called "registration date". If a computer is
programmed to treat every value in that column as a piece of text, then
there's not a lot you can do with it. But if that same value is known to
be a number, suddenly you can do a lot more. For instance, you could
determine whether a value falls within a range of values. So recording
that a column in a database holds integer values is an example of
metadata.
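Here's a minimal Python sketch of that idea. The SCHEMA dictionary plays the role of the metadata: it records what kind of value each column holds. The column names, the date format and the rows are all hypothetical, and I've used a date column rather than an integer one, but the principle is the same: once the program knows what the values are, a range query becomes possible.

```python
from datetime import date, datetime

# Hypothetical column metadata: for each column, what kind of value it holds.
SCHEMA = {
    "accession_no": {"type": "text"},
    "registration_date": {"type": "date", "format": "%d/%m/%Y"},
}

# Hypothetical rows, stored the way a database might hand them back: as text.
ROWS = [
    {"accession_no": "1998.12.3", "registration_date": "14/03/1998"},
    {"accession_no": "2004.1.7", "registration_date": "02/11/2004"},
    {"accession_no": "2010.5.1", "registration_date": "30/06/2010"},
]

def read_value(column, raw):
    """Interpret a raw value according to what SCHEMA says about its column."""
    meta = SCHEMA[column]
    if meta["type"] == "date":
        return datetime.strptime(raw, meta["format"]).date()
    return raw  # text columns are left as plain strings

def registered_between(rows, start, end):
    """Range query that only works because we know the column holds dates."""
    return [
        row["accession_no"]
        for row in rows
        if start <= read_value("registration_date", row["registration_date"]) <= end
    ]

print(registered_between(ROWS, date(2000, 1, 1), date(2009, 12, 31)))  # ['2004.1.7']

# Without the metadata the values are just text, and text compares character
# by character: "02/11/2004" sorts before "14/03/1998", which is chronologically wrong.
print("02/11/2004" < "14/03/1998")  # True
```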

In the cultural heritage community, the term has been applied to a number
of areas, some of which existed long before anyone called them metadata.
Take the library community, for example: most library people simply
assume that the bulk of the metadata applied to digital objects in this
new era is what already existed as cataloging. However, a traditional
library catalogue entry would not contain information about who owned
the book before the library did, even if the book came from special
collections. Museums, on the other hand, would have great interest in
the object's provenance. The result is that each domain typically
assumes that what it considers important about an object represents the
entirety of what should be known about it. In the connected world this
view is changing, hence the use of metadata as a catch-all term covering
any piece of data that describes a digital object, regardless of the
domain it comes from.
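To make the catch-all idea a little more concrete, here's a minimal Python sketch. The two records, their field names and their values are hypothetical, invented for illustration; the point is only that combining what each domain records gives a fuller description than either holds on its own.

```python
# Hypothetical records for the same book, as each domain might describe it.
library_record = {
    "title": "An Example Book",
    "creator": "Doe, Jane",
    "subject": "Local history",
}
museum_record = {
    "title": "An Example Book",
    "provenance": "Gift of a private collector, 1952",
}

def combine(*records):
    """Merge records from different domains into one description.

    Every field from every record is kept, and a field that appears in more
    than one record keeps each distinct value, so nothing one domain
    considers important gets dropped.
    """
    combined = {}
    for record in records:
        for field, value in record.items():
            values = combined.setdefault(field, [])
            if value not in values:
                values.append(value)
    return combined

merged = combine(library_record, museum_record)
print(sorted(merged))  # ['creator', 'provenance', 'subject', 'title']
```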

Broadly, an object can have information about what it is (descriptive
metadata), how it's made (technical metadata), the process used to make
it (administrative metadata), what it's made up of (structural
metadata), what you need to know to protect it (preservation metadata)
and how to use it (audience metadata, behavioral metadata, intent
metadata). Within each of these categories of metadata, there are rules
for the specific attributes of an object. Dublin Core, VRA, MIX and MARC
are all examples of this: they provide a list of attributes within a
given category of metadata that can be applied to an object (e.g.
title, creator, description). Generally, these rules don't prescribe a
specific format; MARC is a notable exception, as its origins can be
traced back to the era of the punch card.
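As a small illustration of what an element set is, here's a sketch in Python that checks a record against the fifteen elements of the Dublin Core Metadata Element Set (version 1.1). The record is hypothetical. Note that the check only says which attributes are allowed; it says nothing about how the record is written down, which is what XML and its schemas are for. (The broader DCMI Metadata Terms vocabulary does add a provenance term, but it isn't one of the original fifteen.)

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

def unknown_elements(record):
    """Return the attribute names in `record` that are not among the 15 elements."""
    return sorted(set(record) - DC_ELEMENTS)

# Hypothetical record; "provenance" is a museum concern outside the core 15.
record = {
    "title": "Portrait of a Woman",
    "creator": "Unknown",
    "description": "Oil on canvas.",
    "provenance": "Purchased 1923",
}
print(unknown_elements(record))  # ['provenance']
```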
Standards like SGML and XML, together with specific DTDs/schemas for
each of these element sets, provide the rules for formatting the
information into a form that machines can use. There are also rules,
like CCO and AACR2, for how the information within a given attribute is
formatted and what should be included: for instance, what to do with
the initial article in a title, or what constitutes a title as opposed
to a subtitle. Because we'd all like to do things the same way if
possible (so that it's easier to cooperate), people publish lists of
standardized values for a given attribute so that everyone's working
from the same page: LCSH for subject headings, for instance, LCC/DDC for
classification, and things like AAT and ULAN. Finally, we want our
systems to talk to other people's systems, so we need a set of rules for
how systems talk: OAI, Z39.50 and SOAP are examples of how systems can
bundle everything together and send it to another system. Also of note
is how those bundles are structured; that is what standards like METS,
SCORM and MPEG-21 DIDL cover.
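To see how the formatting and exchange pieces fit together, here's a minimal Python sketch using only the standard library. It writes a simple record as Dublin Core in XML using the oai_dc format, which OAI-PMH (the protocol usually meant by "OAI") requires repositories to support, and builds the URL a harvester would send to ask a repository for that record. The repository address, identifier and record values are hypothetical, and a real oai_dc record would normally also carry an xsi:schemaLocation attribute, which is left out here. Z39.50 and SOAP work differently and aren't shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by the oai_dc format.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

def to_oai_dc(record):
    """Serialize a {element: value or [values]} record as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of repeated values
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

def get_record_url(base_url, identifier):
    """Build an OAI-PMH GetRecord request asking for the oai_dc format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": "oai_dc",
    })
    return f"{base_url}?{query}"

# Hypothetical record and repository.
record = {"title": "Portrait of a Woman", "subject": ["Portraits", "Women"]}
print(to_oai_dc(record))
print(get_record_url("https://example.org/oai", "oai:example.org:123"))
```

The element set decides which names appear (title, subject), XML decides how they're written down, and the protocol decides how another system asks for them.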

Tim

Perian Sully wrote:
> Hi list of smart people much more knowledgeable than me:
>
> I'm trying to wrap my brain around the technical aspects of metadata 
> sharing and structures, reading through (and not entirely comprehending) 
> a lot of different sources. As I am a visual, hands-on type learner, I'm 
> trying to put everything I'm reading into non-technical language this 
> neophyte can understand. I'm pretty sure I've got #'s 2-4 wrong, but can 
> anyone help me unravel this....?
>
> 1) You have objects. You apply vocabularies to the objects in order to 
> describe them. The vocabularies facilitate how your object information 
> is seen by other computers. Examples of Vocabularies are: AAT, ULAN, 
> Chenhall's
>
> (I understand #1 pretty well. Here's where I start to get lost...)
>
> 2) In order for the other computers to understand what you're giving 
> them, the information needs to be arranged in a specific way. These are 
> the element sets...? these are MARC, LOC, VRA, Dublin Core
>
> 3) Because very few institutions have "pure" collections that fit into 
> one of the Vocabularies, we can use multiple Vocabularies. Do we use 
> multiples of #2 as well? These are defined and plugged into the element 
> sets. They are tagged as belonging to a specific Vocabulary
>
> (I think there's a middle piece in here I'm missing)
>
> 4) There is an umbrella structure, the Harvester, which can read #2 and 
> serve it to the user in readable form. Examples: OAI, MARC (also fits as 
> a #2), XML
>
> So as you can see, I'm dreadfully muddled. I know it's important to 
> understand it, but I'm just not able to wrap my head around the various 
> resources out there. I'm starting to think that Ask A Ninja is more my 
> level...
>
> Help! and thanks in advance
>
>
