I certainly don't disagree with much of what you're saying.  I think we
are looking at the same subject from different perspectives.  My
statements may be assumptive, but we need to start somewhere with
discussion if we are to come to a common understanding...

On Mar 5, 2008, at 5:36 PM, Stefano Mazzocchi wrote:
>
> I personally think that it's very easy to underestimate the amount of
> energy that goes into "mandating coherence".

I simply want to see DSpace provide an interface that promotes reuse
of existing values/equivalencies and provides a feedback loop that
informs the producers and maintainers of the content about common or
popular terminology.  I'm not suggesting a "top-down mandate", but a
"bottom-up" enabling of the user experience such that we can "entice"
them to take popular suggested values as their own personal choice.

> Any dataset is an exercise of data integration, even in the simple
> case of a single dataset with data entry performed by more than one
> individual (not necessarily at the same time!).
>
> "Hal Abelson" and "H. Abelson" and "Harold Abelson" and "Abelson,
> Harold" are all correct forms and unless all the people doing data
> entry (or data entry oversight, or batch upload, or data integration
> or data conversion + upload) knows precisely which one is the
> mandated canonical (and never makes mistakes!), you're doomed to have
> multiple forms anyway.

Still, I think the user interfaces for editing metadata in DSpace can
exert some control over this experience, especially if metadata is
organized more appropriately under the hood and we have more feedback
loops into the system.

1.) At least having the metadata you're attempting to "control" broken
up into statements with URIs (as you suggested earlier) opens the door
to having equivalencies managed somewhere and stored in Longwell.

(I even wonder if these initial URIs could be automatically generated,
for instance, based on checksums of the existing values, establishing
that in the Item statements generated by DSpace something like...
<sha:67eaf8ea6b219545fe7a6881f209f28c>  rdfs:label "Hal Abelson"
would be true every time it showed up as a metadata value in a DSpace
instance, and at least give us a starting point for making
equivalencies.)
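
As a minimal sketch of the checksum idea (Python; the function names,
the "urn:md5:" prefix and the whitespace normalization are all
assumptions for illustration, not anything DSpace or Longwell does
today), the same value always yields the same URI, so every occurrence
of "Hal Abelson" ends up as the subject of the same label statement:

```python
import hashlib


def value_uri(value: str) -> str:
    """Derive a stable URI from a metadata value.

    Assumption: runs of whitespace are collapsed before hashing so that
    trivially different spellings share a URI. The 'urn:md5:' prefix is a
    hypothetical naming scheme chosen for this sketch.
    """
    normalized = " ".join(value.split())
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"urn:md5:{digest}"


def label_statement(value: str) -> str:
    """Render one N-Triples-style label statement for a metadata value."""
    # Escape backslashes and double quotes so the literal stays well formed.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    label = "http://www.w3.org/2000/01/rdf-schema#label"
    return f'<{value_uri(value)}> <{label}> "{literal}" .'


if __name__ == "__main__":
    for name in ("Hal Abelson", "H. Abelson"):
        print(label_statement(name))
```

Distinct spellings still get distinct URIs here; the checksum only
gives each value a handle that equivalence statements can later point
at.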

2.) Exposing some sort of web services (used very loosely) on
Longwell would provide a feedback mechanism that could be used to
populate suggestion dropdowns or controlled vocabularies in DSpace
(or entirely different applications).
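
A rough sketch of what such a suggestion lookup might compute (Python;
the function name, its parameters and the sample values are
hypothetical, and a real service would query the triple store rather
than an in-memory list): rank the values already in use by how often
they occur, and offer the most popular ones that match what the user
has typed so far.

```python
from collections import Counter


def suggest(existing_values, typed_prefix, limit=5):
    """Return the most frequently used existing values matching a prefix.

    existing_values: every value currently stored for one metadata field.
    typed_prefix: what the user has typed so far (matched case-insensitively).
    """
    counts = Counter(existing_values)
    prefix = typed_prefix.casefold()
    matches = [
        (value, n) for value, n in counts.items()
        if value.casefold().startswith(prefix)
    ]
    # Most popular first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [value for value, _ in matches[:limit]]


if __name__ == "__main__":
    authors = [
        "Abelson, Harold", "Hal Abelson", "Abelson, Harold",
        "H. Abelson", "Abelson, Harold", "Hal Abelson",
    ]
    print(suggest(authors, "h"))
```

The ordering is the "enticement": nothing stops the user from typing
their own text, but the popular form is the first thing they see.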

Or, even more so, inform a UI devoted to producing mappings or
accepting mapped values as a "correction" to the existing value.

This would provide a user experience where they are more apt to
select a system-provided value over their own text.  Ultimately,
Longwell (or at least Sesame+Banach) would become the center of a
feedback mechanism that lets users (curators, submitters and/or
regular users) who may be making their own statements
(equivalencies) about the data inform and feed back to the source.
And if this can be used to adjust the data at the source... then the
data source can become cleaner iteratively over time thanks to the
distributed efforts of its users.
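
To make that loop concrete, here is a small sketch (Python; the
function names and the choice of "most frequently used label wins" are
assumptions for illustration only). It groups values that users have
declared equivalent and proposes, for each less common form, the most
popular form in its group as a correction a curator could accept or
reject:

```python
from collections import Counter


def find(parent, x):
    """Return the representative of x's equivalence group (union-find)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x


def propose_corrections(values, equivalences):
    """Map each less common value to the most used equivalent value.

    values: metadata values as they currently appear at the source.
    equivalences: (a, b) pairs asserted by users, meaning "same entity".
    """
    parent = {}
    for a, b in equivalences:
        parent[find(parent, a)] = find(parent, b)  # merge the two groups

    counts = Counter(values)
    groups = {}
    for value in counts:
        groups.setdefault(find(parent, value), []).append(value)

    corrections = {}
    for members in groups.values():
        # Assumption: the most frequently used label is the preferred one.
        preferred = max(members, key=lambda v: (counts[v], v))
        for value in members:
            if value != preferred:
                corrections[value] = preferred
    return corrections


if __name__ == "__main__":
    stored = ["Hal Abelson"] * 3 + ["H. Abelson", "Abelson, Harold"]
    pairs = [("Hal Abelson", "H. Abelson"), ("H. Abelson", "Abelson, Harold")]
    print(propose_corrections(stored, pairs))
```

The output is only a list of proposals; whether they are applied at
the source stays a human decision.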

> The natural tendency is to have a 'validation' phase that makes sure
> that only 'clean and coherent' data enters the system. Of course, it
> is naive to think that this would actually work at any reasonable
> scale without an exponentially growing maintenance cost.

Though, no matter whether managed in DSpace or in Longwell, we  
(library operations/ repository maintainers) will still always be  
incurring this cost.

We need to at least attempt to manage the source (given we control
the application in the first place).  No matter if we are placing the
mappings downstream in Longwell or upstream in DSpace, we (being
library operations/repository maintainers) are still having to do
the metadata management and are still looking for the tools to assist
us in doing so.  And we (MIT Libraries) have been expanding our team
to do this under mandates from library directorship.  Our current
mandate (as I interpret my role in it) is to provide tools that will
make this more manageable as we expand into curating Digital
Collections in DSpace and seek to bring in more content from faculty
and departments.

> (yes, libraries and museum all do that... last time I checked, they
> were all complaining about the cost and inadequacy of their metadata
> management)

Yes, I think we agree we are talking about the inadequacy of our  
tools and how to improve them.

My challenge is that placing Longwell in front of (or beside) DSpace
does not "solve" that problem.  Without a way for the layperson to
edit the relationships, it not only forces the management of such
equivalencies back into the hands of the developer, it also means
that no one but the developer can maintain them.

Something has to be created that will produce and maintain those
statements of equivalency.  At least altering the metadata in DSpace
can be accomplished from a web UI, albeit tediously, one item at a
time.  It's this intersection of the two platforms that needs
exposure to the user.  In the DSpace community we need to better
enable the users to manage their (and others') metadata in DSpace.  I
think this is a smaller problem to solve than attempting to solve it
generally and independently for a larger Semantic Web community.

>
> The use of equivalences is, IMO, not a UI hack but an entirely
> different paradigm that 'embraces' diversity and entropic variability
> as a fact of life (think thermodynamics) and doesn't treat it as a
> problem.
>
> By saying that "Hal Abelson" and "H. Abelson" are actually different
> "labels" of the same "entity", and recording that information
> alongside your data, you are not only correcting today's incoherence
> but preventing future one from repeating as well.

I didn't mean to "belittle" the concept by suggesting it belongs in
the presentation layer.  I very much respect the concept and the
capability it provides, and want to do more with it.  I want to see it
inform the user who is about to create that future incoherence.  And
if they really mean to create it, I want a system that allows them to
be the one who defines it and/or clarifies it with any equivalency it
may have to an existing entity.

>
> It's a 'pave the cowpath' approach: see what incoherences exist and
> deal with them, in a way that is reproducible and with information
> that can be shared and repurposed.
>
> Sure it seems easier to write and run a perl script that hits a
> controlled vocabulary, or the MIT directory or a thesaurus or a
> gazetteer... but *that*, IMO, it's the hack, the technological
> band-aid to a deeper, intrinsic, dynamic of data maintenance.
>
> Of course, both approaches can cohexist but be careful about thinking
> that equivalences are useful only when you lack control.... I've found
> out that exercising control is always much harder than it looks, even
> when, on paper, you have plenty.

Thanks, I can see how that's true and can use it to make a more
informed analysis.  As I said at the beginning, we have to start
hammering out our assumptions somewhere.

This allows me to refine my statements and clarify that I want the
"mapping of equivalencies" to be usable as a tool to manage my source
of data, and that I would share them with the world in the hope that
others would too, so that in such a forum those relations could be
repurposed upstream in my application as well.  Starting in DSpace
means that we can at least begin to enable the application to support
and participate in a semantic web, even if it is initially only
shared with other DSpace instances.

Cheers,
Mark

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
