Mark Diggory wrote:
> Longwell Developers,
>
> Well, the subject's a little "obscure" and I'm not sure I have the
> right terminology, but here's the question in more detail.
I think the word I would use in this context is not "reciprocity" but
'equivalence'.

> In DSpace we can have an Item with an Author name that is for the
> same person but has multiple variants....
>
>> Hal Abelson
>> H. Abelson
>> Abelson, H.
>> Abelson, Hal

Right, this is normal in pretty much every dataset, especially when
aggregated from multiple independent sources.

> The resulting import into Longwell would maintain each of these
> values separately. (excuse my poor n3 abilities)...
>
>> <http://hdl.handle.net/1721.1/37585> <dc:contributor> "Hal Abelson"
>> <http://hdl.handle.net/1721.1/38487> <dc:contributor> "H. Abelson"
>> <http://hdl.handle.net/1721.1/38487> <dc:contributor> "Abelson, H."
>> <http://hdl.handle.net/1721.1/37600> <dc:contributor> "Abelson, Harold"

Right.

> (And certainly they are each valid variants of Hal Abelson's name).
> I'm concerned that "some of us" out there perceive it to be Longwell's
> current capability that by simply "adding" RDF statements that
> designate that these are equivalents, then Longwell will magically
> allow you to have these reduced to an "agreed upon" single value such
> that (I don't really know n3... I'm making this up as I go)...
>
>> "Hal Abelson" <owl:sameAs> <xxxx>
>> "H. Abelson" <owl:sameAs> <xxxx>
>> "Abelson, H." <owl:sameAs> <xxxx>
>> "Abelson, Harold" <owl:sameAs> <xxxx>

This is not possible in RDF, as it doesn't allow literals to be the
subjects of statements. The only way out is to have a URI associated
with Hal Abelson and make the equivalence statements against that.

For example, if you feed this into Longwell (with the Banach smoosher
enabled):

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson" .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
 <urn:23487487344> rdfs:label "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .

what Longwell actually sees is:

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson", "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .

> where
>
>> <xxxx> <rdfs:label> "Harold Abelson"
>
> And that by adding these sorts of statements to Longwell (or
> something "like" them), it will begin replacing those values with
> that "Label"? That maintaining such mappings in Longwell will allow
> Longwell to magically clean our metadata and reduce duplicate values
> that occur in the facets.
>
> I ask this because my interpretation of what you could do with
> Longwell was that you could develop a Sail for Sesame that was able
> to "filter" such equivalencies, but you had to know them "long
> before" your data was actually placed into Longwell. That basically
> you're just "filtering" the data before it gets stored... Not very
> exciting... why not just do it before you sent Longwell the RDF in
> the first place?

That is entirely possible as well. In fact, the Banach smoosher (the
'operator' responsible for the graph transformation above) works from
the command line as well.
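(Purely as an illustration, and not Longwell or Banach code: here is a
rough sketch of the same collapse written in Python with rdflib. The
data is just the example above, and taking the object of each
owl:sameAs statement as the "canonical" URI is an arbitrary choice for
this sketch; the real smoosher may well decide differently.)

 # Sketch only: collapse owl:sameAs-equivalent resources onto one node.
 from rdflib import Graph, Literal
 from rdflib.namespace import OWL

 data = """
 @prefix dc:   <http://purl.org/dc/elements/1.1/> .
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix owl:  <http://www.w3.org/2002/07/owl#> .

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson" .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
 <urn:23487487344> rdfs:label "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .
 """

 g = Graph()
 g.parse(data=data, format="turtle")

 # Map each resource to a canonical equivalent; here we simply take the
 # object of every owl:sameAs statement as the canonical URI.
 canonical = {s: o for s, o in g.subject_objects(OWL.sameAs)}

 # Rewrite all triples so equivalent resources collapse onto the canonical
 # node; both labels end up attached to <urn:23049802934>.
 smooshed = Graph()
 for s, p, o in g:
     if p == OWL.sameAs:
         smooshed.add((s, p, o))   # keep the equivalence statement itself
         continue
     s = canonical.get(s, s)
     if not isinstance(o, Literal):
         o = canonical.get(o, o)
     smooshed.add((s, p, o))

 print(smooshed.serialize(format="turtle"))

Run over an RDF dump before loading, something along these lines would
be one way of doing the cleanup up front, which is what the
command-line option amounts to.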
> Any clarification on Longwell's "actual capabilities" in this area
> would seriously assist us in evaluating it as a valid tool on which
> to base a discovery UI for DSpace, and reduce any misconceptions
> that I feel are going on in our MIT Libraries group. My concern is
> that Longwell is being perceived as a mechanism to "clean up" the
> presentation of metadata, where I see its actual behavior to be more
> based on the old premise of "garbage in, garbage out". My analysis
> needs to determine if our group is actually realistic in its
> expectations of Longwell's capability, and correct those viewpoints
> if that is not actually the case.

There are pros and cons to 'pre-massaging' the data: having Longwell
perform the equivalences allows you to add such equivalences at
runtime and, eventually, to store enough information to be able to
roll back and return to a previous state. If this use case is not
needed, then I agree that it's probably easier to 'clean up' the data
up front, using either the Banach smoosher or another Banach operator
written specifically for that purpose (Banach is a general-purpose
RDF transformer, sort of a pipeline for RDF processing).

-- 
Stefano Mazzocchi                  Digital Libraries Research Group
Research Scientist                 Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave      skype: stefanomazzocchi
Cambridge, MA 02139-4307, USA      email: stefanom at mit . edu
-------------------------------------------------------------------
