Mark Diggory wrote:
> Longwell Developers,
>
> Well, the subject's a little "obscure" and I'm not sure I have the
> right terminology, but here's the question in more detail.
I think the word I would use in this context is not "reciprocity" but
'equivalence'.

> In DSpace we can have an Item with an Author name that is for the
> same person but has multiple variants....
>
>> Hal Abelson
>> H. Abelson
>> Abelson, H.
>> Abelson, Hal

Right, this is normal in pretty much every dataset, especially when
aggregated from multiple independent sources.

> The resulting import into Longwell would maintain each of these
> values separately. (excuse my poor n3 abilities)...
>
>> <http://hdl.handle.net/1721.1/37585> <dc:contributor> "Hal Abelson"
>> <http://hdl.handle.net/1721.1/38487> <dc:contributor> "H. Abelson"
>> <http://hdl.handle.net/1721.1/38487> <dc:contributor> "Abelson, H."
>> <http://hdl.handle.net/1721.1/37600> <dc:contributor> "Abelson, Harold"

Right.

> (And certainly they are each valid variants of Hal Abelson's name).
> I'm concerned that "some of us" out there perceive it to be Longwell's
> current capability that by simply "adding" RDF statements that
> designate that these are equivalents, then Longwell will magically
> allow you to have these reduced to an "agreed upon" single value such
> that (I don't really know n3... I'm making this up as I go)...
>
>> "Hal Abelson" <owl:sameAs> <xxxx>
>> "H. Abelson" <owl:sameAs> <xxxx>
>> "Abelson, H." <owl:sameAs> <xxxx>
>> "Abelson, Harold" <owl:sameAs> <xxxx>

This is not possible in RDF, as it doesn't allow literals to be the
subjects of statements. The only way out is to have a URI associated
with Hal Abelson and make the equivalence statements against that.

For example, if you feed this into Longwell (with the Banach smoosher
enabled):

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson" .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
 <urn:23487487344> rdfs:label "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .

what Longwell actually sees is:

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson", "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .

> where
>
>> <xxxx> <rdfs:label> "Harold Abelson"
>
> And that by adding these sorts of statements to Longwell (or
> something "like" them), it will begin replacing those values with
> that "Label"? That maintaining such mappings in Longwell will allow
> Longwell to magically clean our metadata and reduce duplicate values
> that occur in the facets.
>
> I ask this because my interpretation of what you could do with
> Longwell was that you could develop a Sail for Sesame that was able
> to "filter" such equivalencies, but you had to know them "long
> before" your data was actually placed into Longwell. That basically
> you're just "filtering" the data before it gets stored... Not very
> exciting... why not just do it before you sent Longwell the RDF in
> the first place?

That is entirely possible as well. In fact, the Banach smoosher (the
'operator' responsible for the graph transformation above) works from
the command line as well.
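(Purely as an illustration, and not Longwell or Banach code: here is a
rough sketch of the same collapse written in Python with rdflib. The
data is just the example above, and taking the object of each
owl:sameAs statement as the "canonical" URI is an arbitrary choice for
this sketch; the real smoosher may well decide differently.)

 # Sketch only: collapse owl:sameAs-equivalent resources onto one node.
 from rdflib import Graph, Literal
 from rdflib.namespace import OWL

 data = """
 @prefix dc:   <http://purl.org/dc/elements/1.1/> .
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix owl:  <http://www.w3.org/2002/07/owl#> .

 <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
 <urn:23049802934> rdfs:label "Hal Abelson" .
 <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
 <urn:23487487344> rdfs:label "H. Abelson" .
 <urn:23487487344> owl:sameAs <urn:23049802934> .
 """

 g = Graph()
 g.parse(data=data, format="turtle")

 # Map each resource to a canonical equivalent; here we simply take the
 # object of every owl:sameAs statement as the canonical URI.
 canonical = {s: o for s, o in g.subject_objects(OWL.sameAs)}

 # Rewrite all triples so equivalent resources collapse onto the canonical
 # node; both labels end up attached to <urn:23049802934>.
 smooshed = Graph()
 for s, p, o in g:
     if p == OWL.sameAs:
         smooshed.add((s, p, o))   # keep the equivalence statement itself
         continue
     s = canonical.get(s, s)
     if not isinstance(o, Literal):
         o = canonical.get(o, o)
     smooshed.add((s, p, o))

 print(smooshed.serialize(format="turtle"))

Run over an RDF dump before loading, something along these lines would
be one way of doing the cleanup up front, which is what the
command-line option amounts to.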
> Any clarification on Longwell's "actual capabilities" in this area
> would seriously assist us in evaluating it as a valid tool on which
> to base a discovery UI for DSpace, and reduce any misconceptions
> that I feel are going on in our MIT Libraries group. My concern is
> that Longwell is being perceived as a mechanism to "clean up" the
> presentation of metadata, where I see its actual behavior to be more
> based on the old premise of "garbage in, garbage out". My analysis
> needs to determine if our group is actually realistic in its
> expectations of Longwell's capability, and correct those viewpoints
> if that is not actually the case.

There are pros and cons to 'pre-massaging' the data: having Longwell
perform the equivalences allows you to add such equivalences at
runtime and, eventually, to store enough information to be able to
roll back and return to a previous state. If this use case is not
needed, then I agree that it's probably easier to 'clean up' the data
up front, using either the Banach smoosher or another Banach operator
written specifically for that purpose (Banach is a general-purpose
RDF transformer, sort of a pipeline for RDF processing).

-- 
Stefano Mazzocchi                  Digital Libraries Research Group
Research Scientist                 Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave      skype: stefanomazzocchi
Cambridge, MA 02139-4307, USA      email: stefanom at mit . edu
-------------------------------------------------------------------
