Re: Using longwell to manage reciprocity

Stefano Mazzocchi Wed, 05 Mar 2008 16:50:07 -0800

Mark Diggory wrote:
> Stefano,
> 
> Thanks and my responses are inline below.
> 
> On Mar 5, 2008, at 12:13 PM, Stefano Mazzocchi wrote:
>> Mark Diggory wrote:
>>> (And certainly they are each valid variants of Hal Abelson's name).
>>> I'm concerned that "some of us" out there perceive it to be Longwell's
>>> current capability that by simply "adding" RDF statements that
>>> designate that these are equivalents, then Longwell will magically
>>> allow you to have these reduced to an "agreed upon" single value such
>>> that (I don't really know n3... I'm making this up as I go)...
>>>
>>>> "Hal Abelson" <owl:SameAs> <xxxx>
>>>> "H. Abelson" <owl:SameAs> <xxxx>
>>>> "Abelson, H." <owl:SameAs> <xxxx>
>>>> "Abelson, Harold" <owl:SameAs> <xxxx>
>> This is not possible in RDF as it doesn't allow 'literals' to be
>> subjects for predicates.
>>
>> The only way out is to have a URI associated with Hal Abelson and make
>> the equivalence statements with that.
>>
>> For example, if you feed this into Longwell (with the Banach smoosher
>> enabled)
>>
>> <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
>> <urn:23049802934> rdf:label "Hal Abelson" .
>> <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
>> <urn:23487487344> rdf:label "H. Abelson" .
>> <urn:23487487344> owl:sameAs <urn:23049802934> .
>>
>> what Longwell actually sees is:
>>
>> <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
>> <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23049802934> .
>>
>> <urn:23049802934> rdf:label "Hal Abelson","H. Abelson" .
>>
>> <urn:23487487344> owl:sameAs <urn:23049802934> .
> 
> I see there is both Load-time and Query-time rewriting here:
> 
> http://simile.mit.edu/wiki/Banach_Smoosher
> 
> Do I need to have all equivalency statements present at the same time  
> as my statements that are being loaded?


No, equivalences can be loaded at any time, before or after the actual 
statements. Supporting this carries a performance penalty, but my 
reasoning was that anyone who didn't need it would simply smoosh the 
data before loading it.
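
To make the order-independence concrete, here is a minimal sketch in Python. It is not Banach's actual code (which is Java on top of Sesame); the SmooshingStore class and its methods are hypothetical names invented for illustration, and triples are plain tuples of strings. The re-scan of stored triples on each new equivalence is one plausible place for the performance penalty mentioned above to show up.

```python
class SmooshingStore:
    """Toy triple store that folds owl:sameAs equivalences as they arrive."""

    SAME_AS = "owl:sameAs"

    def __init__(self):
        self.triples = set()   # smooshed (subject, predicate, object) tuples
        self.canonical = {}    # alias URI -> URI that replaces it

    def resolve(self, node):
        # Follow alias links until we reach a node with no replacement.
        while node in self.canonical:
            node = self.canonical[node]
        return node

    def _rewrite(self, triple):
        s, p, o = triple
        if p == self.SAME_AS:
            return triple      # keep equivalence statements as stated
        return (self.resolve(s), p, self.resolve(o))

    def add(self, s, p, o):
        if p == self.SAME_AS:
            alias, target = self.resolve(s), self.resolve(o)
            if alias != target:          # also guards against loops
                self.canonical[alias] = target
                # Rewrite everything already stored: the extra cost of
                # accepting equivalences after the data.
                self.triples = {self._rewrite(t) for t in self.triples}
        self.triples.add(self._rewrite((s, p, o)))


DATA = [
    ("http://hdl.handle.net/1721.1/37585", "dc:contributor", "urn:23049802934"),
    ("urn:23049802934", "rdf:label", "Hal Abelson"),
    ("http://hdl.handle.net/1721.1/38487", "dc:contributor", "urn:23487487344"),
    ("urn:23487487344", "rdf:label", "H. Abelson"),
]
EQUIVALENCE = ("urn:23487487344", "owl:sameAs", "urn:23049802934")


def load(statements):
    store = SmooshingStore()
    for s, p, o in statements:
        store.add(s, p, o)
    return store.triples


before = load([EQUIVALENCE] + DATA)   # equivalence arrives first
after = load(DATA + [EQUIVALENCE])    # equivalence arrives last
```

Both load orders end with the same set of triples, in which both handles point at <urn:23049802934> and both labels hang off it.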

> I assume this equivalency
> 
>> <urn:23487487344> owl:sameAs <urn:23049802934> .
> 
>   has to be present in the current RDF being fed or can it be present  
> in the store prior to the rdf being fed (Such as below)?

Both work fine (if not, it's a bug: it was designed to work that way, 
and the Banach smooshing tests exercise both cases).

>> <http://hdl.handle.net/1721.1/37585> dc:contributor <urn:23049802934> .
>> <urn:23049802934> rdf:label "Hal Abelson" .
>>
>> <http://hdl.handle.net/1721.1/38487> dc:contributor <urn:23487487344> .
>> <urn:23487487344> rdf:label "H. Abelson" .
> 
> 
> In other words, the Banach Sail cannot post-process existing stored  
> Sesame content upon receiving a new equivalency? Correct?

I'm not sure what you mean by "post-process".

> Ultimately I get the sense this means that, to effectively be  
> cleaning the resulting dataset, everything (or at least the  
> statements containing dc:contributors present in the equivalency)  
> would need to be reloaded by Longwell after the mapping was  
> "discovered" and "created" by the "manager of the metadata"? Meaning  
> that in "off the shelf" Longwell, to attain this sort of mapping  
> capability, the entire rdf dataset + equivalencies will still need  
> reloading for the equivalences to be in effect in the resulting  
> stored statements?

The Banach smoosher was designed so that you could throw equivalences at 
Longwell at any time and it would deal with them. For example, you could 
add the "Hal Abelson" equivalence after having loaded all the data in 
Longwell and the UI would change accordingly.

>> There are pros and cons about 'pre-massaging' data: having longwell
>> perform the equivalences allows you to add such equivalences at
>> runtime and, eventually, store enough information to be able to roll
>> back and return to a previous state.
> 
> If state can be "rolled back" on Longwell, then maybe I misinterpret  
> what is actually being stored above. I hope you can clarify this for me?

The best scenario for a 'smooshing triple store' is for it to be able to 
'fold' things together when equivalences are encountered and to 'unfold' 
them when such equivalences are removed (including equivalence chains 
and loops).

Currently, Banach's smooshing capabilities are destructive, meaning that 
it cannot return the graph to the original pre-smooshed state after 
operating on a newly fed equivalence.

In order for the graph to return to the original state, it currently 
needs to be discarded and reloaded.

So, in short: longwell can react to any new equivalence added at any 
point in time, before or after the subject and object of the 
equivalences are stored in the triple store. But removing an equivalence 
does not return the graph to the previous state.
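
A small sketch of why the destructive approach cannot unfold. This is an illustration in Python under simplifying assumptions, not Banach's implementation: the smoosh function is a hypothetical helper, and the equivalences are given as a plain alias-to-canonical dictionary.

```python
def smoosh(triples, same_as):
    """Return the triples with every alias replaced by its canonical URI."""
    def resolve(node):
        seen = set()
        # Follow the alias chain, stopping if it loops back on itself.
        while node in same_as and node not in seen:
            seen.add(node)
            node = same_as[node]
        return node

    return {(resolve(s), p, resolve(o)) for s, p, o in triples}


raw = {
    ("urn:23049802934", "rdf:label", "Hal Abelson"),
    ("urn:23487487344", "rdf:label", "H. Abelson"),
}
same_as = {"urn:23487487344": "urn:23049802934"}

# Destructive: only the folded graph is kept. Both labels now hang off
# one URI and nothing records which label came from which URI.
destructive = smoosh(raw, same_as)

# Dropping the equivalence afterwards has nothing left to split apart.
unfold_attempt = smoosh(destructive, {})

# Discard and reload: start again from the original statements, this
# time without the equivalence.
reloaded = smoosh(raw, {})
```

Here unfold_attempt is still the folded graph, while reloaded equals the original statements, which is the "discard and reload" route described above in miniature.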

>> If this usecase is not necessary, then I agree that it's probably
>> easier to 'cleanup' the data up front, using either the banach
>> smoosher or even just another Banach operator that is written
>> specifically for that purpose (Banach is a general purpose RDF
>> transformer, sort of a pipeline for RDF processing).
> 
> This is interesting and I will look at it further, I've been looking  
> for an RDF transformation pipeline tool.

You'll understand that a "pipeline" (think of a SAX-based Cocoon 
pipeline, for example) is unfortunately not easy to achieve for 
graph-based data (while it's relatively straightforward for XML trees).

Banach was my attempt to have something equivalent to cocoon's XML 
transformation pipeline for RDF (where the pipeline stages are called 
'operators').

The concept is based entirely on the outstanding Sesame stackable SAIL 
API (which I merely used) and has a 'load-time' and a 'query-time' 
processing capability that is very 'push-pull', quite different in 
nature from the push-only or pull-only XML APIs. The reason is that 
while trees don't need running contexts other than the event call 
stack, graphs do.
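
The push-pull shape can be sketched as follows. This is an illustration in Python only: it does not use or reproduce the Sesame SAIL API, and MemoryStore, Operator and QueryTimeSmoosher are hypothetical names. The point it tries to show is that an operator sees statements pushed down at load time, sees queries pulled up at query time, and needs a running context (here the alias map) that outlives any single call.

```python
class MemoryStore:
    """Bottom of the stack: stores triples exactly as given."""

    def __init__(self):
        self.triples = set()

    def add(self, triple):
        self.triples.add(triple)

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard.
        return {t for t in self.triples
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])}


class Operator:
    """Pass-through stage; subclasses override either direction."""

    def __init__(self, below):
        self.below = below

    def add(self, triple):                     # load time: pushed down
        self.below.add(triple)

    def match(self, s=None, p=None, o=None):   # query time: pulled up
        return self.below.match(s, p, o)


class QueryTimeSmoosher(Operator):
    """Collects equivalences at load time, applies them at query time."""

    def __init__(self, below):
        super().__init__(below)
        self.canonical = {}   # running context shared by both directions

    def _resolve(self, node):
        while node in self.canonical:
            node = self.canonical[node]
        return node

    def add(self, triple):
        s, p, o = triple
        if p == "owl:sameAs":
            alias, target = self._resolve(s), self._resolve(o)
            if alias != target:
                self.canonical[alias] = target
        self.below.add(triple)                 # stored data stays unsmooshed

    def match(self, s=None, p=None, o=None):
        want_s = None if s is None else self._resolve(s)
        want_o = None if o is None else self._resolve(o)
        found = set()
        for rs, rp, ro in self.below.match(None, p, None):
            if rp == "owl:sameAs":
                # Equivalence statements are matched as stated.
                keep = s in (None, rs) and o in (None, ro)
            else:
                rs, ro = self._resolve(rs), self._resolve(ro)
                keep = want_s in (None, rs) and want_o in (None, ro)
            if keep:
                found.add((rs, rp, ro))
        return found


stack = QueryTimeSmoosher(MemoryStore())
stack.add(("urn:23049802934", "rdf:label", "Hal Abelson"))
stack.add(("urn:23487487344", "rdf:label", "H. Abelson"))
stack.add(("urn:23487487344", "owl:sameAs", "urn:23049802934"))

labels = {o for _, _, o in stack.match(s="urn:23487487344", p="rdf:label")}
```

Asking for the labels of either URN returns both names. This variant scans every statement with the requested predicate on each query, so it trades query speed for leaving the stored data untouched.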

It is a surprisingly complex problem to tackle... and unfortunately, too 
many things to do and too little time (as usual) :-)

-- 
Stefano Mazzocchi
Digital Libraries Research Group                 Research Scientist
Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave               skype: stefanomazzocchi
Cambridge, MA  02139-4307, USA         email: stefanom at mit . edu
-------------------------------------------------------------------
