Re: [cellml-discussion] Time for a decent RDF library in the CellMLAPI? - part 2

Matt Mon, 11 Sep 2006 16:01:39 -0700

Reply to those bits I missed last time.

On 11/09/2006, at 4:02 PM, Andrew Miller wrote:


> David Nickerson wrote:
>> If I understand the discussion so far, I agree with Matt and we  
>> should
>> be looking at a higher level API than providing methods to manipulate
>> the RDF directly.
>>
> Would you be opposed to offering both a generic API and specific ones?
> We currently allow for extension elements to be manipulated, and  
> RDF is
> a much better representation for data outside of our existing
> specifications (because it allows for arbitrary additional information
> to be added to any resource in the model, can tie in information about
> elements from distinct parts of the model, allows for reification to
> annotate an existing arc in a standard way, and works well for metadata
> which ties together information about more than one CellML document, or
> which is defined as a model supplement in an external document). By
> supporting extension elements, but not supporting extension RDF,
> however, we would encourage people to use plain XML instead of RDF to
> represent types of data we don't know how to represent.
>
>> I also think that while such an API will be very useful we need to
>> ensure that this discussion doesn't hold up the initial release of  
>> PCEnv.
>>
> PCEnv needs to manipulate RDF, so having a good solution here would be
> beneficial. As a temporary measure, I have written application-side  
> code
> which uses the DOM API to create a new RDF/XML document containing the
> contents of all rdf elements. This document is then serialised and put
> into the Mozilla RDF library. The hassle then comes when we want to
> change RDF, because then the RDF/XML has to be serialised back out of
> Mozilla, then the DOM has to be used to delete all RDF:rdf elements at
> the API side, and then the RDF/XML has to be re-parsed at the API  
> side,
> and put as a single block as a child of the model element.
>
> Contrast this with a simple get or change operation.
>
> Of course, if we must do this through serialised RDF, I think the
> current approach (which is to look for a child of an RDF:rdf element
> child of the model, with an RDF:about attribute equal to the cmeta:id of
> the element for which we request the RDF) is completely broken, and I
> would like to drop this from the API, in favour of an operation on the
> model, which returns a single complete RDF/XML document containing all
> of the RDF from the CellML model (with no filtering attempted). We  
> would
> also need an operation to set the RDF, which will strip all RDF out of
> the model, and add the result of parsing the serialised input  
> string as
> a single RDF:rdf child of the model. If we are not going to add  
> full RDF
> support to the CellML API, then this approach is better than the  
> current
> approach, because:
>
> 1) it can be implemented properly for all RDF/XML in the model,  
> without
> making any artificial assumptions about the way the RDF/XML is
> structured, and without putting a full RDF/XML parser into the CellML
> DOM API.
>
> 2) if we don't provide even basic RDF facilities, the only  
> practical way
> for the application to process the RDF (other than by assuming a  
> certain
> serialisation) is to provide a generic RDF/XML parser. If the
> application is expected to do this, it might as well have access to  
> the
> entire RDF graph for the model, instead of just a fragment which,  
> based
> on the way the RDF/XML happened to be expressed before, was a child in
> that element in the RDF tree.
>
>  3) ontology software, and other applications which need to create an
> RDF graph containing RDF spanning multiple models can simply go  
> through
> a list of models, ask the models for their RDF/XML, and aggregate all
> the RDF documents into a single graph, rather than having to ask  
> the API
> for every single variable / component in each model.
>
> Therefore, I think we need to support several use cases:
> 1) The application has its own RDF library, and wants to do all  
> sorts of
> complex queries on it (perhaps using a query language), or wants to
> aggregate graphs across multiple models. In this case, the RDF/XML
> serialisation approach is probably best.
> 2) The application only wants to use standards from the cmeta,
> simulation, and / or graph specifications. In this case, a higher  
> level
> API should be available to them.
> 3) The application wants to access RDF data defined by a newer
> specification, or by a specification which is not used commonly enough
> to warrant inclusion in the CellML API (there is an enormous number of
> things which users might want to annotate about a model, depending on
> what type of research they are doing and so on, many of which will be
> specific to a particular field, and we cannot possibly contemplate  
> them
> all). However, they are happy with a fairly simple RDF API.
>
>> Andre.
>>
>> Matt wrote:
>>
>>> On 6/09/2006, at 4:53 PM, Andrew Miller wrote:
>>>
>>>
>>>> Matt wrote:
>>>>
>>>>> The fact there is no standardized API does not mean we invent our
>>>>> own. There are plenty of RDF implementations around and a huge  
>>>>> amount
>>>>> of overlap between them. I suggest we find that subset that shows
>>>>> reasonable intersection over the most popular rdf libraries and  
>>>>> use
>>>>> that.
>>>>>
>>>> I think you will find that my proposal meets these criteria,  
>>>> because I
>>>> have specified all the very basic RDF operations (as well as some
>>>> necessarily CellML specific ones).
>>>>
>>>>
>>> Yep, I agree that it is a reasonable set. I'd be surprised if any
>>> useful RDF library does not implement them. I don't see why we  
>>> need to.
>>>
> Because lots of applications (website, editors, and so on) don't need
> complex query languages, so they don't need any additional RDF  
> library.
> It therefore makes sense to use the RDF library at the API side, rather
> than burdening applications with this responsibility. Therefore, the
> RDF support in the CellML API will be 'useful' for some, but not all,
> applications.

I still don't see where there is an overwhelming need for this. And
if I work from the basis that we provide schema-based high-level
interfaces, then the need for offering this ourselves is still vague.


>
> For applications where this is not sufficient to be considered useful,
> the RDF/XML serialisation approach would be a better option.

These situations I imagine would dominate those where the developers  
want to think in terms of RDF.


>>>
>>>
>>>> Also note that the design of the CellML API means that methods for
>>>> accessing RDF are identified by URIs,
>>>>
>>> What do you mean by this?
>>>
>  From CellML_APISPEC.idl:
>
>     /**
>      * The RDF metadata associated with this element. An element must have a
>      * cmeta:id for any RDF to be able to refer to it.
>      * @param type The URN describing the type of RDF metadata. Implementations
>      *             are free to add new types by creating new type names at URNs
>      *             under their jurisdiction. New URNs under http://www.cellml.org
>      *             are reserved for future versions of this specification.
>      * @return The object containing the RDF representation. If no arcs are
>      *         defined, an empty RDF representation is returned. The object may
>      *         be cast in an application defined manner depending on the type
>      *         returned.
>      * @raises CellMLException if type isn't supported.
>      * All implementations must implement the following types:
>      *             http://www.cellml.org/RDFXML/string
>      *             http://www.cellml.org/RDFXML/DOM
>      */
>     RDFRepresentation getRDFRepresentation(in wstring type) raises(CellMLException);
>

Do you have an example of how this is used? I'm not sure exactly what  
the 'types' refer to.


> Note however, that I am proposing moving this from the elements to the
> model, so that RDFRepresentation becomes a representation of the  
> entire
> RDF graph, and the addition of some new (mandatory?) types which are
> more useful.
>
>>>
>>>> so you can have more than one
>>>> (although we wouldn't want to burden this on implementors, so they would
>>>> have to be optional. We could have a core, required API, and allow
>>>> better RDF specifications, e.g. providing query language access, to be
>>>> documented but not required).
>>>>
>>>>> But in saying that, I'm not sure you need to be exposing the
>>>>> RDF through an RDF centric API. The developers of the metadata  
>>>>> editor
>>>>> found it more useful to offer an API that was centered around the
>>>>> kinds of metadata that needed to be supplied - for instance to  
>>>>> add a
>>>>> series of authors, it was much nicer to be able to populate an
>>>>> authors data structure, especially since in the cellml metadata
>>>>> specification there is a strict interpretation of the  
>>>>> underlying RDF
>>>>> data structures - such as bags, lists etc.
>>>>>
>>>>>
>>>> It is certainly worthwhile to offer convenience interfaces  
>>>> specific to
>>>> certain specifications, such as cmeta, the simulation and graph
>>>> specifications.
>>>>
>>> I see these as being the current use cases, and the most important
>>> level at which to address any specific API (not the RDF level).
>>>
> Why can't we have both a specific and general API, if both are useful?

Because there are already implementations that deal with RDF. There
is as yet no consensus on what would make up a useful RDF API in the
RDF world, and I wouldn't want to put a hard-to-move stake in the
ground by defining one. I don't want to encourage the use of
arbitrary RDF where it averts more appropriate discussion and
thinking between the parties involved. The discussion that is going on
about representing simulation parameters is a useful one. I wouldn't
like to encourage people to start inventing their own because "it's
easy" with RDF. I'd like to try and keep metadata a bit more focussed
on shared schemas and rules at higher levels. That's not saying that
data outside of CellML should not refer to CellML models using RDF;
in fact I think that is one of the better use cases for open world
semantics, since it is only the application domain which needs to
understand what it means by the statements it makes.


>
>>>
>>>
>>>> However, the problem with this is that there is such a
>>>> large (and continuously growing) set of RDF-based metadata that  
>>>> people
>>>> might want to use, and so they need to be able to access this  
>>>> without
>>>> updating the CellML API to support every specification ever  
>>>> invented.
>>>>
>>> RDF can always be processed by RDF libraries so long as the RDF/XML
>>> fragments are available to load. The lesson learnt with the cellml
>>> metadata editor was that the higher level API that addressed the
>>> necessary and optional but useful metadata requirements were the  
>>> most
>>> relevant interfaces. Our metadata specification is quite strict  
>>> about
>>> the relationship of various RDF schemas for particular annotation
>>> purposes, e.g. the combination of bqs:Person and vcard structures.
>>> I think for the annotation structures that we say are necessary or
>>> useful, that a predefined specification and interface is very  
>>> useful,
>>> especially for people trying to populate specific data structures  
>>> out
>>> of them. If for example we said feel free to use anything inside
>>> bqs:reference, or actually any arbitrary reference schema, then we
>>> run into an increasing number of permutations that one would need to
>>> accommodate in RDF queries or RDF subgraph graph accessors to get at
>>> the same information.
>>>
> I am not saying that people should represent the same information by
> more than one RDF graph, but rather, people should be able to add new
> information which cannot currently be represented in any supported
> specifications. Our specifications do not contemplate every type of  
> data
> that people might find useful (for example, what if I wanted to
> represent detailed, cardiac electrophysiology specific information  
> about
> a model? I would firstly look to see if anyone else has made an  
> existing
> specification capable of holding the information. If not, I could then
> write a specification, and access it using a generic API).

I think that is an assumption on which I would not be prepared to
instigate development of a solution. There is no clarity to it. Also,
as I mentioned above, I don't like to promote such an easy way out
for adding more information to the model.

>>> RDF does not itself imply anything goes; I feel energy is better
>>> spent specifying a strict RDF schema and an API that satisfies
>>> interacting with data that conforms to this (and not at the triple
>>> level).
>>>
> RDF is supposed to be an open-world framework. See
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-anyone:
>
>
>         "2.2.6 Anyone Can Make Statements About Any Resource
>
> To facilitate operation at Internet scale, RDF is an open-world
> framework that allows anyone to make statements about any resource.
>
> In general, it is not assumed that complete information about any
> resource is available. RDF does not prevent anyone from making
> assertions that are nonsensical or inconsistent with other statements,
> or the world as people see it. Designers of applications that use RDF
> should be aware of this and may design their applications to tolerate
> incomplete or inconsistent sources of information."

I know all this. What I am saying is that it does not imply that this  
is the way we want people to use it in models that are part of the  
global domain. We need to maintain some guidance over extensions that  
provide information relevant to more than just someone's local  
application domain. The database crisis of bioinformatics (and I say
crisis as a parrot of people in that area) is because they all come  
up with different schemas for representing similar and overlapping  
information. RDF does not magically remove this problem.


>
> That said, I agree that it is very bad practice to invent a new
> specification when there is an existing one capable of representing  
> the
> desired information (and if the existing one only represents part  
> of the
> information, it is better to extend it by adding new arcs, rather than
> replacing it). However, this does not mean that we should not provide
> the capability to use new types of information which we never  
> contemplated.
>
>>>> Providing an RDF API at the CellML API side is very useful,  
>>>> especially
>>>> when there are multiple consumers of the implementation, because
>>>> you are
>>>> working with the real document, rather than a copy which was
>>>> created at
>>>> some earlier point.
>>>>
>>>>
>>> I think you are assuming too much about the underlying framework
>>> here. Perhaps I am wrong, but are you referring to the shared model
>>> through the corba interface?
>>>
> Our API interface is the same whether or not we go through CORBA,  
> so my
> comments will apply any time we are using the API. I am assuming that
> the application only accesses the objects obtained from the API
> implementation through the interface, and doesn't go poking around in
> API implementation private memory (obviously, if you are using  
> CORBA to
> access cross-process or cross-machine API implementations, the  
> operating
> system enforces this, but even if the API resides in the same address
> space, it is incredibly bad practice to break this assumption).
>
> If our API forces us to serialise and parse to work with RDF, we are
> invariably copying data, regardless of how we are using the CellML  
> API.
> Of course, once we have made this copy, it may be easier in some use
> cases than in others to pass the copy around (but if you have  
> created a
> way to pass RDF associated with CellML around internally, you have
> essentially created an informal API, so why not make it official  
> rather
> than rewriting it in every application?).


I understand what you are getting at w.r.t. a shared RDF model, but I'm
unclear on the use cases for this and how it requires us to design
and implement an RDF API.

>
>>>
>>>
>>>> As I have pointed out, if you try to get serialised RDF/XML out  
>>>> of an
>>>> RDF/XML unaware implementation, you run into all sorts of problems
>>>> with
>>>> getting all the data.
>>>>
>>>> For example, I just had to write code like this in PCEnv:
>>>> function getModelMetadata(model)
>>>> {
>>>>   var el = model.getRDFRepresentation("http://www.cellml.org/RDFXML/DOM").
>>>>     QueryInterface(Components.interfaces.
>>>>                    cellml_api_IRDFXMLDOMRepresentation).data;
>>>>   var od = el.ownerDocument;
>>>>   var rnl = od.getElementsByTagNameNS(
>>>>     "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF");
>>>>   var l = rnl.length;
>>>>   var i;
>>>>   var td = od.implementation.createDocument(
>>>>     "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:RDF", od.doctype);
>>>>   var de = td.documentElement;
>>>>   for (i = 0; i < l; i++)
>>>>   {
>>>>     de.appendChild(td.importNode(rnl.item(i), true));
>>>>   }
>>>>   var rrs = window.context.cellmlBootstrap.serialiseNode(td);
>>>>
>>>>   // Put it into the Mozilla RDF implementation...
>>>>   var p = Components.classes["@mozilla.org/rdf/xml-parser;1"].
>>>>             createInstance(Components.interfaces.nsIRDFXMLParser);
>>>>   var mds =
>>>>     Components.classes["@mozilla.org/rdf/datasource;1?name=in-memory-datasource"].
>>>>             createInstance(Components.interfaces.nsIRDFDataSource);
>>>>   var modelURI = model.base_uri.asText;
>>>>   var uri = Components.classes["@mozilla.org/network/standard-url;1"].
>>>>             createInstance(Components.interfaces.nsIURI);
>>>>   uri.spec = modelURI;
>>>>   p.parseString(mds, uri, rrs);
>>>>   return mds;
>>>> }
>>>>
>>>> This is bad for several reasons:
>>>> 1) I have to do a lot of work just to support a relatively common
>>>> operation (getting the metadata) properly.
>>>> 2) It requires a lot of communication between the CellML API and
>>>> the user.
>>>> 3) It uses the DOM core to traverse through nodes defined in
>>>> CellML, in
>>>> order to find all the RDF. The CellML API was designed to prevent
>>>> this,
>>>> so this is a violation of the design principles underlying the
>>>> CellML API.
>>>> 4) It makes a copy of the RDF from the model at the Mozilla side,
>>>> which
>>>> could potentially get out of sync.
>>>> 5) Trying to change the model requires even more special logic  
>>>> (e.g.
>>>> would have to write code to explicitly strip out all the rdf:RDF
>>>> elements, serialise the RDF into a document at the Mozilla-side,
>>>> send it
>>>> across to the API side as a string and parse into a document, then
>>>> import the new document element into the model document, and  
>>>> append to
>>>> the model document element).
>>>>
>>>>
>>> I tend to use an XPATH query and copy the fragments into a new
>>> document. I don't find this particularly hard, and for any given
>>> implementation of the CellML API, it's just a single call away for
>>> the user of that API.
>>>
> You then need to get access to the CellML model with something which
> supports XPath. The whole point of the CellML API is to prevent the  
> need
> for direct access to the DOM representation. Since DOM doesn't support
> XPath directly, you would need to implement something which  
> supported it
> but didn't access data except through the DOM (or put the XPath
> API-side, so it was allowed to poke into the internals of the
> implementation). While this sounds like a common thing that there  
> should
> be code for, in practice I have found that everyone invents their own
> mapping from the W3C specification to their language of choice
> (ourselves included) rather than strictly following a mapping such as
> the CORBA mapping (the only exception being in Javascript, where
> everyone uses a fairly consistent IDL => Javascript mapping).
>
> I think it would be far wiser to implement proper RDF support than to
> implement XPath so we can write a better hack to allow us to access  
> the
> RDF. If we don't do this, then even putting this logic onto Model
> instead would beat implementing XPath in terms of usability.

The XPath was just an example of obtaining the nodes. What you are
saying so far is that there needs to be an interface defined in the
CellML API that allows someone to get hold of a by-reference or by-copy
representation of the RDF in a model, in the form of an RDF model. So
far it suggests to me that we are defining some behaviour here and that
it's up to an application developer to write the adapter for their
particular RDF library of choice.
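
A rough sketch of that division of labour, under stated assumptions:
every name below (makeModelRDFSource, loadIntoLibrary, makeToyStore) is
hypothetical and is not in the CellML API or in any RDF library. The API
side hands out the model's RDF as plain triples by copy, and the
application writes one small adapter for the library it happens to use.

```javascript
// Hypothetical sketch: none of these names exist in the CellML API.

// API side: expose the model's RDF as plain triples, by copy.
function makeModelRDFSource(triples) {
  return {
    // Returns fresh objects so callers cannot mutate the model's own data.
    getTriples: function () {
      return triples.map(function (t) {
        return { subject: t.subject, predicate: t.predicate, object: t.object };
      });
    }
  };
}

// Stand-in for a real RDF library's in-memory store (illustrative only).
function makeToyStore() {
  var statements = [];
  return {
    add: function (s, p, o) { statements.push([s, p, o]); },
    count: function () { return statements.length; }
  };
}

// Application side: the adapter. One of these is written per RDF library;
// it only assumes the library has some way to add a statement.
function loadIntoLibrary(source, library) {
  source.getTriples().forEach(function (t) {
    library.add(t.subject, t.predicate, t.object);
  });
  return library;
}
```

The point of the sketch is only that the API defines the behaviour
(hand over the triples) and stays out of the business of being an RDF
library itself.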

>
>>>
>>>
>>>>> There is nothing stopping anyone adding arbitrary RDF using  
>>>>> whatever
>>>>> RDF tool they want.
>>>>>
>>>>>
>>>> Except that RDF/XML is not a nice way to work with RDF, and as I showed
>>>> above, serialise/parse creates problems (I could use your same argument
>>>> to say that we should work on CellML documents directly from the
>>>> DOM core API, but that doesn't mean that it would be productive).
>>>>
>>> I wasn't saying that we use RDF/XML to work with RDF. I am saying
>>> anyone adding arbitrary RDF to a CellML model is free to use  
>>> whatever
>>> RDF library implementation they want to access it. They may even  
>>> have
>>> their own schema aware libraries - well, I'd hope so.
>>>
> I'm not saying we should block this use case, but it seems silly to  
> make
> everyone include an RDF parser for common functionality (even if they
> can do it by including a library), especially if that means going
> through an extra serialise => parse => serialise just to make a change
> to the model.

But the model isn't XML in memory. The serialization is just an  
exchange medium. I would imagine someone working at the RDF level  
would load all the RDF into an RDF model and make whatever changes  
they need over the course of the application. When it's reserialised  
again then there may be some rules for breaking up the RDF or placing  
it in another document or whatever, but I don't see the need to
constantly de-serialize and re-serialize each time you want to make a
change.
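
Roughly what I have in mind, as a sketch only: makeGraph and its methods
are made up for illustration, and the output is N-Triples rather than
RDF/XML purely because it is trivial to write. The RDF is loaded once,
changed in memory as often as needed, and serialised once at the end.

```javascript
// Illustrative in-memory graph; not part of any real RDF library
// or of the CellML API.
function makeGraph() {
  var triples = [];
  return {
    // isLiteral marks the object as a literal string rather than a URI.
    add: function (s, p, o, isLiteral) {
      triples.push({ s: s, p: p, o: o, isLiteral: !!isLiteral });
    },
    // Removes every statement matching subject, predicate and object.
    remove: function (s, p, o) {
      triples = triples.filter(function (t) {
        return !(t.s === s && t.p === p && t.o === o);
      });
    },
    // One serialisation pass, done only when the data leaves memory.
    // Escapes backslashes and double quotes; other characters that
    // N-Triples requires escaping (e.g. newlines) are not handled here.
    serialise: function () {
      return triples.map(function (t) {
        var obj = t.isLiteral
          ? '"' + t.o.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"'
          : "<" + t.o + ">";
        return "<" + t.s + "> <" + t.p + "> " + obj + " .";
      }).join("\n");
    }
  };
}
```

Any number of add and remove calls can happen between the load and the
single call to serialise, with no parse step in between.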

>
>>>
>>>>> Specifying and implementing our own RDF API does not make sense  
>>>>> to me
>>>>> at all.
>>>>>
>>>>>
>>>> It makes a lot of sense to me, because it is consistent with the  
>>>> main
>>>> goal of the CellML API, which (according to me, at least) is to
>>>> provide
>>>> easier programmatic access to the contents of CellML documents.
>>>>
>>> Yes, but I am suggesting this is at a higher level than the RDF
>>> level. You might want to check out what they ended up with in the
>>> metadata editor code.
>>>
> I realise a higher level API will be useful for some applications.
> However, I don't think that it is sufficient for all CellML processing
> applications, and I don't believe that exposing a lower level  
> interface
> imposes a significant burden on implementors (on top of the burden
> already imposed by the higher level API).
>

I think it does. I don't see the need to specify and implement any  
RDF API at all even with higher level interfaces. I would probably  
look at making the higher level interfaces through implementing an  
RDF schema aware library that is 'adapted' to each RDF library as is  
required.
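
As a minimal sketch of what I mean, assuming a placeholder vocabulary
(the http://example.org/terms# predicates below are invented, not taken
from the CellML metadata specification) and hypothetical function names:
the schema-aware code talks only to a tiny adapter, and one adapter is
written per RDF library.

```javascript
// Placeholder vocabulary for illustration; not a real schema.
var EX = "http://example.org/terms#";

// Adapter over a plain array of [subject, predicate, object] triples.
// A real application would write the same objectsOf() over its own
// RDF library instead.
function makeArrayAdapter(triples) {
  return {
    objectsOf: function (subject, predicate) {
      return triples
        .filter(function (t) { return t[0] === subject && t[1] === predicate; })
        .map(function (t) { return t[2]; });
    }
  };
}

// Schema-aware, higher-level call: returns an authors data structure
// and never exposes triples to the caller.
function getAuthors(adapter, modelURI) {
  return adapter.objectsOf(modelURI, EX + "creator").map(function (person) {
    return {
      given: adapter.objectsOf(person, EX + "givenName")[0] || null,
      family: adapter.objectsOf(person, EX + "familyName")[0] || null
    };
  });
}
```

The caller only ever sees the authors structure; which RDF library sits
underneath is the adapter's business.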

cheers
Matt



> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion
