Re: [cellml-discussion] Concerning the CellML Model Repository

Tommy Yu Thu, 21 Jun 2007 16:44:59 -0700

Hi Andrew,

A couple of notes:


> I don't think it is a bad thing to have a one-way cache of metadata 
> somewhere for technical / performance reasons (perhaps in a relational 
> database), but I think that we should replicate data for each model 
> (perhaps using a deep copy-on-write approach if this is really necessary 
> to save disk space) rather than changing the metadata for existing 
> models without changing the version.
> 
> Making changes to metadata require changes to the model will ensure that 
> no one gets burned by referencing a particular version of a model, only 
> to find that the metadata in that version has changed on them.
> 
> Your current unversioned, globally shared metadata approach probably 
> also has security implications. For example, let's say that Alice submits 

I understood, and I did call for the metadata in the RDBMS to be more of a 
snapshot.  Metadata will still be versioned (by revision) in the Subversion 
repository.  Publishing a model to the public could conceivably be done by 
someone other than the model's creator.
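
To make the "snapshot" idea concrete, here is a minimal Python sketch. It 
assumes a hypothetical one-way cache keyed by (model name, Subversion 
revision); MetadataCache and MetadataSnapshot are illustrative names, not 
part of any existing repository code.  Because each entry is frozen and 
keyed by revision, changed metadata can only appear under a new revision, so 
anyone citing an older revision keeps seeing what was committed with it.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical snapshot of the metadata committed with one revision of one
# model. frozen=True makes instances read-only after creation.
@dataclass(frozen=True)
class MetadataSnapshot:
    model: str
    revision: int
    authors: tuple[str, ...]
    title: str


class MetadataCache:
    """Hypothetical one-way cache: filled from Subversion, never written back."""

    def __init__(self) -> None:
        self._snapshots: dict[tuple[str, int], MetadataSnapshot] = {}

    def record(self, snapshot: MetadataSnapshot) -> None:
        key = (snapshot.model, snapshot.revision)
        if key in self._snapshots:
            # Changing metadata means committing a new revision,
            # never overwriting the snapshot of an existing one.
            raise ValueError(f"{key} already recorded; commit a new revision")
        self._snapshots[key] = snapshot

    def get(self, model: str, revision: int) -> MetadataSnapshot:
        return self._snapshots[(model, revision)]

    def latest(self, model: str) -> MetadataSnapshot:
        # Raises ValueError if the model has no recorded revisions.
        revisions = [r for (m, r) in self._snapshots if m == model]
        return self._snapshots[(model, max(revisions))]
```

Under this assumption, a later edit to the author list is simply a new 
snapshot at a higher revision; the earlier snapshot is left untouched.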

Also, in the scenario outlined below, you are correct that a paper referenced 
by PubMed would be treated somewhat differently.  If Charlie were to publish a 
"fake" paper to the repository, it would result in a new reference anyway:

Alice - Paper title (original)
Alice, Charlie - Paper title (fake)

There is no way to stop users who have been given "admin" rights from entering 
bad data into the system.  Fortunately, Charlie wouldn't have those rights, so 
he wouldn't be able to add a new author to Alice's paper.  Since he can publish 
a model, the most he could do is create a new, fake paper that he did not write.

On the other hand, if he decided to publish his model under the original 
publication's name and then change the reference there, he would still be 
prevented from doing so, although he would have the option of creating a new 
fake reference.  Again, there is no way to stop users from publishing bad data 
if they have been given the rights to do so.  It is possible to limit where 
Charlie can publish his paper (e.g. publish to reviewers only), in which case 
there would be no visible damage.

> a model which references a publication. Now suppose that Charlie wasn't 
> an author of that paper, but he wants to add his name onto the list of 
> authors. So he submits a completely different, bogus, model which 
> includes metadata for the publication, and includes his name. When Bob 
> downloads Alice's model from the repository, it would then include 
> Charlie's name as one of the authors (assuming that the publication was 
> referenced by PubMed ID or DOI or some sort of publication URI. 
> Particular cases like the one I described might be able to be secured in 
> an ad hoc fashion such as by checking that the authors are the same, but 
> the general attack will still pervade this type of approach unless 
> metadata is associated uniquely with a particular version of a 
> particular model. If the assertions about the same subject cannot be 
> identified between models in the database, then having data flow back 
> from the relational database into the model does not carry any benefit 
> at all).
> 
> However, I do agree that there is a place for some metadata which can be 
> changed without creating a new version (which probably is the type of 
> metadata that you wouldn't include in the CellML file by default). 
> Curation status and permissions would probably fit in this category, 
> because although they may be associated with a particular version, they 
> should not be immutable for a given version.
> 
> 2) I think that there should be a directory for each mathematical model 
> (which may include several CellML model files, documentation, and so 
> on), so that a particular version can be downloaded / checked out in its 
> entirety (with some directory-level manifest describing how to run or 
> view the model). This suggests that collisions between mathematical 
> models should be prevented at this level, not at the file level. Under 
> this scheme, Mary would find that at usage example 3, she couldn't use 
> the same directory name as the one John already submitted.
> 
> 3) I think the 'reference by citation' needs some expansion: I think 
> people referencing models should have the choice to refer to:
>  => a specific version for which no files will change at all.
>  => the latest version which aims to reflect the letter of a publication 
> (updates will only fix mistakes in the model which prevent it from 
> corresponding to the printed paper).
>  => the latest version which aims to reflect the results obtained by the 
> author (updates can fix discrepancies or omissions from the paper that 
> were in the author's original code, if the author didn't use CellML).
>  => the latest derivative of the current model developed by the same 
> author / group, even if it has not yet been peer-reviewed (subject to 
> permissions constraints).
>  => the latest derivative of the current model, but with all imports 
> external to the model updated to the latest versions (even if this has 
> not been reviewed by the author). This would be the most frequently 
> updated version, because it could be automatically created without the 
> model author being involved.
> 

Perhaps a published model could be marked as one or more stable snapshots, so 
that the proper revision would be downloaded, with another location that 
points to the trunk.  I will give these two cases more thought (although I did 
partially address your point 2 via Examples 4 and 5).
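
One possible reading of the "stable snapshot plus trunk" idea, as a minimal 
Python sketch.  ModelHistory, mark_stable, resolve and the "stable"/"trunk" 
labels are all hypothetical names chosen for illustration.  A citation 
resolves either to a fixed revision, to the newest revision marked stable, or 
to the current trunk head.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelHistory:
    """Hypothetical record of one model's Subversion history."""

    name: str
    trunk_head: int  # latest revision committed to the trunk
    stable: list[int] = field(default_factory=list)  # ascending stable revisions

    def mark_stable(self, revision: int) -> None:
        if revision > self.trunk_head:
            raise ValueError("cannot mark a revision that does not exist yet")
        if revision not in self.stable:
            self.stable.append(revision)
            self.stable.sort()

    def resolve(self, ref: str) -> int:
        """Map a citation-style reference to a concrete revision number.

        'trunk'  -> current trunk head (may change over time)
        'stable' -> newest revision marked as a stable snapshot
        other    -> treated as an explicit, fixed revision number
        """
        if ref == "trunk":
            return self.trunk_head
        if ref == "stable":
            if not self.stable:
                raise LookupError(f"{self.name} has no stable snapshot yet")
            return self.stable[-1]
        revision = int(ref)
        if not 1 <= revision <= self.trunk_head:
            raise LookupError(f"{self.name} has no revision {revision}")
        return revision
```

This covers only two of the citation choices listed above; the others (letter 
of the paper, author's results, derivatives) would need further labels.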

Thanks for your input,
Tommy.

> It would also be possible to search for derivatives made by other authors.
> 
> 4) I'm not sure that the keywords based URIs are strictly necessary. 
> Perhaps search functionality which links to models is enough for this 
> (which avoids a whole set of URI stability issues)?
> 
> Best regards,
> Andrew
> 
>> Cheers,
>> Tommy.
>> _______________________________________________
>> cellml-discussion mailing list
>> [email protected]
>> http://www.cellml.org/mailman/listinfo/cellml-discussion
>>   