UK Ontology Network (UKON) 2016 - Last Call for Participation [Deadline: 31st March, 2016]
The Fifth UK Ontology Network meeting (#ukon2016) will take place on Thursday, April 14th, 2016 at Newcastle University, Newcastle upon Tyne. The aims of this meeting are as follows:

- To enable dissemination of ontology-relevant work from across multiple disciplines
- To encourage collaboration and cooperation between different members of UK organisations working in this area
- To help establish a research agenda in ontology, and better communication with funding councils and industry

The full programme is now available, and we have a fascinating series of talks, with demo and poster sessions. The meeting will also offer plenty of opportunities for networking. http://www.ukontology.org/programme

Registration for UKON 2016 is now open. Please register before the 31st of March, using the link below: http://www.ukontology.org/registration

Some hotels close to the venue offer reduced rates for UKON delegates. You can use the link below to take advantage of these special rates: http://www.newcastlegateshead.com/UKON2016

Best wishes, and we look forward to seeing you in Newcastle.

Phillip Lord, James Malone, Goksel Misirli, Jennifer Warrender, Claire Smith
UKON 2016 Organisers
Re: Good, up-to-date tutorial on OWL 2 and Protege for Biomedical domain?
Oliver Ruebenacker cur...@gmail.com writes: The challenge of building ontologies is not technical, but socio-political.

I think this very much depends on the ontology that you are creating and what its purpose is. When we created the karyotype ontology, there was no socio-political challenge at all; the knowledge was all there in the first place. The problem was to create a complex and repetitive ontology consistently, which behaved correctly. This is a technical challenge, which I think we solved. Another problem we are trying to address is linking the axiomatisation through to the documentation and provenance for that axiomatisation; again, a largely technical challenge. There are challenges associated with getting agreement and coming to a consensus, of course, but these are hardly unique to ontology building.

Phil
Re: Good, up-to-date tutorial on OWL 2 and Protege for Biomedical domain?
Entirely as an excuse to plug my own work, there is my own tutorial: http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html It doesn't use Protege, but my own tool, and it's not finished, so it's not an exact replacement. Also worth looking at is http://ontogenesis.knowledgeblog.org -- not a tutorial, but with lots of tutorial information in it.

Phil

Matthias Samwald matthias.samw...@meduniwien.ac.at writes: Dear all, I'm about to teach a course to medical informatics students who have never used OWL before. Are there any good, up-to-date tutorials or even course materials on OWL 2, biomedical ontology building and Protege that you could recommend? I was surprised to find that most publicly available resources have gathered quite a bit of dust (focused on OWL 1, old versions of Protege), or are not very accessible. I'd be especially interested in materials that avoid using the Pizza ontology ;) Thanks, Matthias

-- Phillip Lord, Phone: +44 (0) 191 208 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU
[ANN] Tawny-OWL 1.3.0
I am pleased to announce the 1.3.0 release of Tawny-OWL, now available on clojars and github (http://github.com/phillord/tawny-owl).

What is Tawny-OWL
=================

Tawny-OWL allows construction of OWL ontologies in an evaluative, functional and fully programmatic environment. Think of it as the ontology engineering equivalent of [R](http://www.r-project.org/). It has many advantages over traditional ontology engineering tools, also described in a [video introduction](https://vimeo.com/89782389).

- An interactive shell or REPL to explore and create ontologies.
- Source code, with comments, editable using any of a range of IDEs.
- Fully extensible -- new syntaxes and new data sources can be added by users.
- Patterns can be created for individual ontologies; related classes can be built easily, accurately and maintainably.
- A unit test framework with full reasoning.
- A clean syntax for versioning with any VCS, integrated with the IDE.
- Support for packaging, dependency resolution and publication.
- Continuous integration with both ontology and software dependencies.

For the Clojure developer
=========================

Tawny-OWL is predominantly designed as a programmatic application for ontology development, but it can be used as an API. OWL ontologies are a set of statements about things and their relationships; underneath, these statements map to a subset of first-order logic, which makes it possible to answer questions about these statements using highly-optimised reasoners.

Take Wing
=========

Although in its early stages, a rich manual is now being written for Tawny-OWL: https://github.com/phillord/take-wing http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html

Changes
=======

Two new features are added in this release. First, it is now possible to annotate axioms as well as entities. Second, new functions have been added to make development of patterns easier, in both function and macro form. The full change log is available: https://github.com/phillord/tawny-owl/blob/master/docs/releases.md
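As a flavour of the REPL-driven style the announcement describes, here is a minimal sketch; it assumes only the tawny.owl namespace from this release, and the ontology IRI and class names are invented for illustration, not taken from any released ontology.

```clojure
;; An illustrative sketch of Tawny-OWL usage; the ontology IRI and
;; class names here are invented for this example.
(ns demo.core
  (:require [tawny.owl :refer :all]))

;; defontology sets the default ontology for this namespace.
(defontology demo
  :iri "http://www.example.com/demo")

(defclass Chromosome)

;; Related classes can be defined tersely, and evaluated one form at
;; a time at the REPL.
(defclass AbnormalChromosome
  :super Chromosome
  :label "abnormal chromosome")
```

Because this is ordinary Clojure source, it versions cleanly in any VCS and can be wrapped in unit tests like any other code.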
[ANN] Tawny-OWL 1.2
I am pleased to announce the 1.2.0 release of Tawny-OWL, now available on clojars and github (http://github.com/phillord/tawny-owl).

What is Tawny-OWL
=================

Tawny-OWL allows construction of OWL ontologies in an evaluative, functional and fully programmatic environment. Think of it as the ontology engineering equivalent of [R](http://www.r-project.org/). It has many advantages over traditional ontology engineering tools, also described in a [video introduction](https://vimeo.com/89782389).

- An interactive shell or REPL to explore and create ontologies.
- Source code, with comments, editable using any of a range of IDEs.
- Fully extensible -- new syntaxes and new data sources can be added by users.
- Patterns can be created for individual ontologies; related classes can be built easily, accurately and maintainably.
- A unit test framework with full reasoning.
- A clean syntax for versioning with any VCS, integrated with the IDE.
- Support for packaging, dependency resolution and publication.
- Continuous integration with both ontology and software dependencies.

For the Clojure developer
=========================

Tawny-OWL is predominantly designed as a programmatic application for ontology development, but it can be used as an API. OWL ontologies are a set of statements about things and their relationships; underneath, these statements map to a subset of first-order logic, which makes it possible to answer questions about these statements using highly-optimised reasoners.

Take Wing
=========

Although in its early stages, a rich manual is now being written for Tawny-OWL: https://github.com/phillord/take-wing http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html

Changes
=======

The main feature of the 1.2 release is the incorporation of core.logic, through (ab)use of Tawny's querying facilities. A tighter integration, having core.logic work directly over the OWL API, should be possible, but this approach was relatively simple to implement.
It is performant enough for most uses (the Gene Ontology renders to Clojure data structures in 1-2 seconds). One other substantial change is an aggressive micro-optimisation of the default-ontology and broadcast-ontology functionality. This functionality is used in many parts of Tawny-OWL, so this results in a significant performance enhancement. The full change log is available: https://github.com/phillord/tawny-owl/blob/master/docs/releases.md

-- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU
Re: Ontology for Somatic Mutations?
In what sense? See if we can generate the description of the karyotype from a genome sequence? Or at least compare the two? I agree that this would be interesting. At the moment, the problem that we have is that the ISCN string is computationally relatively intractable. In most cases, though, the ISCN is all we have: there is no sequence, and no biological material. Phil

Karen Eilbeck keilb...@genetics.utah.edu writes: Hi Phil, Nice model of ISCN. We currently allow ISCN strings to be annotated in GVF to describe a genome structure. It may be interesting to validate against whole genome sequence. --K

On Jul 23, 2013, at 5:24 AM, Phillip Lord wrote: To the extent that it helps to answer our use case, co-ordination might be useful; our work on the karyotype is reasonably tightly scoped, and I wish to maintain this. Phil

Melissa Haendel haen...@ohsu.edu writes: Hi all, It would be great if we could coordinate these efforts - The genotype work we are doing that Chris Baker mentioned earlier on this thread (see http://www.unbsj.ca/sase/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf ) is already being integrated into the sequence ontology. Cheers, Melissa

On Jul 22, 2013, at 10:22 AM, Suzanna Lewis s...@berkeleybop.org wrote: Check out the Sequence Ontology. It is well-established in the genomics community. http://sequenceontology.org/

On Jul 22, 2013, at 4:53 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: We are working on a karyotype ontology which describes chromosome abnormalities. The first paper is available here, which also includes links to the ontology. http://arxiv.org/abs/1305.3758

Oliver Ruebenacker cur...@gmail.com writes: Hello, Does anyone know of an ontology for somatic mutations (including SNPs, chromosomal abnormalities, etc.)? Take care Oliver

-- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU

Dr. Melissa Haendel Assistant Professor Ontology Development Group, OHSU Library http://www.ohsu.edu/library/ Department of Medical Informatics and Epidemiology Oregon Health Science University haen...@ohsu.edu skype: melissa.haendel 503-407-5970

Karen Eilbeck Associate Professor Department of Biomedical Informatics, University of Utah
Re: Ontology for Somatic Mutations?
To the extent that it helps to answer our use case, co-ordination might be useful; our work on the karyotype is reasonably tightly scoped, and I wish to maintain this. Phil

Melissa Haendel haen...@ohsu.edu writes: Hi all, It would be great if we could coordinate these efforts - The genotype work we are doing that Chris Baker mentioned earlier on this thread (see http://www.unbsj.ca/sase/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf ) is already being integrated into the sequence ontology. Cheers, Melissa

On Jul 22, 2013, at 10:22 AM, Suzanna Lewis s...@berkeleybop.org wrote: Check out the Sequence Ontology. It is well-established in the genomics community. http://sequenceontology.org/

On Jul 22, 2013, at 4:53 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: We are working on a karyotype ontology which describes chromosome abnormalities. The first paper is available here, which also includes links to the ontology. http://arxiv.org/abs/1305.3758

Oliver Ruebenacker cur...@gmail.com writes: Hello, Does anyone know of an ontology for somatic mutations (including SNPs, chromosomal abnormalities, etc.)? Take care Oliver

-- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU

Dr. Melissa Haendel Assistant Professor Ontology Development Group, OHSU Library http://www.ohsu.edu/library/ Department of Medical Informatics and Epidemiology Oregon Health Science University haen...@ohsu.edu skype: melissa.haendel 503-407-5970
Re: Ontology for Somatic Mutations?
We are working on a karyotype ontology which describes chromosome abnormalities. The first paper is available here, which also includes links to the ontology. http://arxiv.org/abs/1305.3758

Oliver Ruebenacker cur...@gmail.com writes: Hello, Does anyone know of an ontology for somatic mutations (including SNPs, chromosomal abnormalities, etc.)? Take care Oliver

-- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU
[ANN] tawny-owl 0.11
I'm pleased to announce the release of tawny-owl 0.11.

What is it?
===========

This package allows users to construct OWL ontologies in a fully programmatic environment, namely Clojure. This means that users can take advantage of the programming language to automate and abstract over the ontology development process; rather than requiring the creation of ontology-specific development environments, a normal programming IDE can be used; and a human-readable text format means that we can integrate with the standard tooling for versioning and distributed development.

Changes
=======

# 0.11

## New features

- facts on individuals are now supported
- documentation has been greatly extended
- OWL API 3.4.4

A new paper on the motivation and use cases for tawny-owl is also available at http://www.russet.org.uk/blog/2366

https://github.com/phillord/tawny-owl

Feedback welcome!

-- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, twitter: phillord NE1 7RU
Re: owl:sameAs - Harmful to provenance?
Oliver Ruebenacker cur...@gmail.com writes: Hello Philip,

Phillip :-)

Apparently, you are confusing two different cases. I talked about the same reference meaning two different things. You are talking about different references talking about the same thing.

No. dc:creator means many different things. It's the same.

Confusion is the enemy of understanding.

Confusion is the start point for all knowledge.
Re: owl:sameAs - Harmful to provenance?
Compare all you like. RDF is just another technology; it's not going to let me do anything that I cannot do in another way. I'm interested in using it because it is there, not for any other reason. The surface syntax problem; yeah, it is and remains a pain, more so in some areas than others. Phil

Alan Ruttenberg alanruttenb...@gmail.com writes: Thinking about metadata as some other category of data is usually a bad sign. I've often found it to mean, in practice, data I care less about. Phil, to make the case that RDF helps here, we would want to compare how easy it is to do significant work using the ill-represented examples you find versus raw text, versus xml, versus tab-delimited files. While there is some limited benefit to getting rid of the surface syntax problem, it's not clear how much of a problem that ever was. -Alan

On Mon, Apr 8, 2013 at 1:16 PM, Bhat, Talapady N. talapady.b...@nist.gov wrote: Hi, - Introduction - Dublin Core: The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The name Dublin is due to its origin at a 1995 invitational workshop in Dublin, Ohio; core because its elements are broad and generic, usable for describing a wide range of resources. The fifteen-element Dublin Core described in this standard is part of a larger set of metadata vocabularies -- As per the introduction (given above) section of Dublin Core ( http://dublincore.org/documents/dces/ ), its focus is primarily metadata, whereas the actual author names mentioned below probably need to be considered as 'data'. I do not think Dublin Core has really focused on building a standard re-usable vocabulary for 'data'. That is the real problem.
That is why we have been focusing on re-usable terms for 'data': http://www.biomedcentral.com/1471-2105/12/487 and http://xpdb.nist.gov/chemblast/pdb.pl and http://www.nature.com/nmeth/journal/v9/n7/abs/nmeth.2084.html T N Bhat

-Original Message- From: Phillip Lord [mailto:phillip.l...@newcastle.ac.uk] Sent: Monday, April 08, 2013 12:53 PM To: Oliver Ruebenacker Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg; public-semweb-lifesci Subject: Re: owl:sameAs - Harmful to provenance?

And it is this bit -- before we can do anything useful -- that is utterly wrong. Recently I have spent a lot of time looking at Dublin Core creator fields. You would not believe how many different ways they are used. String literals (Phillip Lord), last-first (Lord, Phillip), with abbrevs (P. Lord), multi-author (Phillip Lord; Lindsay Marshall), with titles (Dr Phillip Lord) and so on. So, is everyone using Dublin Core wrong? Is it useless till everyone uses it the same way? Emphatically no, it is not useless. Would it be better if everybody did use it the same way? The answer is probably not. Names are incredibly complex, and representing them is, in turn, difficult and hard. Any specification which did full justice to all the different name forms in existence would be incredibly long-winded. Many people using the specification would get it wrong; or you could have a mechanism for ensuring people always used it correctly. Then I am sure that both people who ended up using this form of spec would have great fun integrating their tiny datasets. In the example, we have a number of sets of assertions which individually fulfil their creators' use cases. Then, when they are brought together, the assertions become inconsistent, telling you up front that there is work to be done. And you ask in what way is this useful? Perfection is the enemy of Good.
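The variability in dc:creator fields described above is easy to demonstrate with a small sketch, using the exact example strings from the post; the matching logic is purely illustrative and not any real Dublin Core tooling.

```python
# The dc:creator variants listed in the post; all but one refer to the
# same person, yet naive exact matching sees five distinct creators.
creators = [
    "Phillip Lord",
    "Lord, Phillip",
    "P. Lord",
    "Phillip Lord; Lindsay Marshall",
    "Dr Phillip Lord",
]

distinct = set(creators)
print(len(distinct))  # 5 -- every surface form counts as a different creator
```

Any real normalisation would have to cope with order, abbreviation, titles and multi-author fields, which is exactly the long-winded specification problem the post warns about.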
Oliver Ruebenacker cur...@gmail.com writes: So what most people here are saying is that before we can do anything useful, we need to make sure that if two assertions use the same reference, they mean the same thing. To which you respond that you will accept assertions without assuming that same references mean same things. You will just keep them separate. There is no rule against that. But in what way is this useful? Take care Oliver On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote: Hi Pat, On 04/04/2013 02:03 AM, Pat Hayes wrote: On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote: On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On 04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2, 2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote: On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote: If only owl:sameAs were used correctly... Well, I agree that is a problem, but don't draw the conclusion that there is something wrong with sameAs, just because people keep using it wrong. Agreed. And furthermore, don't draw the conclusion that someone has used owl:sameAs wrong just because you get garbage when you merge two graphs that individually worked just
Re: owl:sameAs - Harmful to provenance?
I think that your name is unique. There is no other string of letters which is the same as yours which is not identical to your name. Agreed, there is not a one-to-one mapping between your name and people in existence. But this is not necessarily a problem; it depends on your use case. Phil

Michael Miller michael.mil...@systemsbiology.org writes: phillip, not to mention a name (like mine!) is not particularly unique. cheers, michael Michael Miller Software Engineer Institute for Systems Biology

-Original Message- From: Phillip Lord [mailto:phillip.l...@newcastle.ac.uk] Sent: Monday, April 08, 2013 9:53 AM To: Oliver Ruebenacker Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg; public-semweb-lifesci Subject: Re: owl:sameAs - Harmful to provenance?

And it is this bit -- before we can do anything useful -- that is utterly wrong. Recently I have spent a lot of time looking at Dublin Core creator fields. You would not believe how many different ways they are used. String literals (Phillip Lord), last-first (Lord, Phillip), with abbrevs (P. Lord), multi-author (Phillip Lord; Lindsay Marshall), with titles (Dr Phillip Lord) and so on. So, is everyone using Dublin Core wrong? Is it useless till everyone uses it the same way? Emphatically no, it is not useless. Would it be better if everybody did use it the same way? The answer is probably not. Names are incredibly complex, and representing them is, in turn, difficult and hard. Any specification which did full justice to all the different name forms in existence would be incredibly long-winded. Many people using the specification would get it wrong; or you could have a mechanism for ensuring people always used it correctly. Then I am sure that both people who ended up using this form of spec would have great fun integrating their tiny datasets. In the example, we have a number of sets of assertions which individually fulfil their creators' use cases. Then, when they are brought together, the assertions become inconsistent, telling you up front that there is work to be done. And you ask in what way is this useful? Perfection is the enemy of Good.

Oliver Ruebenacker cur...@gmail.com writes: So what most people here are saying is that before we can do anything useful, we need to make sure that if two assertions use the same reference, they mean the same thing. To which you respond that you will accept assertions without assuming that same references mean same things. You will just keep them separate. There is no rule against that. But in what way is this useful? Take care Oliver

On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote: Hi Pat, On 04/04/2013 02:03 AM, Pat Hayes wrote: On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote: On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On 04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2, 2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote: On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote: If only owl:sameAs were used correctly... Well, I agree that is a problem, but don't draw the conclusion that there is something wrong with sameAs, just because people keep using it wrong. Agreed. And furthermore, don't draw the conclusion that someone has used owl:sameAs wrong just because you get garbage when you merge two graphs that individually worked just fine. Those two graphs may have been written assuming different sets of interpretations. In that case I would certainly conclude that they have used it wrong. Have you not been reading what Pat and I have been writing? I've read lots of what you and Pat have written. And I've learned a lot from it -- particularly in learning about ambiguity from Pat. And I'm in full agreement that owl:sameAs is *often* misused.
But I don't believe that getting garbage when merging two graphs that individually worked fine *necessarily* indicates that owl:sameAs was misused -- even when it appears on the surface to be causing the problem. I agree, but not with your example and your analysis of it. Here's a simple example to illustrate. Using the following prefixes throughout, for brevity:

@prefix : <http://example/owen/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

Suppose that Owen is the URI owner of :x, :y and :z, and Owen defines them as follows:

# Owen's URI definition for :x, :y and :z
:x a :Something .
:y a :Something .
:z a :Something .

That's all. That's Owen's entire definition of those URIs. Obviously this definition is ambiguous in some sense. But as we know, ambiguity is ultimately inescapable anyway, so I have merely chosen an example that makes the ambiguity obvious. As the RDF Semantics spec puts it: It is usually impossible to assert enough in any language
Re: owl:sameAs - Harmful to provenance?
Kingsley Idehen kide...@openlinksw.com writes: On 4/9/13 11:31 AM, Phillip Lord wrote: Compare all you like. RDF is just another technology; it's not going to let me do anything that I cannot do in another way. So you are questioning its unique selling points, I assume? No. I don't care. I just care whether it's useful. Who cares whether it's uniquely useful. If so, can you point us to a technology that addresses the issue of grounding logic in data -- in a manner that's totally platform independent? It's a data representation technology. Lots of things do this. Totally platform independent. I don't know what platform means these days. We want to be able to leverage logic in the process of actual data representation, access, integration, and management. I know of no technology that addresses the problem like RDF i.e., in a platform agnostic manner that echoes the essence of the Web itself. RDF is nice. It's useful. It will remain useful, at least if people are allowed to use it without being told that they are doing it all wrong. I am not attacking RDF; I am attacking the notion that everything has to be perfect, to work in every circumstance, for it to be useful at all. Phil
Re: owl:sameAs - Harmful to provenance?
And it is this bit -- before we can do anything useful -- that is utterly wrong. Recently I have spent a lot of time looking at Dublin Core creator fields. You would not believe how many different ways they are used. String literals (Phillip Lord), last-first (Lord, Phillip), with abbrevs (P. Lord), multi-author (Phillip Lord; Lindsay Marshall), with titles (Dr Phillip Lord) and so on. So, is everyone using Dublin Core wrong? Is it useless till everyone uses it the same way? Emphatically no, it is not useless. Would it be better if everybody did use it the same way? The answer is probably not. Names are incredibly complex, and representing them is, in turn, difficult and hard. Any specification which did full justice to all the different name forms in existence would be incredibly long-winded. Many people using the specification would get it wrong; or you could have a mechanism for ensuring people always used it correctly. Then I am sure that both people who ended up using this form of spec would have great fun integrating their tiny datasets. In the example, we have a number of sets of assertions which individually fulfil their creators' use cases. Then, when they are brought together, the assertions become inconsistent, telling you up front that there is work to be done. And you ask in what way is this useful? Perfection is the enemy of Good.

Oliver Ruebenacker cur...@gmail.com writes: So what most people here are saying is that before we can do anything useful, we need to make sure that if two assertions use the same reference, they mean the same thing. To which you respond that you will accept assertions without assuming that same references mean same things. You will just keep them separate. There is no rule against that. But in what way is this useful? Take care Oliver

On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote: Hi Pat, On 04/04/2013 02:03 AM, Pat Hayes wrote: On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote: On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On 04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2, 2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote: On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote: If only owl:sameAs were used correctly... Well, I agree that is a problem, but don't draw the conclusion that there is something wrong with sameAs, just because people keep using it wrong. Agreed. And furthermore, don't draw the conclusion that someone has used owl:sameAs wrong just because you get garbage when you merge two graphs that individually worked just fine. Those two graphs may have been written assuming different sets of interpretations. In that case I would certainly conclude that they have used it wrong. Have you not been reading what Pat and I have been writing? I've read lots of what you and Pat have written. And I've learned a lot from it -- particularly in learning about ambiguity from Pat. And I'm in full agreement that owl:sameAs is *often* misused. But I don't believe that getting garbage when merging two graphs that individually worked fine *necessarily* indicates that owl:sameAs was misused -- even when it appears on the surface to be causing the problem. I agree, but not with your example and your analysis of it. Here's a simple example to illustrate. Using the following prefixes throughout, for brevity:

@prefix : <http://example/owen/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

Suppose that Owen is the URI owner of :x, :y and :z, and Owen defines them as follows:

# Owen's URI definition for :x, :y and :z
:x a :Something .
:y a :Something .
:z a :Something .

That's all. That's Owen's entire definition of those URIs. Obviously this definition is ambiguous in some sense.
But as we know, ambiguity is ultimately inescapable anyway, so I have merely chosen an example that makes the ambiguity obvious. As the RDF Semantics spec puts it: It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world. Yes, but by making the ambiguity this obvious, you have rendered the example pointless. There is *no* content here *at all*, so Owen has not really published anything. This is not typical of published content, even in RDF. Typically, in fact, there is, as well as some nontrivial actual RDF content, some kind of explanation, perhaps in natural language, of what the *intended* content of the formal RDF is supposed to be. While an RDF engine cannot of course make use of such intuitive explanations, other authors of RDF can, and should, make use of it to try to ensure that they do not make assertions which would be counter to the referential intentions of the original authors. For example, the Dublin Core URIs were published with almost no formal RDF axioms, but quite elaborate natural language glosses which
Re: owl:sameAs - Harmful to provenance?
David Booth da...@dbooth.org writes: Maybe someone can see a way to avoid this dilemma. Maybe someone can figure out a way to distinguish between the essential properties that serve to identify a resource, and other inessential properties that the resource might have. If so, and the number of essential properties is finite, then indeed this problem could be avoided by requiring every URI owner to define all of the essential properties of the URI's denoted resource, or by prohibiting anyone but the URI owner from asserting any new essential properties of the resource (beyond those the URI owner had defined). Or maybe there is another way around this dilemma. Unless some way around this dilemma is found, it seems unreasonably judgemental to accuse Arthur of misusing owl:sameAs in this case, since he didn't assert anything that was inconsistent with Owen's URI definition.

I think your analysis is good. My solution to avoiding the horns of the dilemma is to take a different tack entirely, and to think about the social aspects of how these graphs came to be produced. Owen has produced some data. Then Arthur, Aster and Alfred have each extended it in ways which turn out to be incompatible, and yet they all seem to be doing things that fulfil their respective use cases. So, in one sense, there is no problem here at all. In each case, Arthur, Aster and Alfred get everything to work, and everybody is happy. The problem comes when you try to integrate their work; now it breaks. So, how to avoid this? There are two key ways: the first is to say, well, okay, so now the graphs break, so let's get together and sort the problem out. There are lots of ways you could change the graphs here so that the problem goes away. The other solution is to argue that if everybody follows a standard rigidly, then this problem won't happen in the first place.
The difficulty here is not, for example, understanding the set theoretic interpretation, but how to apply this to whatever it is that you are trying to model. My experience has been that the former has significant costs, that integrating post-hoc is expensive and time-consuming. However, my experience of the latter approach is that it is highly unscalable, and results in very long and obscure philosophical debates. Essentially, with the former you pay the cost of integration as you need it; in the latter you pay the cost of integration all the time, whether you need it or not. So, are the people in your example misusing owl:sameAs? Not if they are answering the questions they need. Should they fix the problem with integration? If they need to, to get better answers. But not until then. But by that logic, Arthur would not be able to assert *anything* new about :x. I.e., Arthur would not be allowed to assert any property whose value was not already entailed by Owen's definition! And that would render RDF rather pointless. Absolutely; the whole point of integrating data is that you want to say things about knowledge that comes from other people. Otherwise, you don't have integration, you just have a bunch of triples in the same bucket. Phil
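The failure mode described above -- graphs that are fine in isolation and break only on integration -- is easy to demonstrate without any real RDF machinery. A minimal sketch in plain Python (no RDF library; the triples, the `:weight` property, and its functional treatment are all invented for illustration, following the thread's Owen/Arthur/Aster example):

```python
# Toy model of an RDF merge: graphs are just sets of (subject, predicate,
# object) triples. Each publisher's graph is fine on its own; a conflict
# only appears once the graphs are integrated.

def merge(*graphs):
    """The RDF merge of several graphs: the union of their triples."""
    return set().union(*graphs)

def functional_conflicts(graph, functional_props):
    """Find subjects that end up with two different values for a
    property we are treating as functional (single-valued)."""
    seen, conflicts = {}, []
    for s, p, o in graph:
        if p in functional_props:
            if (s, p) in seen and seen[(s, p)] != o:
                conflicts.append((s, p))
            seen[(s, p)] = o
    return conflicts

owen   = {(":x", ":label", "thing")}       # the original publication
arthur = {(":x", ":weight", "10kg")}       # fine for Arthur's use case
aster  = {(":x", ":weight", "12kg")}       # fine for Aster's use case

# Owen plus Arthur integrate without trouble:
assert not functional_conflicts(merge(owen, arthur), {":weight"})
# The problem only arrives when all three are merged:
conflicts = functional_conflicts(merge(owen, arthur, aster), {":weight"})
print(conflicts)
```

The point of the sketch is the social one made above: neither `arthur` nor `aster` is wrong on its own terms; the conflict is a property of the integration, and can be left unfixed until integration is actually needed.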
Re: Observations about facts in genomics
Yeah, I have heard this argument before. As soon as you give me an assayable and testable definition for reality, I'm right with you. Phil Jerven Bolleman m...@jerven.eu writes: On Thu, Mar 21, 2013 at 2:55 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: This is a broken definition of good to my mind. It suggests that we should make all the distinctions that we can make, all the time. Unfortunately, this means that everyone bears the cost of the complexity all the time also. True, but the other option is the current situation, where we all bear the complexity of not knowing what someone is really talking about. Leading to merging of information that should never have been merged, and conclusions that are not worth the pixels they are displayed on. Sure, there is a cost to ever more complex representations of information to match reality, and this is not what I am advocating. I am advocating giving reality a different IRI than the model.
Re: Observations about facts in genomics
No, they don't. They have a responsibility to do what they are being paid to do (or want to achieve for their own purposes) in a rapid and efficient manner. The point of standards is that they make it easier to do this in the same way as others than not to. People write URLs correctly because otherwise they don't work, not because they have a responsibility. Pat Hayes pha...@ihmc.us writes: All citizens have certain responsibilities if they are going to use a global interchange format of any kind, which is to find a way to encode their domain in that format in a way that conforms to the published rules of the format. Or if that is not possible, then at least to publish the ways in which they are failing to conform, and to ensure that readers of their data have adequate warning of the ways they are failing to conform, and what the consequences are.
Re: Observations about facts in genomics
This is a broken definition of good to my mind. It suggests that we should make all the distinctions that we can make, all the time. Unfortunately, this means that everyone bears the cost of the complexity all the time also. A good data model should be an accurate reflection of biology. But it should also be a convenient model of biology. And the distinction that you are making is relevant to only a subset of use cases. Make the distinctions you care about. Let others make the distinctions that they care about. Phil Jerven Bolleman m...@jerven.eu writes: This is fine in RDF; the important thing is to separate the concept of a Chromosome/Patient sequence and a set of observations and hypotheses about that Chromosome sequence. So instead of chromosome M you are really talking about assembly X of a set of reads R mapped via some (variant calling) processes to reference chromosome C, that is also really an assembly of a different set of reads. Subtly different and not always made explicit in conversation, but for good RDF representations you should. In RDF here you need to be careful about what you are identifying. As long as you are correct in what you identified (in this case a variant-called, mapped assembly) instead of what you are discussing in English (a patient's chromosome) you will end up fine. If you do this you don't need anything as exotic as frames etc... Regards, Jerven On Wed, Mar 20, 2013 at 9:23 PM, Graham Klyne graham.kl...@zoo.ox.ac.uk wrote: Hi Jeremy, On 20/03/2013 16:04, Jeremy J Carroll wrote: One of the things I am learning about genetic sequencing is this process, which is meant to tell you about the patient's DNA, is in fact somewhat problematic, resulting in facts which are disputable. It gets worse... the association between sequence fragments and genes changes over time as knowledge is improved, I understand in ways that aren't always reflected in published information. 
GMOD/CHADO (http://gmod.org/wiki/Introduction_to_Chado) keeps all the concepts very separate to allow for this, but the translation to RDF can get very convoluted (Al Miles did some work on a mapping, a few years ago). I also understand that there's emerging research that shows that non-coding regions, which were previously thought to be meaningless/irrelevant, do actually have relevant roles in the overall genetic machinery (something to do with regulation?). One of the many reasons I'd like RDF to have some flexibility to deal with contexts, or differing worldviews, is to allow representation of evolving information without having to make explicit all those things that researchers sometimes don't bother to make explicit (e.g. genes vs proteins, sequence vs gene, etc.). And then there's all the stuff we don't yet know to make explicit. (frame problem, anyone?) #g -- On 20/03/2013 16:04, Jeremy J Carroll wrote: Pat Hayes wrote: [RDF] is intended for recording data, and most data is pretty mundane stuff about which there is not a lot of factual disagreement. One of the things I am learning about genetic sequencing is this process, which is meant to tell you about the patient's DNA, is in fact somewhat problematic, resulting in facts which are disputable. So, a data file that I am trying to get my head around at the moment contains a line like: chrM 942 rs28579222 A G . . ASP;HD;OTHERKG;RSPOS=942;SAO=0;SF=0;SSR=0;VC=SNV;VP=0505000402000100;WGT=1;dbSNPBuildID=125 So far, I have understood the first five fields as saying that in a particular position in the DNA (the 942nd base in the mitochondrial DNA, aka rs28579222), when one might have expected to see an A, a sample had a G. But that last part, a sample had a G, is in fact open to doubt … There is a complex piece of chemistry, physics and computing that guesses that there is a G in that position. 
It is possible to see some of the less processed data that fed into that guess, and to see levels of confidence that the different algorithms had with the results; but it is not a slam dunk by any means. So, some more skeptical people want to be able to see the 'raw read data' prior to the decision that this is a G. Usually one would expect to see some of the raw read data agree with the G, and some disagree. Since this assertion (that this position is a G) is made with a few million similar assertions, all of which have some element of doubt - it would be highly surprising if every single call were correct: yet within the logic of RDF we probably end up asserting the truth of the whole graph … which leads us onto the dangerous path of ex contradictione quodlibet -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower
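The variant line quoted in the thread is mechanically easy to pull apart; all of the doubt lives in the interpretation of the call, not in the syntax. A stdlib-only sketch, assuming the VCF columns are tab-separated (the spacing in the message itself is ambiguous):

```python
# Parse the VCF-style line from the thread: the eight standard columns are
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, and INFO mixes bare flags
# (ASP, HD, ...) with key=value pairs (VC=SNV, ...).
line = ("chrM\t942\trs28579222\tA\tG\t.\t.\t"
        "ASP;HD;OTHERKG;RSPOS=942;SAO=0;SF=0;SSR=0;VC=SNV;"
        "VP=0505000402000100;WGT=1;dbSNPBuildID=125")

chrom, pos, ident, ref, alt, qual, flt, info = line.split("\t")

# Flags become True; key=value pairs become entries.
info_d = dict(kv.split("=") if "=" in kv else (kv, True)
              for kv in info.split(";"))

print(f"{chrom}:{pos} {ref}>{alt} ({info_d['VC']})")
```

Notably, QUAL and FILTER are both "." (missing) on this line: the phred-scaled confidence in the call, which is exactly what the skeptics above would want to inspect, is simply absent here, so the "a sample had a G" assertion arrives with no recorded measure of doubt.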
Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)
Referencing and linking are not the same thing; say I want to produce a table of contents of resources, including yours and others, or an index. What would you think if you opened a book and at the front it said pg 8, pg 32, pg 53, where you expected a table of contents? If you want to know more, I'd suggest that you read https://www.cambridgesemantics.com/semantic-university/introduction-to-linked-data Rule 3 is: When someone looks up a URI, provide useful information, using the standards such as RDF* and SPARQL. Anyway, I think I have said enough here; it's your website! Phil Lee Feigenbaum l...@thefigtrees.net writes: I don't really find the use cases you suggest particularly compelling. Perhaps you could explain them in a bit more detail? Searching -- we do some degree of traditional SEO, and the lessons generally show up very well on major search engines Sorting -- I'm not sure what would be sorted? The lessons are presented in a particular order designed to help the understanding of readers who go through the material as presented Mashing up -- Can you give me an example? Referencing -- Generally speaking, we think that the URLs of the individual lessons are perfectly adequate for referencing thanks, Lee On 10/11/2012 5:24 AM, Phillip Lord wrote: I am a little surprised that you can't see use cases for adding computationally extractable metadata to your articles. Searching, sorting, mashing up, referencing and so on. RSS is a different point; ignoring its what's-new role, it happens to be a reasonable source for computational metadata where there is nothing else. Phil Lee Feigenbaum l...@thefigtrees.net writes: Thanks for the feedback. We didn't pursue an RSS feed for the site because it's intended to be relatively timeless educational content, rather than dated material. That said, I can look into adding one. Can you help me understand the use cases for using some of the other approaches you mention and what would be involved? 
I didn't really have any compelling use cases in mind off the top of my head to mark up these lessons. thanks, Lee On 10/10/2012 7:20 AM, Phillip Lord wrote: This is an interesting set of pages. One thing that confuses me about this web site is that, as far as I can see, it appears to use no semantic web technology; certainly trying to mine the web pages shows no metadata describing what the document is about. We tried searching for OGP, various forms of metatags, prism, COINs and so on, using our Greycite (http://greycite.knowledgeblog.org) tool, and found nothing. We've tried visual inspection as well -- not easy as all the HTML is on one line -- and again can see nothing. Tried content negotiation for RDF, but this returns HTML. Even the normally reliable RSS feed fails because there isn't one. Phil Lee Feigenbaum l...@thefigtrees.net writes: Hi everyone, Many of you may already have come across Semantic University http://www.cambridgesemantics.com/semantic-university, but I'd like to announce it to this community. Semantic University is a free, online resource for learning Semantic Web technologies. We've gotten some great feedback over the past few months, and we feel that it's one of the most accessible ways for both technical and non-technical people to start learning about semantics and the Semantic Web. For those of you who have seen Semantic University before, we've re-organized the content into general Semantic Web Landscape content and into specific technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web Design Patterns. I hope you'll check it out as we think it's now much easier to use to learn about the Semantic Web. Semantic University currently includes over 30 lessons, and we're continually preparing new content. We're also looking for additional writers to contribute new lessons, so please contact me if you'd be interested. 
I'd especially like to start including content specific to particular verticals, and HCLS would be a great starting place. Please let me know if you'd be interested in contributing! Current lessons include: * An Introduction to the Semantic Web https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web * Semantic Web Misconceptions https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions * Semantic Web vs. Semantic Technologies https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101 * SPARQL Nuts and Bolts https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts ...and many more. Please enjoy; we welcome all feedback and suggestions. best, Lee -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics
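The kind of scan Greycite attempts on a page can be approximated in a few lines of stdlib Python: parse the HTML and collect any `<meta>` tags carrying OGP-style `property` (or plain `name`) attributes. The sample HTML here is invented for illustration; a real check would fetch the page first.

```python
# Minimal metadata scanner using only the standard library's html.parser.
from html.parser import HTMLParser

class MetaScanner(HTMLParser):
    """Collect <meta property=... content=...> / <meta name=...> pairs."""
    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            key = a.get("property") or a.get("name")
            if key:
                self.found[key] = a.get("content")

# Invented sample page with one OGP tag; a page with no <meta> tags at all
# (the situation described in the thread) would leave `found` empty.
sample = '<html><head><meta property="og:title" content="RDF 101"/></head></html>'
scanner = MetaScanner()
scanner.feed(sample)
print(scanner.found)
```

Even this crude scan is enough to distinguish "some computationally extractable metadata" from "none at all", which is the distinction the thread is arguing about.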
Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)
I am a little surprised that you can't see use cases for adding computationally extractable metadata to your articles. Searching, sorting, mashing up, referencing and so on. RSS is a different point; ignoring its what's-new role, it happens to be a reasonable source for computational metadata where there is nothing else. Phil Lee Feigenbaum l...@thefigtrees.net writes: Thanks for the feedback. We didn't pursue an RSS feed for the site because it's intended to be relatively timeless educational content, rather than dated material. That said, I can look into adding one. Can you help me understand the use cases for using some of the other approaches you mention and what would be involved? I didn't really have any compelling use cases in mind off the top of my head to mark up these lessons. thanks, Lee On 10/10/2012 7:20 AM, Phillip Lord wrote: This is an interesting set of pages. One thing that confuses me about this web site is that, as far as I can see, it appears to use no semantic web technology; certainly trying to mine the web pages shows no metadata describing what the document is about. We tried searching for OGP, various forms of metatags, prism, COINs and so on, using our Greycite (http://greycite.knowledgeblog.org) tool, and found nothing. We've tried visual inspection as well -- not easy as all the HTML is on one line -- and again can see nothing. Tried content negotiation for RDF, but this returns HTML. Even the normally reliable RSS feed fails because there isn't one. Phil Lee Feigenbaum l...@thefigtrees.net writes: Hi everyone, Many of you may already have come across Semantic University http://www.cambridgesemantics.com/semantic-university, but I'd like to announce it to this community. Semantic University is a free, online resource for learning Semantic Web technologies. 
We've gotten some great feedback over the past few months, and we feel that it's one of the most accessible ways for both technical and non-technical people to start learning about semantics and the Semantic Web. For those of you who have seen Semantic University before, we've re-organized the content into general Semantic Web Landscape content and into specific technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web Design Patterns. I hope you'll check it out as we think it's now much easier to use to learn about the Semantic Web. Semantic University currently includes over 30 lessons, and we're continually preparing new content. We're also looking for additional writers to contribute new lessons, so please contact me if you'd be interested. I'd especially like to start including content specific to particular verticals, and HCLS would be a great starting place. Please let me know if you'd be interested in contributing! Current lessons include: * An Introduction to the Semantic Web https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web * Semantic Web Misconceptions https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions * Semantic Web vs. Semantic Technologies https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101 * SPARQL Nuts and Bolts https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts ...and many more. Please enjoy; we welcome all feedback and suggestions. best, Lee -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, msn: m...@russet.org.uk NE1 7RU twitter: phillord
Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)
I didn't ask for semantic web technologies (other than RSS, which *might* be RDF). Adding any computational metadata would, surely, be a good thing. I do not understand why people treat the web so unseriously as a publication medium. Although, curiously, you do have minimal metatags on your own website; not for search engines, it would appear, as you've robots.txt'd them away. Why did you use them if they are meaningless? Phil Eric Miller e...@squishymedia.com writes: Just chiming in quickly here -- our web dev shop wouldn't automatically include any of the semantic web technologies in a standard site template. Like Lee, we haven't seen a compelling use case to do anything beyond well-formed code. For example, Meta tags haven't been part of our code standards for years since the search engines tend to ignore them and they were prone to abuse or meaninglessness. I acknowledge that the newer semantically aware technologies might benefit from a push towards critical mass through community adoption but it is a hard sell to justify spending client dollars to implement something just on principle. A chicken and egg problem I guess. Though I'd welcome any other perspectives on this. Eric Squishymedia Web Development 503.780.1847 On Oct 11, 2012, at 2:24 AM, phillip.l...@newcastle.ac.uk (Phillip Lord) wrote: I am a little surprised that you can't see use cases for adding computationally extractable metadata to your articles. Searching, sorting, mashing up, referencing and so on. RSS is a different point; ignoring its what's-new role, it happens to be a reasonable source for computational metadata where there is nothing else. Phil Lee Feigenbaum l...@thefigtrees.net writes: Thanks for the feedback. We didn't pursue an RSS feed for the site because it's intended to be relatively timeless educational content, rather than dated material. That said, I can look into adding one. 
Can you help me understand the use cases for using some of the other approaches you mention and what would be involved? I didn't really have any compelling use cases in mind off the top of my head to mark up these lessons. thanks, Lee On 10/10/2012 7:20 AM, Phillip Lord wrote: This is an interesting set of pages. One thing that confuses me about this web site is that, as far as I can see, it appears to use no semantic web technology; certainly trying to mine the web pages shows no metadata describing what the document is about. We tried searching for OGP, various forms of metatags, prism, COINs and so on, using our Greycite (http://greycite.knowledgeblog.org) tool, and found nothing. We've tried visual inspection as well -- not easy as all the HTML is on one line -- and again can see nothing. Tried content negotiation for RDF, but this returns HTML. Even the normally reliable RSS feed fails because there isn't one. Phil Lee Feigenbaum l...@thefigtrees.net writes: Hi everyone, Many of you may already have come across Semantic University http://www.cambridgesemantics.com/semantic-university, but I'd like to announce it to this community. Semantic University is a free, online resource for learning Semantic Web technologies. We've gotten some great feedback over the past few months, and we feel that it's one of the most accessible ways for both technical and non-technical people to start learning about semantics and the Semantic Web. For those of you who have seen Semantic University before, we've re-organized the content into general Semantic Web Landscape content and into specific technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web Design Patterns. I hope you'll check it out as we think it's now much easier to use to learn about the Semantic Web. Semantic University currently includes over 30 lessons, and we're continually preparing new content. 
We're also looking for additional writers to contribute new lessons, so please contact me if you'd be interested. I'd especially like to start including content specific to particular verticals, and HCLS would be a great starting place. Please let me know if you'd be interested in contributing! Current lessons include: * An Introduction to the Semantic Web https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web * Semantic Web Misconceptions https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions * Semantic Web vs. Semantic Technologies https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101 * SPARQL Nuts and Bolts https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts ...and many more. Please enjoy; we welcome all feedback and suggestions. best, Lee -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l
Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)
This is an interesting set of pages. One thing that confuses me about this web site is that, as far as I can see, it appears to use no semantic web technology; certainly trying to mine the web pages shows no metadata describing what the document is about. We tried searching for OGP, various forms of metatags, prism, COINs and so on, using our Greycite (http://greycite.knowledgeblog.org) tool, and found nothing. We've tried visual inspection as well -- not easy as all the HTML is on one line -- and again can see nothing. Tried content negotiation for RDF, but this returns HTML. Even the normally reliable RSS feed fails because there isn't one. Phil Lee Feigenbaum l...@thefigtrees.net writes: Hi everyone, Many of you may already have come across Semantic University http://www.cambridgesemantics.com/semantic-university, but I'd like to announce it to this community. Semantic University is a free, online resource for learning Semantic Web technologies. We've gotten some great feedback over the past few months, and we feel that it's one of the most accessible ways for both technical and non-technical people to start learning about semantics and the Semantic Web. For those of you who have seen Semantic University before, we've re-organized the content into general Semantic Web Landscape content and into specific technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web Design Patterns. I hope you'll check it out as we think it's now much easier to use to learn about the Semantic Web. Semantic University currently includes over 30 lessons, and we're continually preparing new content. We're also looking for additional writers to contribute new lessons, so please contact me if you'd be interested. I'd especially like to start including content specific to particular verticals, and HCLS would be a great starting place. Please let me know if you'd be interested in contributing! 
Current lessons include: * An Introduction to the Semantic Web https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web * Semantic Web Misconceptions https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions * Semantic Web vs. Semantic Technologies https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101 * SPARQL Nuts and Bolts https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts ...and many more. Please enjoy; we welcome all feedback and suggestions. best, Lee -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, msn: m...@russet.org.uk NE1 7RU twitter: phillord
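The content-negotiation check mentioned in this thread ("Tried content negotiation for RDF, but this returns HTML") amounts to sending an Accept header that prefers RDF serialisations over HTML. A sketch using only the standard library; the URL is a placeholder and nothing is actually fetched here:

```python
# Build (but do not send) a request whose Accept header asks for RDF first
# and HTML only as a last resort -- the test Greycite-style tools perform.
from urllib.request import Request

def rdf_request(url):
    """Request preferring RDF/XML, then Turtle, then HTML."""
    return Request(url, headers={
        "Accept": "application/rdf+xml;q=1.0, text/turtle;q=0.9, text/html;q=0.1"
    })

req = rdf_request("http://example.org/resource")
print(req.get_header("Accept"))
```

A server doing proper content negotiation would answer such a request with RDF; a server that answers with HTML regardless, as described above, is ignoring the header.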
Re: HPO and Gene Ontology Licenses
Alan Ruttenberg alanruttenb...@gmail.com writes: As you know, we and others have demonstrated that alternative representations and reformulation of knowledge is desirable for certain kinds of scientific inquiry. Sorry, I'm unaware of such demonstration. Could you cite some references? http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0012258 A few examples of where multiple representations of the same knowledge have been used for good reasons:
- multiple syntaxes for RDF
- multiple syntaxes for OWL
- two APIs for XML (DOM and SAX)
- multiple computer languages which are reducible to lambda calculus
- lambda calculus and a Turing Machine
- continued use of Newtonian mechanics, although it's an approximation of relativistic mechanics
- multiple statistical techniques for expression of central tendency
- PDFs are still better for reading in the bath than HTML
And so on. Any model is a compromise between accuracy, usability, convenience and so on. Sometimes having more than one compromise is a better solution than trying to shoe-horn everything into one bucket. This is a compromise too. Phil
Re: ismb sig call for proposals is online
It somewhat overlaps with bio-ontologies, though. If people are keen to do this, I'd suggest talking to the bio-ontologies organisers (Nigam is the best point of contact) to see whether something can be done in common. Phil Andrea Splendiani andrea.splendi...@bbsrc.ac.uk writes: A SIG on Semantic Web applications in Life Sciences? That would be a good thing. I would vote for 1 day, prior to ismb and not overlapping with bio-ontologies, and with a call for submission. You can list swatls as a related event with attendance (2008) of around 80pp. Let me know if I can help. ciao, Andrea P.S.: I'll be at ISWC, and you at OWLed, but I'll arrive on the 23rd... On 20 Oct 2009, at 19:09, Joanne Luciano wrote: let's propose one for ISMB... Begin forwarded message: From: Hershel Safer hsa...@alum.mit.edu Date: October 20, 2009 1:04:44 PM EDT To: Fran Lewitter lewit...@wi.mit.edu, Rafi Najmanovich rafael.najmanov...@ebi.ac.uk, Iddo Friedberg ido...@gmail.com, Eduardo Eyras eduardo.ey...@upf.edu, E Eyras eey...@imim.es, Hagit Shatkay shat...@cs.queensu.ca, Nigam Shah ni...@stanford.edu, Vitor Martins dos Santos v...@helmholtz-hzi.de, Vitor Martins dos Santos vdsm...@gmail.com, Kam Dahlquist kdahlqu...@lmu.edu, Michael Brudno bru...@gmail.com, Francisco M De La Vega francisco.delav...@appliedbiosystems.com, Laura Elnitski elnit...@mail.nih.gov, James Taylor james.tay...@emory.edu, Anton Nekrutenko an...@bx.psu.edu, Dawn Field dfi...@ceh.ac.uk, Phil Lord phillip.l...@newcastle.ac.uk, Susanna-Assunta Sansone sans...@ebi.ac.uk, Susie Stephens susie.steph...@gmail.com, Larisa Soldatova l...@aber.ac.uk, Jens Stoye st...@techfak.uni-bielefeld.de, Dirk Holste hol...@imp.ac.at, Lonnie Welch we...@ohio.edu, Joanne Luciano jluci...@genetics.med.harvard.edu, Christian Blaschke blasc...@bioalma.com, Alfonso Valencia valen...@cnio.es, Lynette Hirschman lyne...@mitre.org, Scott Markel smar...@accelrys.com, James Procter j.proc...@dundee.ac.uk, Sean O'Donoghue sean.odonog...@embl.de, 
Jean-Christophe Nebel j.ne...@kingston.ac.uk , Steffen Moller moel...@inb.uni-luebeck.de, Paul Flicek fli...@ebi.ac.uk Cc: Steven Leard ste...@marketwhys.ca Subject: ismb sig call for proposals is online Hi everybody, I want to let you know that the ISMB SIG call for proposals is now online at http://www.iscb.org/ismb2010-submission-details/ismb2010-special-interest-groups . The CFP is largely the same as last year's; I expect that it will be updated in a few days to include the names of the committee members. The submission deadline is Dec 1st. I look forward to another great collection of SIG meetings! Thanks, Hershel -- Hershel Safer e: hsa...@alum.mit.edu | m: +972-54-463-1977 | skype: hsafer --- Andrea Splendiani Senior Bioinformatics Scientist Rothamsted Research, Harpenden, UK andrea.splendi...@bbsrc.ac.uk +44(0)1582 763133 ext 2004 -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, msn: m...@russet.org.uk NE1 7RU
Re: Any meeting at ISMB ?
I'd also be happy to advertise a meeting at bio-ontologies, where there will be quite a few people with SW interests. Phil Hilmar Lapp hl...@duke.edu writes: On Jun 18, 2009, at 12:36 PM, eric neumann wrote: I'm pretty sure BOFs are open to ISMB registered folks only. That does not prevent us from having additional meet ups. That depends on where you hold them ... BTW Alan Ruttenberg will be giving a keynote address at BOSC, and as a result some of the BOSC attendees might be interested in a life sciences semweb BOF, so holding it on one of the two days of BOSC is maybe worth considering. -hilmar -- === : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu : === -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, msn: m...@russet.org.uk NE1 7RU
Bio-Ontologies programme released
** Call for Participation
Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the latest and most cutting-edge research in ontologies and more generally the organisation, presentation and dissemination of knowledge in biology. We are pleased to announce the programme for Bio-Ontologies 2009: Knowledge in Biology, a SIG at Intelligent Systems for Molecular Biology, with papers and posters on a wide range of topics.
** Programme
This year's keynote speaker will be Barend Mons (http://en.wikipedia.org/wiki/Barend_Mons). This year's panel will be:
- Barend Mons
- Andrew Su
- Dawn Field
The full programme is now available at: http://bio-ontologies.org.uk/programme.html
*** Organisers
Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth
** Registration
All registration will be handled by ISCB. Further details are available at http://www.iscb.org/ismbeccb2009/registration.php
** Bio-Ontologies 2008
Papers from last year's SIG have now been published in BMC Bioinformatics. http://www.biomedcentral.com/1471-2105/10?issue=S5 -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Room 914 Claremont Tower, skype: russet_apples Newcastle University, msn: m...@russet.org.uk NE1 7RU
CFP: Bio-Ontologies
***DEADLINE FRIDAY FOR SUBMISSIONS***
** Call for Papers
Submissions are now invited for Bio-Ontologies 2009: Knowledge in Biology, a SIG at Intelligent Systems for Molecular Biology 2009.
*** Key Dates
- Submissions Due: April 10th (Friday)
- Notifications: May 1st (Friday)
- Final Version Due: May 8th (Friday)
- Workshop: June 28th (Sunday)
*** Introduction
Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the latest and most cutting-edge research in ontologies and more generally the organisation, presentation and dissemination of knowledge in biology. It has existed as a SIG at ISMB (http://www.iscb.org/ismbeccb2009) for 11 years now, making it one of the longest running. We are interested in any formal or informal approach to organising, presenting and disseminating knowledge in biology. We invite submissions on a wide range of topics including, but not limited to:
- Semantic and/or Scientific Wikis
- Multimedia Blogs
- Folksonomies
- Tag Clouds
- Collaborative Curation Platforms
- Collaborative Ontology Authoring and Peer-Review Mechanisms
- Biological Applications of Ontologies
- Reports on Newly Developed or Existing Bio-Ontologies
- Tools for Developing Ontologies
- Use of Ontologies in Data Communication Standards
- Use of Semantic Web technologies in Bioinformatics
- Implications of Bio-Ontologies or the Semantic Web for Drug Discovery
- Research in Ontology Languages and its Effect on Bio-Ontologies
** Programme
This year's keynote speaker will be Barend Mons (http://en.wikipedia.org/wiki/Barend_Mons). This year's panel will be:
- Barend Mons
- Andrew Su
- Dawn Field
*** Submissions
Submissions are now open and can be made through EasyChair (http://www.easychair.org/conferences/?conf=bioontologies2009).
*** Instructions to Authors
We are inviting two types of submissions:
- Short papers, up to 4 pages.
- Poster abstracts, up to 1/2 page.
Following review, successful papers will be presented at the Bio-Ontologies SIG. 
Poster abstracts will be allocated poster space, and time will be set aside during the day for at least one poster session. Unsuccessful papers will automatically be considered for poster presentation; there is no need to submit both on the same topic.

*** Organisers
Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth

*** Programme Committee
The programme committee, organised alphabetically, is:
Michael Bada, University of Colorado Denver
Olivier Bodenreider, National Library of Medicine
Kei Cheung, Yale Center for Medical Informatics
Paolo Ciccarese, Harvard
Sudeshna Das, Harvard
Michel Dumontier, Carleton University
Wacek Kusnierczyk, Norwegian University of Science and Technology
Cliff Joslyn, Pacific Northwest National Laboratory
Midori Harris, European Bioinformatics Institute
James Malone, European Bioinformatics Institute
Robin McEntire, Independent Consultant
Parsa Mirhaji, University of Texas
David Newman, ECS, University of Southampton
Chimezie Ogbuji, The Cleveland Clinic Foundation
Alexandre Passant, DERI
Alan Ruttenberg, Science Commons
Philippe Rocca-Serra, European Bioinformatics Institute
Matthias Samwald, DERI
Robert Stevens, University of Manchester
Yimin Wang, Eli Lilly
Mark Wilkinson, Medical Genetics, U. of British Columbia
Jenna Zhou, Eli Lilly
and the conference organisers.

*** Templates
Submission templates are available from the website (http://bio-ontologies.org.uk).
Re: Is OWL useful at all for Quantitative Science?
Matthias Samwald samw...@gmx.at writes: I have tried to come up with a simple example. Feel free to come up with a simpler one. Express in correct OWL: "Washington DC is further away from Boston than New York City". Use case: I want to fly with my helicopter from Boston to either DC or NYC, whichever is closer. Why should this be hard? If I take your example at its word and I am free to come up with arbitrary OWL DL, we could simply use an n-ary design pattern to SAY it in OWL. E.g., create a class "is farther away than", with three properties -- "reference place", "nearer place" and "place that is farther away" -- and create an instance accordingly. Problem solved.

I think I would say "problem represented" rather than "solved". This is a good thing, of course, and does represent a way in which you can do quantitative science where OWL is part of the solution. Katy Wolstencroft did the same thing in this paper: doi:10.1093/bioinformatics/btl208. In this case, the results of a quantitative analysis (how much similarity) were translated into statements in OWL, then we applied reasoning over the top. At Newcastle, Keith Flanagan did a similar thing looking for genomic rearrangements, which you can read about here: http://homepages.cs.ncl.ac.uk/phillip.lord/download/publications/iswc2005_bayesian_poster.pdf In this case, we were using OWL to help plug resources into a Bayesian stats engine, and later interpret the results.

The bottom line with all of these is that you can interact between OWL and quantitative results. In each of these cases, though, the degree of interaction between the numerical reasoning and the logical reasoning was through a fairly thin pipe.

But I guess what we really would want to do is to describe each city with geo-tags (latitude and longitude). Then we can use SPARQL to query for cities and calculate their distance from Boston.

Didn't know you could do that; is the mathematics integrated into SPARQL, or are you doing some kind of call-out?
The reason that I ask is that, ultimately, your question is "which is closer as the helicopter flies", which you can only really get from a database, what with no-fly zones and the like. Phil
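Whichever layer ends up doing the numeric work here (a SPARQL extension function or an external call-out from the reasoner), the arithmetic itself is just great-circle distance over the geo-tags. A minimal sketch of that step in plain Python, using approximate, illustrative city-centre coordinates:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Approximate city-centre coordinates (latitude, longitude)
boston = (42.36, -71.06)
nyc = (40.71, -74.01)
dc = (38.91, -77.04)

to_nyc = haversine_km(*boston, *nyc)
to_dc = haversine_km(*boston, *dc)
print(f"Boston->NYC: {to_nyc:.0f} km, Boston->DC: {to_dc:.0f} km")
# NYC comes out closer, so the helicopter flies to New York
```

Of course, as the follow-up notes, "as the helicopter flies" really needs a routing database; this is only the naive geometric answer.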
Re: blog: semantic dissonance in uniprot
Oliver Ruebenacker cur...@gmail.com writes: Besides, how do we know it's wrong? Two species can have the same protein for different functions, right? Depends how you define same. This is the crux of the problem. Phil
Re: Less strong equivalences
Pat Hayes pha...@ihmc.us writes: From your descriptions, I can't tell which one would best handle the following situation: Object 1 refers to exactly the same molecule (exemplar) as object 2 refers to. That sure sounds like sameAs, applied to molecules. Why isn't sameAs good enough here? What goes wrong?

I can think of very few occasions when we want to talk about a molecule; we need to talk about classes of molecules. We can consider this as problematic even with a very simple example. Let's assume we have two databases with information about carbon. Do we use sameAs to describe the atoms that they are talking about? Maybe, but what happens if one is talking about the structure of carbon and its location in the periodic table, while the other is talking about carbon with the isotopic mix that we have in living organisms on earth?

In biology, we have the same problem. Is porcine insulin the same as human insulin? Is real human insulin the same as recombinant human insulin? Well, the answer to all of these is no, even though most biologists will tell you that real and recombinant insulin are the same because they have the same primary sequence; a medic will tell you otherwise, because they have different effects. Why? Don't know.

If you make the distinctions that you might need some of the time, all of the time, then you are going to end up with a very complicated model. Hence the evolutionary biologist says all the insulins are the same. The medic says that they are different. And neither of them cares about different types of carbon (unless they are C14-dating). I don't think that there is a generic solution here which is not too complicated to use. The only solution (which is too complicated) I can think of is to do what we do when we have this problem in programming: you use a pluggable notion of equality, by using some sort of comparator function or object.
I don't think that this is an issue for OWL myself; I think it's something we may need to build on top of OWL. Phil
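To illustrate what a "pluggable notion of equality" looks like in programming terms, here is a minimal sketch in Python. The Protein record and both comparators are hypothetical illustrations of the insulin example above, not a proposed data model:

```python
from dataclasses import dataclass

@dataclass
class Protein:
    """Hypothetical record; fields chosen only to illustrate the point."""
    name: str
    primary_sequence: str
    source: str  # e.g. "pancreatic extract" or "recombinant"

def same_by_sequence(a: Protein, b: Protein) -> bool:
    """The biologist's comparator: same primary sequence, same protein."""
    return a.primary_sequence == b.primary_sequence

def same_by_provenance(a: Protein, b: Protein) -> bool:
    """The medic's comparator: sequence AND origin must both match."""
    return same_by_sequence(a, b) and a.source == b.source

# Placeholder sequence: both records carry the same (illustrative) sequence
real = Protein("insulin", "sequence-1", "pancreatic extract")
recombinant = Protein("insulin", "sequence-1", "recombinant")

print(same_by_sequence(real, recombinant))    # True: the biologist says "same"
print(same_by_provenance(real, recombinant))  # False: the medic says "different"
```

The point being that equality lives in the comparator you choose, not in the records themselves, which is why a single global sameAs struggles.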
Re: blog: semantic dissonance in uniprot
Oliver Ruebenacker cur...@gmail.com writes: Hello Philip, All, On Wed, Mar 25, 2009 at 1:05 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote: My own feeling is that it's biology which wove the web; we're just caught in the middle. What role for the web and semantics? Well, I think we need a coordinated, controlled and defined way of expressing our mutual confusion. I'd love to have a clear definition of gene (or protein). In its absence, a good way of expressing "err..." is probably the best we can do. I don't know whether the BioPAX Level 2 definition of protein is the most useful one, but at least it sounds clear to me: protein = anything containing exactly one polypeptide chain. Clear enough?

So insulin is not a protein, whereas a dipeptide is? Besides which, the issue being discussed here is one of equality. When are two proteins the same protein?

Phil
Re: blog: semantic dissonance in uniprot
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no writes: I don't know whether the BioPAX Level 2 definition of protein is the most useful one, but at least it sounds clear to me: protein = anything containing exactly one polypeptide chain. Clear enough? So insulin is not a protein, whereas a dipeptide is? indeed; insulin is a protein complex, and a dipeptide, following this and other similar definitions, is a protein.

Insulin is two polypeptide chains, so following this definition it is not a protein.

Phil
Re: Less strong equivalences
Pat Hayes pha...@ihmc.us writes: We can consider this as problematic even with a very simple example. Let's assume we have two databases with information about Carbon. meaning, I presume, the element with atomic number 14.

I was thinking of the carbon with atomic number 6.

Maybe, but what happens if one is talking about the structure of Carbon and its location in the periodic table, while the other is talking about Carbon with the isotopic mix that we have in living organisms on earth? So what? They can be saying different things about the same element. Any isotopic mix of carbon is still carbon.

Different isotopic mixes have different properties: atomic masses, melting points and so on.

In biology, we have the same problem. Is porcine insulin the same as human insulin? Is real human insulin the same as recombinant human insulin? Well, the answer to all of these is no. Fine, you just answered the basic ontological question.

...even though most biologists will tell you that real and recombinant insulin are the same because they have the same primary sequence; a medic will tell you otherwise, because they have different effects. Why? Don't know. A deep question, but not a killer for ontology use.

It's not a deep question, just one to which we don't have an answer.

If you make the distinctions that you might need some of the time, all of the time, then you are going to end up with a very complicated model. Yes, you no doubt are. Tough. It's a complicated world.

Yes. And one of those complications is that we have to engineer for usability as well as accuracy.

Formal ontologies are often, perhaps always, more complicated than the informal 'knowledge' they set out to formalize. They are obliged to make finer, more persnickety, distinctions between things. Hence the evolutionary biologist says all the insulins are the same. I don't care what anyone says, that is wrong.
They are indistinguishable for certain purposes, but if anyone can distinguish them at all, they are not the _same_.

I think that position is defensible, but unusable.

All these examples can be handled by making fussy distinctions between kinds of thing at different granularities: carbon molecules, carbon isotopes, carbon the element; and then having mappings between them. I don't know much about insulin, but it sounds from the above that the same trick would work. It is tedious and hair-splitting to set this up, but once in place it's fairly easy to use: you just choose the terminology corresponding to the 'level' you wish to be talking about. sameAs works OK at each level, but you can't be careless in using it across levels. If this makes you want to groan, I'm sorry. But ontology engineering is rather like programming.

Actually, I quite like programming. I also know how to split things out in the way you describe.

It requires an unusual attention to detail and a willingness to write a lot of boring stuff, because it's for computers to use, and they are as dumb as dirt and have to have every little thing explained to them carefully. And yup, it's complicated. Until AI succeeds, it will always be complicated.

I'd quite enjoy it if you could patronise me a little more, please.

The only solution (which is too complicated) I can think of is to do what we do when we have this problem in programming; you use a pluggable notion of equality, by using some sort of comparator function or object. I don't think that this is an issue for OWL myself; I think it's something we may need to build on top of OWL. It belongs in your ontology for carbon and insulin, not in OWL.

Is that not what my last sentence says?

Phil
Re: blog: semantic dissonance in uniprot
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no writes: So insulin is not a protein, whereas a dipeptide is? indeed; insulin is a protein complex, and a dipeptide, following this and other similar definitions, is a protein. Insulin is two polypeptide chains, so following this definition it is not a protein. that's what I was saying: that it is a protein complex, specifically, an aggregate of two polypeptide chains.

My apologies, Wacek. I misread your reply. I thought you were disagreeing with my interpretation of the definition, which you were not.

It may sound revolutionary to you that insulin is not a protein, since insulin is typically called a 'protein'. But, provided one accepts a definition like the one above, there is nothing wrong in saying that insulin is not a protein.

I agree with all of this. As it happens, I would say that insulin is a protein (and not a complex) because it is disulphide-bonded; so it's a single molecule, but has two polypeptide chains. I'd tweak the definition. But as you say, if we all agreed on the definition, then either way, the end result would be clear.

Phil
Re: blog: semantic dissonance in uniprot
Michel_Dumontier michel_dumont...@carleton.ca writes: And I'm trying to explain that there is no pragmatic reason to make explicit the distinction between a biomolecule (and what we know about it) and a database record (and what we know about the biomolecule) unless they are actually different. It just complicates things in a wholly unnecessary way.

I've given a clear example: two databases exist, with two records, which appear to be referring to the same (class of) molecules. The problem remains, however, that we have no clear and unambiguous way of defining what we mean by "the same molecule". So, we refer to a database, which brings along with it an (often ad hoc) definition of what "the same molecule" means. We could, of course, produce a resource which gives identifiers to, say, all the classes of proteins in the world. But this would not solve the problem; it would just introduce yet another resource and another methodology for defining what we mean by an individual protein. If I remember correctly, the original post of Ben's that started all this has it about right. We need some tags which say "these two database records are about the same protein, well, sort of, at least in this case, for the purposes of what I am doing".

This argument reminds me of when the genome sequences were being completed and people were arguing about how many genes there are in humans. Different groups had different pipelines and came up with different answers; ultimately, you had to conclude that they were all pretty close, and that working out which was best was nearly impossible in the absence of an exact answer: an exact definition of a gene. We don't have one; let's get over it and deal with this as is.

Phil
Re: blog: semantic dissonance in uniprot
Oliver Ruebenacker cur...@gmail.com writes: 2009/3/23 Michel_Dumontier michel_dumont...@carleton.ca: I do not think this would be a wise simplification. This is only a simplification from one perspective: because it avoids having to mint and maintain pairs of URIs instead of a single URI. But the downstream cost is that it creates an ambiguity (or URI collision, http://www.w3.org/TR/webarch/#URI-collision) that may cause trouble and be difficult to untangle later as the data is used in more and more ways. For example, if any of the same predicates need to be used on both the record and the molecular entity, they will become hopelessly confused. Also, if disjointness assertions are included, then this overloading may cause logical contradictions. Can anyone name a real-world example where confusion between an entity and its record was an issue?

Yes, sure. All proteins have a Uniprot ID (conflating protein and Uniprot records). Then we integrate this with DrugBank; this represents many things, including proteins which are not in Uniprot, or represents several proteins where Uniprot has one. Consider insulin, for instance. We now have a problem, because not all proteins have a Uniprot ID.

The flip side is that if you always say "protein record -- contains knowledge about -- protein", it's much more complicated. You are making your data model more difficult to work with all of the time, to cope with edge cases which occur only some of the time. There's no way around this; either way it's a compromise, and what is good in one context may not be good in another.

Phil
Re: blog: semantic dissonance in uniprot
Oliver Ruebenacker cur...@gmail.com writes: Is it possible that referring to records instead of things is not the result of confusion, but rather of cost-benefit considerations - that records are cheap and identification is costly and open-ended? What is it that cannot be achieved by having better records instead?

I never said "confusing"; I said "conflating". I also gave a reason why this conflation was, at times, a good thing.

And what does it take to identify something? We may have thought we know what a couch is, until we realize that we have no consensus over whether the pillows are part of the couch or not, and that it would be more accurate to distinguish between bare couches (without pillows) and fully featured couches (with pillows). How far are we going to go?

I don't know, but I do think that we have a reasonable handle on the engineering decisions that we need to make; the problem is that these are application-dependent. This is a problem if you want to integrate data for a purpose that it was not originally intended for. My own feeling is that identifying the underlying biology is attractive, but not that plausible, because we have no good way of understanding identity at this level without a record. So, when talking about a protein, how do we know when we have one and when we have two? It's not obvious. But Uniprot have a mechanism for making this judgement. So when referring to a Uniprot record, we mostly mean the record, and the extensional set of proteins that are defined by it.

Phil
Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges
Carole

I don't confuse the concepts, although I sometimes get the names mixed up. In this case, uploading a workflow (Taverna or otherwise) is not going to guarantee either. I would not expect that the workflow you gave me last year would necessarily either run now or give me the same results for the same input. Of course, this is true in general for any computational artifact; in the case of something like Java (with its forward compatibility), if it doesn't, then this is defined to be a bug. In the case of other languages, there is no such guarantee. In the case of workflows, I guess, we have to take the W3C line on 404 and say it's a feature, not a bug.

Not that this means that I think that submission of workflows is a bad idea. I just think that they are going to be affected by the ravages of time even more quickly than raw data is.

Phil

Carole == Carole Goble [EMAIL PROTECTED] writes:
Carole Phil
Carole yes - do not confuse Reproducibility with Repeatability or
Carole Reusability
Carole Carole Goble, University of Manchester, UK

KC == Kei Cheung [EMAIL PROTECTED] writes:
KC Peter Ansell wrote: Wikis explicitly allow for a permanent link to a particular version of something. Hopefully an implementation of a wiki-like workflow editor online will have similar characteristics, so that you can still use a particular version to reproduce a past result if you need to, provided the web services still exist and haven't changed their interface ;-) It would also be nice to be able to get corrected versions via the wiki mechanism, though, and that would suit the Web 2.0 way, as opposed to publications, to which corrections are hard to make.
KC If some journals are requiring raw data (e.g., microarray data) to be
KC submitted to a public data repository, I wonder if workflows that are
KC used to analyze the data should also be submitted to a public workflow
KC repository.

It's a nice idea but doesn't quite allow the same level of repeatability.
Most Taverna workflows need updating periodically, as the services go offline or change their interfaces. Even if they don't, they return different results as the implementation changes. Ultimately, you need to store more than the workflow to allow any degree of repeatability. Still, it would be a good step forward, which is no bad thing.

Phil

--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909, skype: russet_apples
Newcastle University, NE1 7RU
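To give a feel for "storing more than the workflow", here is a hypothetical sketch of what a fuller run record might capture: a digest of the workflow definition, the versions of the services it called, and digests of the input and output. All names, versions and payloads below are illustrative, not from any real repository:

```python
import hashlib
from dataclasses import dataclass

def digest(data: bytes) -> str:
    """Content digest, so drift in any stored artifact is detectable."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class RunRecord:
    """Hypothetical record of one workflow run: enough to detect,
    though not prevent, drift in the services the workflow calls."""
    workflow_digest: str
    service_versions: dict
    input_digest: str
    output_digest: str

workflow_definition = b"<workflow>fetch, align, report</workflow>"  # illustrative
record = RunRecord(
    workflow_digest=digest(workflow_definition),
    service_versions={"blast": "2.2.18", "clustalw": "1.83"},  # illustrative
    input_digest=digest(b"input sequences"),
    output_digest=digest(b"results from the original run"),
)

# Later, re-running the stored workflow on the stored input: if a service's
# implementation has changed, the output digest no longer matches.
rerun_output = b"results from the re-run"
print("repeatable:", digest(rerun_output) == record.output_digest)
```

The record tells you that repeatability has been lost and roughly where to look (the service versions); it cannot, of course, bring a dead service back.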
Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges
Carole == Carole Goble [EMAIL PROTECTED] writes: Carole Phil Carole er which bit of I agree with you don't you get? :-) :-) Carole I agree with you! The bit after it:-) We are in furious agreement anyway, which is the main thing. Phil
Re: The W3C mailing lists will be limited to interest group participants.
MS == Matthias Samwald [EMAIL PROTECTED] writes: MS Jonathan wrote: The W3C mailing lists will be limited to interest group participants. You mean public-semweb-lifesci@w3.org, for example? MS According to the last conference call, this might also apply to MS this mailing list. How many people are subscribed to this mailing MS list at the moment, and how many of these will be 'kicked out' MS when the membership policy is enforced? It also depends on whether limited means reading or posting or both. I've been a highly active lurker (erm...) on this list for years and find it very useful for this. I wouldn't read it through public archives. Email or bust. Phil
Re: [semweb-lifesci]
TG == Ted Guild [EMAIL PROTECTED] writes:
TG I am surprised and sorry that anyone found the page [1] explaining why
TG we run our lists, and for that matter most of our infrastructure,
TG according to standards offensive. That was certainly not the intent of
TG the page, merely to give a thorough response to a request that has come
TG up a few times and also provide some pointers. We are a standards body
TG and feel strongly about promoting and adhering to standards in addition
TG to creating them but we do not try to be condescending in doing so.

Let's be clear: this is NOT a standards issue. As far as I can tell, both RFCs mentioned tell you WHAT to do. This is good. They do not tell you what you should not do; I see no mention of subject lines in either.

TG Also I thought it would be helpful to start a Wiki [2] on configuring
TG filtering for various mail clients, as most are capable of filtering on
TG List-Id and other headers.

And here is the problem. You assume that you know how I or others use subject line tags. Actually, I don't filter on them at all; I use To or Cc addresses, as this captures emails cc'd to me personally, which your List-Id technique fails for. I use the tags because I filter several mailing lists into one folder and like the visual cue this gives. I identify semweb-lifesci mails by the lack of a tag, which works because it's the only list in that folder that doesn't have one. So, you see, on this basis ALL of the points on your subject-tagging page are, well, either irrelevant or wrong.

If you are going to provide a service, then listen to your users. If you are not going to listen to your users, then don't provide a service. There are others who can do it better.

Sorry for sounding so irritable on this; it's a bad time of year for me. Normally, I'd take this with more of a sense of humour, or a slightly raised eyebrow.

Cheers
Phil
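The To/Cc-based filtering described above can be sketched with nothing more than a stock mail library. A minimal illustration using Python's standard `email` package; the addresses, folder names and message are all hypothetical:

```python
import email
from email import policy

# A hypothetical message: sent to the list, cc'd to me personally
raw = b"""From: someone@example.org
To: public-semweb-lifesci@w3.org
Cc: phillip.lord@newcastle.ac.uk
Subject: identifiers again

body text
"""

msg = email.message_from_bytes(raw, policy=policy.default)

def folder_for(msg, me="phillip.lord@newcastle.ac.uk",
               lists=("public-semweb-lifesci@w3.org",)):
    """File by To/Cc address rather than subject tag or List-Id, so a
    message cc'd to me personally lands in the same folder as list traffic
    (which a List-Id filter would miss)."""
    recipients = " ".join(str(msg.get(h, "")) for h in ("To", "Cc")).lower()
    if me in recipients or any(addr in recipients for addr in lists):
        return "lists"
    return "inbox"

print(folder_for(msg))  # filed with the list traffic
```

This is only one of many ways users actually filter, which is the point of the complaint: the provider cannot assume any single mechanism.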
Re: [semweb-lifesci]
PH == Pat Hayes [EMAIL PROTECTED] writes: Our Systems Team has fielded this request many times.
PH It's about time it bloody well listened, then.

Yes.

PH By the way, the tone of the document [1] is extremely annoying. If the
PH W3C were a company taking this attitude, it would have lost its customer
PH base years ago. Of course, the W3C isn't a company: but y'all might give
PH some thought to the fact that the great bulk of the W3C's work is done by
PH volunteers, who are the people getting screwed over by the Systems
PH Team's almost palpable arrogance.

The document suggests that if the W3C sticks with its silly policy, then perhaps mail client developers will fix their clients. I think that the opposite is also true; if the W3C is incapable of producing a mailing list which can be configured to its owners' wishes, rather than the W3C's own dogma, we should perhaps move the mailing list elsewhere. I would rather see the effort invested in getting the W3C to fix their broken policy than use workarounds which give them no incentive.

Incidentally, I don't filter on subject line.

Phil
Re: identifier to use
EJ == Eric Jain [EMAIL PROTECTED] writes: EJ Phillip Lord wrote: I don't understand the desire to implement everything using HTTP. EJ Likewise, I don't understand the desire to implement everything using EJ anything but HTTP :-) If there is an existing system that is EJ (incredibly) widely adopted and that can be built upon, surely that's EJ the way to go? Actually, LSIDs are built on top of HTTP. The initial step is web service and http delivered. The second stage is multi-protocol, which includes HTTP. Why call lots of things, which are actually several protocols, by a name which suggests that they are all one? How do you distinguish between an HTTP URI which allows you to do location-independent, two-step resolution and one which doesn't? Well, one solution would be, perhaps, to call it something different, say, perhaps, LSID? EJ You could have the concept of LS HTTP URIs that follow certain EJ conventions, may be useful for some, but I don't quite see the problem EJ with the fact that you will be able to resolve some HTTP URIs, but not EJ others: The only way to know whether a URI can be resolved or not, in EJ the end, is to try; some systems just seem to make doing so harder... The other way to know whether a URI can be resolved is to use a different name for those which are not meant to be resolved. To me it makes no sense to layer multiple different protocols over a single identifier. Imagine I get a URI like http://uniprot.org/P4543. It could be 1) a meaningless concept identifier in an ontology 2) a URL which resolves to a pretty web page, via a single step process 3) a URL which always resolves to the same data 4) a URL which resolves to the current version of some spec, like the W3C recommendation pages 5) a URL which is meant to be considered a location independent ID 6) whatever else we have decided to layer onto the same identifier scheme. To me, it doesn't make any sense. Phil
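The two-step, location-independent resolution being contrasted with plain HTTP URLs can be sketched roughly as below. This is an illustration only, not the LSID specification's mechanism: the resolver endpoint is invented, and the dictionary-based authority lookup stands in for the dynamic service discovery a real LSID resolver performs.

```python
def parse_lsid(lsid):
    """Split urn:lsid:authority:namespace:object[:revision] into parts."""
    parts = lsid.split(":")
    if parts[:2] != ["urn", "lsid"] or len(parts) < 5:
        raise ValueError("not an LSID: " + lsid)
    revision = parts[5] if len(parts) > 5 else None
    return parts[2], parts[3], parts[4], revision

# Step 1: map the naming authority to a current service endpoint.
# (A real resolver discovers this dynamically; a dict stands in here.)
AUTHORITY_ENDPOINTS = {"uniprot.org": "https://resolver.example.org/uniprot"}

def resolve(lsid):
    authority, namespace, obj, revision = parse_lsid(lsid)
    endpoint = AUTHORITY_ENDPOINTS[authority]      # step 1: locate the service
    url = "%s/%s/%s" % (endpoint, namespace, obj)
    return url + "/" + revision if revision else url  # step 2: fetch from here
```

The point of the indirection is that the authority-to-endpoint mapping can change without the identifier itself changing, which is precisely what a plain HTTP URL cannot offer.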
Re: identifier to use
DS == Booth, David (HP Software - Boston) [EMAIL PROTECTED] writes: From: Phillip Lord [ . . . ] I don't understand the desire to implement everything using HTTP. Why call lots of things, which are actually several protocols, by a name which suggests that they are all one? How do you distinguish between an HTTP URI which allows you to do location-independent, two-step resolution and one which doesn't? Well, one solution would be, perhaps, to call it something different, say, perhaps, LSID? DS But that's like asking "Why call everything URNs?". No it isn't. http:// based URIs carry the assumption that they are potentially resolvable by a defined protocol. URNs do not. DS LSIDs are layered on top of URNs. Certainly conventions layered on top DS of HTTP URIs can have names too, just as conventions layered on top of DS URNs can. For example, the LSID conventions layered on top of HTTP DS could be named HLSID and published in a specification just as the DS existing LSID conventions are. LSID conventions are layered on top of HTTP already. They just use a different convention for naming, to indicate that they are different. Phil
Re: identifier to use
XW == Xiaoshu Wang [EMAIL PROTECTED] writes: XW Phillip Lord wrote: To me it makes no sense to layer multi different protocols over a single identifier. Imagine I get an URI like http://uniprot.org/P4543, it could be 1) a meaningless concept identifier in an ontology 2) a URL which resolves to a pretty web page, via a single step process 3) a URL which always resolve to the same data 4) A URL which resolves to the current version of some spec like the W3C recommendation pages. 5) A URL which is meant to be considered to be a location independent ID. 6) What ever else we have decided to layer onto the same identifier scheme. To me, it doesn't make any sense. XW Does it make sense to you if our personal name is put like Xiaoshu, XW male, dark hair, 5'8, email=..., address, etc., etc., Wang? Because if XW so, I think we would be required to name ourself with our DNA string, XW which is still not enough since it doesn't have my birth time, place, XW alive-status XW Don't mistaken name/identifier as information. Then ask yourself what XW you want from a name? Then, a lot of sense will start coming to you. Unlike you, I have stated what I think the requirements are from an identifier in life sciences. I want an identifier that I can do one thing with and, preferably, one thing only. Not 5. I am not suggesting that we put semantics into the identifiers other than those semantics that we need for using the ID. So, your analogy is wrong. I have a better analogy. My name is Dr Phillip Lord. This is already overloaded, as I'm a yeast geneticist (or was) not a medic. Why don't we give another 4 or 5 meanings to Dr on the grounds that, as people have seen Dr before, they will be happier with that than a new title. Phil
Re: identifier to use
MS == Matthias Samwald [EMAIL PROTECTED] writes: MS So you want to advertise what can be expected (or NOT expected) before MS the web client starts the retrieval process? If we are to use http: based URIs for things which are never meant to be retrieved (like ontology concepts), then it has to be before the retrieval process, no? MS If this is desirable, why could we not, in theory, agree on different MS syntactic hints in normal HTTP URIs? MS For example: http://uniprot.org/P4543_concept MS http://uniprot.org/P4543_web_resource MS http://uniprot.org/P4543_immutable_data MS http://uniprot.org/P4543_location_independent_id_(whatever_that_means) Yes, we could invent some naming conventions and layer them over the top of http, at least where http is capable of supporting the requirements; for the last one, it isn't. MS This way, we could also give Semantic Web clients a message like you MS probably don't really need to resolve this and you can probably not MS expect something when you try, but if you really want to, you MS can. Trying to resolve a URI does not have zero costs for a client MS application, so they would probably try to follow this recommendation to MS avoid unnecessary HTTP GET requests (which, in turn, helps to avoid MS unnecessary net/server load). MS I do not really think that something like this would find widespread MS adoption, but it is certainly still more realistic than inventing and MS agreeing on a wholly new protocol for each. Well, again, little of LSID is wholly new. In fact, the LSID protocol uses other, standard protocols. But inventing a new protocol is precisely what happened when HTTP was created. Why did we not just reuse ftp? I think we are going around in circles here, so I'll leave the conversation there. Phil
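Matthias's suggested suffix convention could be sketched like this. The suffixes are the hypothetical ones from the email, not any standard; the sketch just shows how a client could decide whether retrieval is worthwhile before issuing a GET.

```python
# Hypothetical suffix hints from the email above, mapped to what a
# client could expect (or not expect) from retrieval.
SUFFIXES = {
    "_concept": "concept",                # never meant to be retrieved
    "_web_resource": "web resource",      # resolves to a web page
    "_immutable_data": "immutable data",  # always resolves to the same bytes
}

def classify(uri):
    """Classify a URI by its trailing hint, or 'unknown' if it has none."""
    for suffix, kind in SUFFIXES.items():
        if uri.endswith(suffix):
            return kind
    return "unknown"
```

As the email notes, a Semantic Web client could skip the GET entirely for anything classified as a concept, saving a round trip.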
Re: identifier to use
EJ == Eric Jain [EMAIL PROTECTED] writes: These archives will all need to use opaque identifiers to track relationships, provenance, versions, and other metadata. EJ The only digital archive project I'm vaguely familiar with is the EJ Internet Archive project, and that seems to rely on URLs. If someone EJ has some insight into any of the other projects (especially how they can EJ or can't handle different identifier schemes), that would be *really* EJ interesting! Eric, try putting "digital preservation" into Google. There are many projects out there working in this area. Phil
Re: identifier to use
EJ == Eric Jain [EMAIL PROTECTED] writes: EJ Do you mean fail over at run time, so when an identifier can't be EJ resolved, the resolver retries with a backup service? Hilmar described the mechanism in his last email. Again, perhaps I am wrong. EJ In general, my feeling is that there are lots of special mechanisms that EJ may be useful for some application (but overkill for others), but I EJ don't see any strong arguments why these couldn't be implemented with EJ HTTP URIs (which have the benefit that they can also be made usable in EJ simple ways). I don't understand the desire to implement everything using HTTP. Why call lots of things, which are actually several protocols, by a name which suggests that they are all one? How do you distinguish between an HTTP URI which allows you to do location-independent, two-step resolution and one which doesn't? Well, one solution would be, perhaps, to call it something different, say, perhaps, LSID? As far as I can see, LSIDs are basically location independent. The only hole I can see is if someone else buys uniprot.org, sets up an LSID resolution service and then returns crap. purls have the same issue I think. EJ Yes, I guess that's a problem with all solutions that make use of the EJ domain name system in some way. (But I still think the benefits of doing EJ so outweigh the problems that are introduced by not using it.) I don't think LSID can cope with this, although a small extension would allow you to; you just need to blacklist domains where you should automatically use the fail over mechanism. EJ Note that any other name-based registration system could run into EJ trouble, too: Let's say UniProt lost a trademark suit and was forced to EJ change its name to something else, I assume that wouldn't be good for EJ location independent identifiers such as urn:bm:uniprot:P12345... If you lose a trademark and have to stop using identifiers with a uniprot in them, then any system which uses an alphabetical ID is stuffed. 
Numbers would be okay, cause you can't trademark numbers. The law is an ass. Phil
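The small extension suggested above, a blacklist of domains that should fall straight through to the failover resolver, might look something like this sketch. The resolver functions and domain names are invented; the shape of the failover loop is the point.

```python
def resolve_with_failover(identifier, resolvers, blacklist=()):
    """resolvers: ordered list of (domain, resolve_fn); returns first success.

    Blacklisted domains (e.g. ones bought up and returning crap) are
    skipped entirely, so the failover resolver is used automatically.
    """
    for domain, resolve_fn in resolvers:
        if domain in blacklist:
            continue  # domain compromised or squatted: go to the failover
        try:
            return resolve_fn(identifier)
        except Exception:
            continue  # resolver down: try the next one in the chain
    raise LookupError("no resolver could handle " + identifier)
```

A client configured this way degrades gracefully when a primary service disappears, which is the scenario LSIDs' failover mechanism was designed for.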
Re: identifier to use
EJ == Eric Jain [EMAIL PROTECTED] writes: EJ I guess that could happen... Do you have some examples of EJ domain-specific standards that became de-facto standards, supported by EJ generic tools etc? The web leaps to mind. Remember that? As for being limited to a domain or not, would the LSID mechanism be more appealing if it read urn:guid:foo.org:Foo:12345? There's nothing in the LSID spec that makes it LS-specific, or due to which it makes no sense outside of the LS. EJ You're right, from a technical point of view, it's not EJ domain-specific. But if no one else is using it, doesn't that make it EJ de-facto domain-specific? Actually, LSIDs are domain specific, or rather they were designed to support the needs of the Life Sciences; this is not to say that different domains do not have the same needs. Look at DOIs and LSIDs. They are different; they emphasise different things. LSIDs are based around a set of objects which potentially might be very large and which might exist in many versions. So LSIDs have two-step multi-protocol resolution. They have version numbers integrated. They exist in a world where services disappear. So LSIDs have a fail over mechanism. DOIs are based around the stable, large organisations giving out the DOIs, hence they have a heavyweight assignment process (possibly involving cash). They are based around small resources, mostly of the size that people can read, without necessity for many, many versions. Conclusion: LSID and DOI are NOT domain specific at all, they are requirement specific. They exist because different domains have different requirements. No generic ID is going to fulfil all the requirements, is my thought. Do you mean you would prefer if each journal set up URIs based on its self-chosen domain-name and we reference articles through that instead of DOIs? Or did you want to say something else? 
EJ If instead of doi:10.1038/nrg2158 an official URI looked something like EJ http://dx.doi.org/10.1038/nrg2158, would this make the system less EJ popular? EJ In fact, I suspect that the lack of such a transformation mechanism EJ turned away many people from the LSID system (that, and the ugly syntax EJ :-) DOIs worked because there are actually relatively few publishers and they moved en masse. In biology there are many more service providers, and most will not adopt something till it looks stable and until people really bitch about wanting it. Phil
Re: Ambiguous names. was: Re: URL +1, LSID -1
Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan Well, if I am restricted to using such Uniprot classes I will have Alan trouble representing important scientific findings. If Uniprot only Alan has one name for the two molecules, one of which has a snp that leads Alan to a loss of function that is the initiating factor of a disease, then Alan we have a problem, no? How do we say things about the disease related Alan form? Make statements against an isoform of P38228. If you create identifiers to describe proteins rather than protein records (like uniprot) then you have created a whole new set of IDs. When anyone wants to talk about a protein, they will have to look up the ID. Alan As they will when they want to talk about a record. Of course perhaps Alan we all will add some links of the sort that say the record is about Alan some set of classes of proteins, and that aspects of the protein in a Alan class can be described by pieces of the record. Alan But at least we'll know what we are talking about. The question here is whether you will add to the confusion or decrease it. You need to put an entire infrastructure in place for providing sane, consistent, clearly defined names to proteins. I just use swissprot. Alan But I'm open to discussing suggestions for representing these Alan statements by only making use of the Uniprot records ids, if you have Alan any. Well, swissprot refers to isoforms I think. Push comes to shove, just use the sequence. Phil -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics, Email: [EMAIL PROTECTED] School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord Claremont Tower Room 909, skype: russet_apples Newcastle University, NE1 7RU
Re: IDs + 5; everybody - 10
The LSID specification defines a protocol for resolving LSIDs, part of which uses web services. If it were a big problem, the WS definition of this API could be replaced with something else, such as REST. This is true with any API specification. In this case, the LSID API is not that complex, so the translation wouldn't be too hard. Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan I'm intrigued by this remark. Phil, would it be possible to sketch out Alan how one could graft REST style services into LSID space? Alan -Alan Alan On Jul 16, 2007, at 8:22 AM, Phillip Lord wrote: The LSID use of web services should not really be seen as a problem. Push comes to shove, even this part could be replaced or made optional if a REST-style solution were desired.
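To sketch what grafting REST onto LSID space might look like: the LSID specification's getData and getMetadata operations could be exposed as plain HTTP paths. The host layout and path scheme below are invented for illustration and are not part of any spec.

```python
def rest_urls(authority, namespace, obj, revision=None):
    """URL templates for a hypothetical REST binding of the LSID API."""
    base = "https://%s/lsid/%s/%s" % (authority, namespace, obj)
    if revision:
        base += "/" + revision
    return {
        "data": base + "/data",          # cf. the LSID getData operation
        "metadata": base + "/metadata",  # cf. the LSID getMetadata operation
    }
```

The translation is mechanical because the LSID API is small: each operation becomes a GET against a predictable path, and the two-step resolution is preserved by whatever maps the authority to a host.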
Re: Ambiguous names. was: Re: URL +1, LSID -1
MS == Matthias Samwald [EMAIL PROTECTED] writes: It would be more satisfying for us to know intensionally what we mean by protein. It would be good to have a clear set of definitions. But, ultimately, I think it would be mistaken. If we have the ability to express the class of protein molecules defined by the swissprot record OPSD_HUMAN, then I think we have all we need. MS OWL is very open towards incomplete information. If all we know about MS the protein is the sequence of amino acids, then this is what we add to MS the protein class through a 'some-values-from, necessary' property MS restriction (and not 'necessary and sufficient', since we are still MS unsure if this information alone is enough to DEFINE the protein MS class). If we know that proteins of this class can have some MS polymorphisms, we can enumerate the different possible sequences as best MS as we can. If we are unable to enumerate all of them at the moment, or MS are unsure about something, we just leave it out and maybe add it later. This is my worry. Effectively, I think you are saying why not take all the knowledge in swissprot and duplicate it in our class definitions. I don't see what this adds. All I see is that it will add confusion and the potential for data to get out of date. This is an important issue and will raise its head repeatedly. Should we define Homo sapiens? Should we determine all the necessary and sufficient conditions? Or should we just point to a pre-existing taxonomy and a pre-existing process? I think that there are many clear reasons for keeping statements about the informatics entities -- the database entries for example. To do otherwise runs the risk of enormous mission creep (always a problem with data modelling and ontologies). Phil
Re: Rules
CM == Chris Mungall [EMAIL PROTECTED] writes: CM Definitely - although I don't think OWL/SWRL is quite the right tool for CM this job yet CM although perhaps getting closer: CM Putting OWL in Order: Patterns for Sequences in OWL. Nick Drummond, Alan CM Rector, Robert Stevens, Georgina Moulton, Matthew Horridge, Hai H. Wang, CM Julian Seidenberg CM http://owl-workshop.man.ac.uk/acceptedLong/submission_12.pdf CM (I'm not a big fan of the modeling biological sequences as lists CM approach, but I think the results could be replicated using a realist CM representation) There is a much more relevant paper by some of the same authors: Wolstencroft, K., Lord, P., Tabernero, L., Brass, A. and Stevens, R. (2006) "Protein classification using ontology classification", Bioinformatics 22(14):e530-538. doi:10.1093/bioinformatics/btl208. We can define proteins and classes of proteins in OWL and in some circumstances this can be useful. I think, though, that it's a danger for us to say that because we can define things, we should. At times, it's better to allow things to be self-standing kinds, and leave the details of the definition to authors. Uniprot's rules for distinguishing between one protein and another are complex, and I can't see the point of codifying them in OWL (even if you could). Of course, there may be times when you need to make separations that uniprot doesn't -- two isoforms may have the same uniprot ID -- but you could still make this distinction with reference to uniprot. Phil
Re: Ambiguous names. was: Re: URL +1, LSID -1
Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan Summary: Answering Phil's questions, and clarifying one thing he Alan asserts about what I said. What if they have a polymorphism? Alan No. Are two isoforms from an alternate splice the same protein? Alan No. In both of these you differ from uniprot. Unsatisfying, maybe. Clear definitions are important. But interoperability, and the lack of duplication, are more so. Alan Forgive my confusion, but how exactly will we achieve interoperability Alan and lack of duplication if we don't have definitions? How would we Alan know that we don't have duplication, for example? If you create identifiers to describe proteins rather than protein records (like uniprot) then you have created a whole new set of IDs. When anyone wants to talk about a protein, they will have to look up the ID. snip And, yet, you just told me that you could buy an antibody with just a swissprot ID. So, let me restate the question: what are you going to do with a protein ID that you are not going to do with a swissprot ID, or the protein formerly known as OPSD_HUMAN? Alan I did not say that. I've said some people have identified antibodies Alan by such ids. Unfortunately this information is of limited use when Alan actually ordering an antibody, where I am interested in much more Alan information, such as how specific it is, how it has been validated, Alan and other properties related to how it behaves in certain experimental Alan settings. I *want* to be able to have identifiers (URIs) that are up to Alan the job of ordering reagents. Well, I am not sure that you are going to achieve this with an identifier. You need significant extra amounts of metadata. My point here is simple. Separating out the informatics and biology conforms better to our notion of reality, sure. But you are talking about modelling what makes a protein and, more, a type of protein. Work through your scenarios and see whether you need a protein ID for this. 
If not, you are introducing a layer of abstraction that you don't need. Phil
Re: IDs + 5; everybody - 10
JR == Jonathan Rees [EMAIL PROTECTED] writes: JR Well, to do a fair comparison of LSID URIs and HTTP URIs, you would have JR to take all the features you need, see how to best implement them in JR both contexts, and then make an overall assessment. JR What is your worry, by the way? Would bringing the benefits of LSIDs to JR other parts of the URI space be a bad thing? Well, 3 or 4 years ago I sat in a meeting running over exactly this ground. That time the comparison was to handles. And, now, several years later, we are comparing to pURLs. JR There is the criticism of HTTP URIs that they cannot be used as JR identifiers, and I admit that the pun with URLs can be misleading. It is misleading period. JR To repeat, I'm just trying to be objective. I am not in a position to JR make decisions; I am just trying to elucidate the comparison between the JR two naming schemes so that HCLS can make a rational decision. I was the JR one at the Amsterdam HCLS meeting, at which there were no LSID JR defenders, saying that we ought to listen to what LSID users have to JR say, and in many private conversations I have been coming to the defense JR of benefits that LSIDs have that HTTP URIs so far lack. And I've finally JR gotten around to reading the darned spec. So I hope you LSIDers don't JR think you're being dissed. My suspicion is that you won't find any LSID defenders for the reason that the people who designed the spec do not want to listen to essentially the same arguments again. My problem with this whole process is that you are missing the most important criterion for comparing identifier schemes. They all basically work, as far as I can see, they all basically do the same job. There are technical differences between them but, frankly, they are not that great. So what I want to know is, what is the difference between DOIs and blog permalinks? Both of these have been taken up, in a way that LSIDs or anything in life sciences have not. 
Perhaps it is because the library and publishing community have had something like this (the ISBN and associated identifiers) for a long time already. My question, then, is not what the identifiers do, but what people will use. Phil
IDs + 5; everybody - 10
Mark == Mark Wilkinson [EMAIL PROTECTED] writes: Mark WSDL is a widely accepted W3C spec that is becoming increasingly accepted Mark worldwide (and is, generally, automatically generated based on your Mark interface, so requires little or no manual construction), and which solves a Mark problem that we *know without any doubt* URLs cannot solve. I really don't Mark see an advantage in trying to ignore them, circumvent them, or otherwise Mark relegate them to a secondary lookup, in the base spec for the Semantic Web, Mark when we know that we are going to have to deal with them at some point I think that I agree. The LSID use of web services should not really be seen as a problem. Push comes to shove, even this part could be replaced or made optional if a REST-style solution were desired. From my perspective, the thing that worries me about this whole discussion is that we seem to be retreading old paths. The LSID standard first raised its head 5 or so years ago. And we still appear to be talking about basic technology here; ultimately, there are still some really big biological issues to be dealt with wrt identifiers. The biggest barrier, however, is that of community uptake. Ultimately the differences between LSIDs (a two step resolution to provide persistence, location independence and some other stuff) and PURLs (a two step resolution, etc...) are not that important. Technology churn will prevent community uptake faster than almost anything, however. I have a solution: I am going to use the wonders of the semantic web to describe all of these different identifiers that we have invented so far and all those we will invent in the future. Now, what identifiers should I use in the ontology? Phil
Re: Ambiguous names. was: Re: URL +1, LSID -1
MK == Marijke Keet [EMAIL PROTECTED] writes: MK Lack of sufficient knowledge about a particular (biological) entity is MK a sideshow, not an argument, to the issue of distinguishing real proteins from MK their records. I agree. The argument is that it's very hard to describe what you mean by a protein. We almost certainly don't mean a protein molecule. We might mean a type of protein. But then we don't know whether two protein molecules are actually of a given type. My questions are how often do we want to refer to a protein, rather than a record about a protein? And who is responsible for ascribing an ID to a specific type of protein. In practice, in bioinformatics, the answer to this is a) we don't and b) uniprot. So, while distinguishing between a uniprot record and a protein seems like a good idea, I'm not convinced it brings you anything. What are you going to do with your protein ID? Phil
Re: IDs + 5; everybody - 10
JR == Jonathan Rees [EMAIL PROTECTED] writes: JR It may look like unnecessary replication, but it's not really, since JR we're already committed to the http: space and all the issues that LSID JR addressed are issues there as well. JR The same remarks apply to handles, DOIs in particular. Are you suggesting that DOIs shouldn't be used either? Phil
Re: Ambiguous names. was: Re: URL +1, LSID -1
Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: I agree. The argument is that it's very hard to describe what you mean by a protein. We almost certainly don't mean a protein molecule. We might mean a type of protein. But then we don't know whether two protein molecules are actually of a given type. Alan I'm confused. I think we all would agree that there are instances of Alan proteins and we have a good idea of what they are. We also know that Alan there are groups of proteins that are built off the same template and Alan share certain properties. Take these rhetorical questions: Is Red Opsin in human the same as Red Opsin in Cattle? Is Red Opsin in me, necessarily the same as Red Opsin in you? What if they have a polymorphism? Are two isoforms from an alternate splice the same protein? If a protein has been partly digested, is it still the same? Are haemoglobin alpha and beta the same? My questions are how often do we want to refer to a protein, rather than a record about a protein? Alan Any time we want to make a scientific statement about proteins. In my Alan work, that means virtually all the time. For example, I have a body of Alan work that is the target of text mining at the moment - If the text Alan mining worked well enough to understand the articles, what should it Alan generate for semantic web consumption? The point is that you can't deal with a protein computationally. You can't resolve it, analyze it computationally. It's always second hand information that you want to deal with. And who is responsible for ascribing an ID to a specific type of protein. In practice, in bioinformatics, the answer to this is a) we don't and b) uniprot. Alan I agree with a) - we mostly don't and when we do we do it in an Alan unclear and nonstandard way. I disagree with b) Exactly what the class Alan of proteins described by a uniprot record is, is not clear (though Eric Alan started to make a theory of what it could be). 
I have seen uniprot ids Alan used even to identify antibodies to a protein. Yes, exactly. A uniprot record defines a class of proteins extensionally. This means, antibodies to the proteins described by OPSD_HUMAN (for example). Alan As for who is responsible, I would say that our community is Alan responsible. I expect that there will be efforts along this line in Alan the OBO Foundry and I would hope that there would be broad Alan participation from the people who are interested in following this Alan list. And I would say not. Uniprot are the people who understand proteins, they are the people who already have defined procedures for determining whether one protein is the same as another, who have answered the questions above and who will go back through the resource and update it as biological knowledge changes. And it's a big job. There are 100 annotators working at this. Moreover, Uniprot are the people who are trusted to make the right decisions, not us. It would be more satisfying for us to know intensionally what we mean by protein. It would be good to have a clear set of definitions. But, ultimately, I think it would be mistaken. If we have the ability to express the class of protein molecules defined by the swissprot record OPSD_HUMAN, then I think we have all we need. If we make our own definitions, all that we have done is duplicate what the uniprot team are already doing. And we will, almost inevitably, do it somewhat differently. All we would do is create confusion. The only way that we ensure that we do the same thing as uniprot is to say yeah, what they said. Unsatisfying, maybe. Clear definitions are important. But interoperability, and the lack of duplication, are more so. So, while distinguishing between a uniprot record and a protein seems like a good idea, I'm not convinced it brings you anything. What are you going to do with your protein ID? 
Alan I would like to be able to have Invitrogen be able to say that product Alan xxxyyy is an antibody to some specific class of phosphoproteins in a Alan way that a semantic web agent could do some shopping for me if I Alan needed such a reagent. And, yet, you just told me that you could buy an antibody with just a swissprot ID. So, let me restate the question: what are you going to do with a protein ID that you are not going to do with a swissprot ID, or the protein formerly known as OPSD_HUMAN? Phil
Re: IDs + 5; everybody - 10
My apologies. I wasn't sure, which is why I asked. I just found your idea of reproducing LSIDs' advantages (and implicitly DOIs') in http a little worrying. I may have misread your email. Phil JR == Jonathan Rees [EMAIL PROTECTED] writes: JR I never said LSID or DOIs shouldn't be used, and I don't see how my JR message can be construed as saying this. I'm trying to be fair to all JR solutions by talking about real technical requirements. If the W3C HCLS JR SIG wants to recommend the use - even minting - of LSIDs, that's fine JR with me. But I don't think any decisions have been reached. JR LSID users are committed to using HTTP URIs. For example, anyone who JR uses both LSID and RDF is committed to using the HTTP URI JR http://www.w3.org/1999/02/22-rdf-syntax-ns#type. JR Jonathan JR On 7/16/07, Phillip Lord [EMAIL PROTECTED] wrote: JR == Jonathan Rees [EMAIL PROTECTED] writes: JR It may look like unnecessary replication, but it's not really, since JR we're already committed to the http: space and all the issues that LSID JR addressed are issues there as well. JR The same remarks apply to handles, DOIs in particular. Are you suggesting that DOIs shouldn't be used either? Phil
Re: Evidence
MM == Mark Montgomery [EMAIL PROTECTED] writes: MM Also, for those who use generic email addresses without links to MM web sites, it would be very useful to occasionally inform folks MM on our backgrounds and relationships, like a link to a web page MM and/or bio for example. You might want to try... http://www.google.co.uk/search?q=Alan+Ruttenberg which will reveal much about the mysterious entity known as Alan Ruttenberg. Phil
Re: Advancing translational research with the Semantic Web
PH == Pat Hayes [EMAIL PROTECTED] writes: CM In all the examples given, the lifted[*] n-ary relation was CM never truly a relation in the first place and always better CM modeled as a class. It's kind of cheating. Well, it is kind of cheating, yes, although if it works... PH No, really, it's not cheating. This reduction of n-ary relations PH to binary+unary relations is quite general and quite sound, and PH has been known and thoroughly understood for over a century. It PH can always be done, and it often makes perfectly good intuitive PH sense. No, Chris is right. It's cheating. I have to decide before starting which of my relations are n-ary and which are not. Moving between the two is not necessarily a trivial thing to do. And having some relations being relations and some being classes is less than clear. Well, this I would agree with. Folding design patterns in would be nice. PH Agreed. We made this a central feature of our COE graphic OWL PH editor, in that a user can design a 'template' (a chunk of OWL PH with gaps in it) and give it a name, then just drag-and-drop one PH into a new OWL concept map and fill in the missing PH parameters. It's a simple device and not perfect, but it does PH seem to be useful. Yes. Protege does a similar thing. I'd like to see this at a language level. Phil
Re: Advancing translational research with the Semantic Web
CM == Chris Mungall [EMAIL PROTECTED] writes: Out of curiosity, can you describe how different or similar this is to the result that you can achieve in the N-ary relation design pattern for OWL? Obviously, building things into the DL is nice, but it's not currently representable in OWL, so would require tooling support, while the OWL N-ary relation pattern doesn't. CM I'm afraid I'm unclear how to state the OWL n-ary relation CM pattern (http://www.w3.org/TR/swbp-n-aryRelations) where I CM really need it. In all the examples given, the lifted[*] n-ary CM relation was never truly a relation in the first place and CM always better modeled as a class. It's kind of cheating. Well, it is kind of cheating, yes, although if it works... CM What if my n-ary relation is transitive or if the 3rd argument CM is a temporal interval over which the relation holds? The former is hard because it's not clear what you do with n-ary relationships. I think that this is true for any representation. Fundamentally, if you say a is part of b and I say b is part of c, then is a part of c, and according to whom? It is possible to build on top of the n-ary relationship, for example a symmetric property. Perhaps you could do the same for transitivity if you could work out exactly what the semantics should be. CM I think the former is doable with property role chains. Updating CM the n-ary relations note with this - and all the other omitted CM details, such as how to re-represent domain/range, functional CM properties, n-ary relations in restrictions etc - would take a CM lot of work and would make it utterly terrifying to the naive CM user. Yep, but I think that this reflects the underlying complexities of life. CM Nevertheless the results are clunky and will need special tool CM support [**] to avoid going insane.
In general I am wary of CM design pattern type things - they are usually a sign that the CM language lacks the constructs required to express things CM unambiguously and concisely. It sounds like DLR could provide CM this, which would be great. Well, this I would agree with. Folding design patterns in would be nice. Phil
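The "lifting" being discussed, where an n-ary relation is better modelled as a class, can be sketched in plain Python. This is only an illustration of the W3C pattern's shape; the relation and argument names below are invented for the example, not taken from any ontology:

```python
# A minimal sketch of the W3C n-ary relation pattern: the ternary
# relation located_in(part, whole, interval) is "lifted" to a class
# whose instances carry one property per argument.

class Localisation:
    """Reified relation instance: each argument becomes a property."""
    def __init__(self, part, whole, interval):
        self.part = part          # 1st argument of the original relation
        self.whole = whole        # 2nd argument
        self.interval = interval  # 3rd argument: when the relation holds

# One "tuple" of the relation is now an individual of the class:
loc = Localisation("nucleolus", "nucleus", "interphase")
assert loc.interval == "interphase"
```

The awkwardness Chris describes is visible even here: characteristics like transitivity, which attach naturally to a relation, have no obvious home once the relation has become a class.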
Re: Advancing translational research with the Semantic Web
MK == Marijke Keet [EMAIL PROTECTED] writes: MK Regarding “reification design patterns” and reification in MK OWL (not the thorny logic-based representation of beliefs et MK al), permit me to mention that support for n-ary relations MK ---where n may also be 2--- in description logics is already MK possible with DLR [1] and implemented with reasoner-support in MK the iCOM tool (the tool may not live up to end-user-level MK expectations on user-friendliness, but it works) [2]. Out of curiosity, can you describe how different or similar this is to the result that you can achieve in the N-ary relation design pattern for OWL? Obviously, building things into the DL is nice, but it's not currently representable in OWL, so would require tooling support, while the OWL N-ary relation pattern doesn't. Phil
Re: Advancing translational research with the Semantic Web
BP == Bijan Parsia [EMAIL PROTECTED] writes: EJ Reification? That's who, not why. BP No, you can do both with reification. Well, you can do anything with anything:-) The Gene Ontology's evidence codes and references are much closer. Also, I am not sure of the semantics of reification. BP RDF reification has very little to no built in semantics. What BP it provides is a standardized syntax. Ok. I presume it provides a standardised syntax for something, at least implicitly. Does it mean, then, when a triple is reified, that the triple is in some way associated with this other resource? BP However, all this *supports* your point. There *IS* no BP standardized way to represent this sort of information. There BP is a more or less standard (and widely loathed) hook/technique BP upon which you could build a standard mechanism for representing BP this sort of information. Yeah, that's my feeling. Reification is a start for doing this, and might provide an underpinning. Phil
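The "standardized syntax, little semantics" point about RDF reification can be made concrete: reifying a triple just means minting a statement node and describing it with four triples, to which provenance (the "who" and "why") can then be attached. A sketch in plain Python, with triples as tuples and illustrative example names:

```python
# RDF reification: the triple (s, p, o) is described by a fresh
# statement node carrying rdf:subject / rdf:predicate / rdf:object.
# Nothing further is entailed -- the semantics stop here.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(s, p, o, statement_node):
    """Return the four triples that describe the triple (s, p, o)."""
    return [
        (statement_node, RDF + "type",      RDF + "Statement"),
        (statement_node, RDF + "subject",   s),
        (statement_node, RDF + "predicate", p),
        (statement_node, RDF + "object",    o),
    ]

# Evidence can now hang off the statement node (names illustrative):
triples = reify("ex:gene1", "ex:annotatedWith", "ex:GO_0005634", "ex:stmt1")
triples.append(("ex:stmt1", "ex:evidenceCode", "ex:IDA"))
```

This shows why reification is only a hook: what `ex:evidenceCode` means, and whether the reified triple is asserted at all, is left entirely to whatever mechanism you build on top.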
Re: ISMB Bio-Ontologies Meeting
It's something that we'd like to see:-) Phil Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan I forget, was someone submitting an abstract about our work to Alan this workshop? -Alan Alan On Apr 26, 2007, at 1:18 PM, Susanna wrote: ** Apologies for cross posting ** CALL FOR PAPERS and POSTER ABSTRACTS (Deadline May 1st) Proceedings in BMC Bioinformatics Bio-Ontologies SIG Workshop Vienna, Austria: July 20 2007 “Bio-Ontologies: ten years past and looking to the future”
Re: [biont] Nice wikipedia page on ontology
Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Phil Yeah, Robert has my main beef which is the distinction between the representation language and the representation itself. Alan Yup. Though there is too often a confusion between the Alan ontology and the representation. In some ways I think that it Alan is unfortunate that OWL has Ontology in its name. OWL is just following on from common practice. The use of ontology by computer science has added to its original philosophical meaning, which has also resulted in confusion. I tend to treat ontologies as computational artifacts, which is my bias. I don't think that there is much that can be done about this now. The (many) uses of the term have got a bit fixed. Phil The use of algorithms is clearly wrong and I don't think that an upper ontology provides consistency checks, nor that an ontology needs one to be formal. Alan Algorithm's not a good word. Algorithm has a reasonably tightly defined meaning -- actually wikipedia covers it well. I think that the article just uses it wrongly. Still, it's been replaced with axioms now. The article struggles towards generating a common understanding. Phil
Re: [biont] Nice wikipedia page on ontology
Hmmm. I'm sure I wrote more than that in my original email. Yeah, Robert has my main beef which is the distinction between the representation language and the representation itself. The use of algorithms is clearly wrong and I don't think that an upper ontology provides consistency checks, nor that an ontology needs one to be formal. Still, it's an early wikipedia entry. These things often improve over time. Phil Robert == Robert Stevens [EMAIL PROTECTED] writes: Robert I'd be inclined to agree with Phil. I don't know where the bit Robert about algorithms has come from. The other mistake, I Robert think, is not to make the distinction between formality of Robert language for representation and the formality of the Robert ontology itself. The latter is, I think, a matter of the Robert distinctions made. One can make an ontology in a formal Robert language like OWL, but still be informal in the ontological Robert distinctions made. Robert Formal ontological distinctions can be encapsulated in an Robert upper level, but upper level ontologies are not necessarily Robert formal Robert Anyway, it is bad at almost any level Robert Robert. Robert At 13:55 24/01/2007, Phillip Lord wrote: Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan Start at http://en.wikipedia.org/wiki/Formal_Ontology Alan -Alan Well, it starts off with this A Formal ontology is an ontology modeled by algorithms. Formal ontologies are founded upon a specific Formal Upper Level Ontology, which provides consistency checks for the entire ontology and, if applied properly, allows the modeler to avoid possibly erroneous ontological assumptions encountered in modeling large-scale ontologies. Almost none of which I would agree with.
Re: OWL vs RDF
WB == William Bug [EMAIL PROTECTED] writes: WB This is a very important point. Thanks, Phil. WB As is spelled out in the wonderful ProtegeOWL Tutorial PDF WB (which would be wonderful to have updated a bit), leaning on the WB reasoner during early phases of ontology construction is very WB helpful, but ultimately once you have more hardened WB components, you can save the inferred graph and distribute WB that for the user community. Yes, and this is a good route in some cases. It's worth remembering, however, that if you do this then it limits the uses you can make of the ontology -- you can't, for instance, express queries against the ontology using classes which are not mentioned in the ontology already. The ability to combine the conceptual lego of an ontology at any point is often very useful. But, it can make deployment architectures simpler. You win some, you lose some. Phil
Re: OWL vs RDF
Alan == Alan Ruttenberg [EMAIL PROTECTED] writes: Alan Well, it would be educational to get your view on what you can Alan do with OWL without a reasoner that's not easier to do Alan without OWL? You can do lots of things with OWL without a reasoner. The Gene Ontology is representable in OWL, for example, and uses a simple enough expressivity that you could do without a reasoner easily enough. Of course, you need to use some kind of reasoning engine, but something which understands transitive closure is enough. Whether it's easier to do without OWL depends on what the alternatives are. You could also represent GO-style semantics in RDF (although, I think, the existential nature of part_of would not be explicit), or indeed anything else capable of representing a graph. Alan And how are you to know when you do need the reasoner and when Alan you don't? When you use enough of the expressivity of OWL, where enough is relatively undefined. Phil
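The "something which understands transitive closure" is small enough to sketch directly. A toy traversal over a GO-style part_of graph (the graph and term names below are illustrative, not real GO content):

```python
# Enough "reasoning" for GO-style part_of queries: compute all terms
# reachable from a start term over a transitive relation. No DL
# reasoner required for this level of expressivity.

def transitive_closure(edges, start):
    """All terms reachable from `start` over a transitive relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in edges.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

part_of = {
    "nucleolus": ["nucleus"],
    "nucleus":   ["cell"],
    "cell":      [],
}
# nucleolus is (transitively) part of both nucleus and cell:
assert transitive_closure(part_of, "nucleolus") == {"nucleus", "cell"}
```

This is the sense in which GO "does without" a reasoner: the expressivity in use is low enough that a graph walk answers the queries people actually ask.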
Re: Modeling large scale ontologies in OWL: Unmet needs
DD == David Decraene [EMAIL PROTECTED] writes: DD In large scale ontologies, one link should suffice, DD HasPart, and whether the part is a finger, toe, nail, muscle or DD anything else is not a task for the property to describe, but DD for the target I'm not sure why this should be true for large ontologies. It seems to me that this is just a question of modelling style. Either way should actually work, depending on what you are trying to achieve. Having multiple properties allows you to give the properties different characteristics, which can be useful. If, in your example, you have a super property hasPart, then it seems to me that it would be relatively straightforward to reduce the information content of the ontology so that the subproperties are no longer represented. So 'Hand subClassOf (hasDigit some Finger)' can be represented as 'Hand subClassOf (hasPart some Finger)'. DD In formal ontology you could express this relation DD on a general level of parthood: Hand HasPart 6thfinger, DD cardinality 0. This is not possible in OWL. Many people have used subproperties to do something like this. It's a poor hack for representing qualified cardinality (and doesn't capture exactly the same semantics). But it is only a hack. As others have said, the lack of qualified cardinality in OWL is generally regarded as unfortunate, and it should be coming back in. Phil
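The reduction described above, collapsing restrictions on sub-properties of hasPart back onto hasPart itself, is a purely mechanical rewrite. A sketch in plain Python; the property and class names are illustrative:

```python
# Rewriting restrictions over sub-properties (hasDigit, hasNail, ...)
# to use the super-property hasPart, deliberately discarding the extra
# information the sub-property carried.

SUPER_PROPERTY = {"hasDigit": "hasPart", "hasNail": "hasPart"}

def generalise(restriction):
    """(property, filler) -> same restriction over the super-property."""
    prop, filler = restriction
    return (SUPER_PROPERTY.get(prop, prop), filler)

# "Hand subClassOf (hasDigit some Finger)" becomes
# "Hand subClassOf (hasPart some Finger)":
assert generalise(("hasDigit", "Finger")) == ("hasPart", "Finger")
```

Note the rewrite only goes one way: you can always throw the sub-property information away, but you cannot recover it, which is why the choice between the two styles is a real modelling decision.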
Re: Performance issues with OWL Reasoners
KV == Kashyap, Vipul [EMAIL PROTECTED] writes: There are two different things in the technologies you mentioned; relational to X mapping tools, and metaschema approaches. They are quite different. For the instance store, the relational database is really an implementation detail. It's basically a reasoner with somewhat limited expressivity which is persistent and (hopefully) scalable. KV [VK] Is this true only for TBox reasoning, or ABox and TBox KV reasoning? I had the impression (possibly mistaken) that ABox KV reasoning takes advantage of the relational backend. At least KV this is implied by the following snippet from an earlier KV e-mail: The instancestore (or at least the very early version that I implemented) technically only does T-Box reasoning. The A-Box is stored in a relational database. It works like this -- when an instance is asserted (described) the reasoner is used to localise its description in the ontology. This data is then denormalised and put into the database. So, for example, if you have an entirely asserted hierarchy, you should be able to get all instances of a given class without (very much) reasoning. In other, more complex, cases individuals have to be reasoned over quite a lot at both insert and query phases. The reason that it works is because you can't assert relationships between individuals, so you never need to re-reason things about instances. For example, imagine these assertions phil hasSibling martin This makes me a member of the class of things which have siblings. Next we assert martin hasSex Male Now, my own definition may have changed -- I am a member of the class of things which have brothers. So the new insert potentially requires updating our understanding of all instances. But with the instance store you can't make the first assertion, only things of the form of the second, so you are safe. As I said, this was true of the first instancestore.
The current version is cleverer and can make some assertions of the first form. You'd have to ask others for details of this. KV [VK] I guess I need to get out of my laziness and read the KV paper, but how different is the metaschema approach from the X KV to relational mapping approach? It's a metaschema -- every ontology uses the same schema. For a relational mapping, you'd expect different ontologies to use different ones. KV Even if it is a very different approach, is it able to leverage KV the scalability features of an RDBMS enumerated above? I would KV be interested in your responses. The instancestore uses indexes, yes. This is actually easier -- you have only one schema, so the appropriate indexes are the same for every ontology. Phil
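The classify-at-insert, lookup-at-query idea described above can be sketched in a few lines. This is a toy reconstruction of the design as described in the email, not the actual instance store implementation; the hierarchy and the stand-in "classifier" are invented for the example:

```python
# Instance store sketch: at insert time a (stand-in) classifier
# localises the instance's description in the class hierarchy; the
# result is denormalised into a table, so "all instances of C" is a
# plain lookup rather than a reasoning step.

SUPERCLASSES = {"Kinase": ["Enzyme", "Protein"], "Enzyme": ["Protein"]}

def classify(described_class):
    """Stand-in for a reasoner call: the class plus all superclasses."""
    result, todo = {described_class}, [described_class]
    while todo:
        for sup in SUPERCLASSES.get(todo.pop(), ()):
            if sup not in result:
                result.add(sup)
                todo.append(sup)
    return result

table = {}  # class name -> set of instance ids (the "database")

def insert(instance_id, described_class):
    for cls in classify(described_class):   # reason once, at insert
        table.setdefault(cls, set()).add(instance_id)

insert("p1", "Kinase")
insert("p2", "Enzyme")
assert table["Protein"] == {"p1", "p2"}   # query = lookup, no reasoning
```

The safety property from the email is visible in the shape of `insert`: because an instance only ever carries its own description, nothing already in the table needs to be re-derived when a new instance arrives.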
Re: Performance issues with OWL Reasoners = subclass vs instance-of
CO == Chimezie Ogbuji [EMAIL PROTECTED] writes: ABox is more complex than TBox, although I believe the difference is not that profound (i.e. they are both really complex). For a DL as expressive as that which OWL is based on, the complexities are always really bad. In other words, no reasoner can ever guarantee to scale well in all circumstances. CO Once again: pure production/rule-oriented systems *are* built to CO scale well in *all* circumstances (this is the primary advantage CO they have over DL reasoners - i.e., reasoners tuned specifically CO to DL semantics). This distinction is critical: not every CO reasoner is the same and this is the reason why there is CO interest in considerations of using translations to datalog and CO other logic programming systems (per Ian Horrocks' suggestion CO below): Well, as I am speaking at the limit of my knowledge I cannot be sure about this, but I strongly suspect that what you say is wrong. Any computational system can only be guaranteed to work well in all circumstances if it is of very low expressivity. If a system implements expressivity equivalent to the Turing/lambda calculus, then no such guarantees are ever possible, nor can you determine algorithmically which code will perform well and which will not. Part of the problem with DL reasoners and their scalability is, indeed, their relative immaturity. But part of the problem is because that is just the way the universe is built. Ain't much that can be done about this. Another interesting approach that has only recently been presented by Motik et al is to translate a DL terminology into a set of disjunctive datalog rules, and to use an efficient datalog engine to deal with large numbers of ground facts. This idea has been implemented in the Kaon2 system, early results with which have been quite encouraging (see http://kaon2.semanticweb.org/).
It can deal with expressive languages (such as OWL), but it seems to work best in data-centric applications, i.e., where the terminology is not too large and complex. CO I'd go a step further and suggest that even large terminologies CO aren't a problem for such systems as their primary bottleneck is CO memory (very cheap) and the complexity of the rule set. The set CO of horn-like rules that express DL semantics are *very* small. Memory is not cheap if the requirements scale non-polynomially. Besides, what is the point of suggesting that large terminologies are not a problem? Why not try it, and report the results? Phil
Re: Performance issues with OWL Reasoners
KV == Kashyap, Vipul [EMAIL PROTECTED] writes: I may be wrong here, but as far as I know the expressivity of OWL-DL, for example, is too different from that of RDBMS for this to work completely. KV However, this was done primarily for CLASSIC and other DLs which KV were possibly less expressive than OWL-DL and FACT. Yeah, it's straightforward enough if you just have, for example, subsumption and existentials. KV I was wondering if the current implementations of DL reasoners KV such as Pellet, Racer, etc. adopt this strategy. Not as far as I know. Having said that, there is a similar approach which uses an RDBMS: for example, the instance store (http://instancestore.man.ac.uk) KV [VK] Maybe the increased expressivity of OWL-DL leads to the KV above design choice of SQL + reasoning. Yes. Why try to get an RDBMS to do DL reasoning, when a tableaux reasoner can do it for you? Phil
Re: Performance issues with OWL Reasoners
KV == Kashyap, Vipul [EMAIL PROTECTED] writes: Yes. Why try to get a RDBMS to do DL reasoning, when a tableaux reasoner can do it for you? KV [VK] Scalability and Performance :) Not at doing DL reasoning. Relational databases do relational stuff well. For everything else, they are as likely to be rubbish as fast. Phil
Re: Performance issues with OWL Reasoners = subclass vs instance-of
WB == William Bug [EMAIL PROTECTED] writes: WB CLASSes represent UNIVERSALs or TYPEs. The TBox is the set of WB CLASSes and the ASSERTIONs associated with CLASSes. WB INSTANCEs represent EXISTENTIALs or INDIVIDUALs instantiating a WB CLASS in the real world. The ABox is the set of INSTANCEs and WB the ASSERTIONs associated with those INSTANCEs. I'd take a slight step back from this. You can think of classes and instances in this way. But in the OWL sense, a class is a logical construct with a set of computational properties. Instance is a more difficult term. OWL actually has individuals. The instance store uses instances because they are not really OWL individuals. There is also a philosophical concept of what a class is, what a universal is and so on, which may be somewhat different, and is also open to debate. WB Properly specified CLASSes are defined in the context of the WB INSTANCEs whose PROPERTIES and RELATIONs they formally WB represent. WB Properly specified INSTANCEs are defined via their reference to WB an appropriate set of CLASSes. I think this would be circular. An OWL class is defined by the individuals that it might have in any model which fits the ontology, not just the individuals it has in a specific model. WB Reasoners (RacerPro, Pellet, FACT++) generally have WB optimizations specific to either reasoning on the TBox or WB reasoning on the ABox, but it's difficult (i.e., no existing WB examples experts such as Phil and others can cite) to optimize WB both for reasoning on the TBox, the ABox AND - most importantly WB - TBox + ABox (across these sets). ABox is more complex than TBox, although I believe the difference is not that profound (i.e. they are both really complex). For a DL as expressive as that which OWL is based on, the complexities are always really bad. In other words, no reasoner can ever guarantee to scale well in all circumstances. This does not mean that you cannot build reasoners which will scale well in practice. Make sense? Phil
Re: Performance issues with OWL Reasoners
KV == Kashyap, Vipul [EMAIL PROTECTED] writes: Not at doing DL reasoning. Relational databases do relational stuff well. For everything else, they are as likely to be rubbish as fast. KV [VK] Agreed! But the hypothesis is that mapping into a proven KV scalable technology such as an RDBMS, even if as a component, KV helps build a scalable DL reasoner. The hypothesis is only going to be true IF the mapping is scalable. Otherwise, it doesn't work. KV This is what is exciting about Instance store that it uses RDBMS KV and SQL queries as a subcomponent (generating candidate answers) KV and then does post-processing after that. KV In all the above examples, each of them had to obviously KV implement something extra, but RDBMS+SQL was leveraged as an KV important component. There are two different things in the technologies you mentioned; relational to X mapping tools, and metaschema approaches. They are quite different. For the instance store, the relational database is really an implementation detail. It's basically a reasoner with somewhat limited expressivity which is persistent and (hopefully) scalable. Because it's using a metaschema approach, you can't do things like use RDBMS security, for example (beyond yes/no). KV I think the SW community should seek to leverage known scalable KV technologies to reach industrial strength scalability and KV performance. Sure, would agree. Phil
Re: Performance issues with OWL Reasoners
KV == Kashyap, Vipul [EMAIL PROTECTED] writes: KV OWL reasoners support two types of reasoning: KV 1. ABox reasoning (reasoning about instance data). Scalability KV here is being achieved by leveraging relational database KV technology (which is acknowledged to be scalable) and mapping KV OWL instance reasoning operations to appropriate SQL queries on KV the underlying data store. I may be wrong here, but as far as I know the expressivity of OWL-DL, for example, is too different from that of RDBMS for this to work completely. I am not enough of an expert to know if this sort of mapping is possible at all or whether it just cannot be done efficiently. Having said that, there is a similar approach which uses an RDBMS: the instance store (http://instancestore.man.ac.uk), which I was briefly involved with (before the backend got too hard for my poor brain!), uses a metaschema backend. Queries are not made by mapping to SQL, but by using SQL and reasoner queries together. KV 2. TBox reasoning scalability is a challenge, especially at the KV scale of 100s of thousands of classes found in medical KV ontologies. Would love to KV hear from DL experts on this issue. Again, as far as I understand, the complexity of T-Box and A-Box reasoning for logics such as that underlying OWL-DL are not that different (i.e. they are both terrible!), so the issues are much the same. There is no general answer to the size of a T-Box you can reason over. If the T-Box is a simple asserted hierarchy, you can build a pretty large ontology (certainly tens of thousands of classes) and reason over it -- the reasoning in this case being simple. If you start using lots of more complex expressions then you can limit yourself a lot more. This paper, for example, managed to get the Gene Ontology and, I think, all of GOA into a DL form and reason over it in a, er, reasonable amount of time. So scalability to tens of thousands of T-Box terms and hundreds of thousands of A-Box assertions is possible.
http://www.cs.man.ac.uk/~dturi/papers/instancestore2.pdf The DL reasoners are much better than they used to be -- in the good old days, when the world was young, you could get most DL reasoners to eat your CPU on a 10 term ontology. Nowadays, it's fairly hard to do this. Phil
Re: A precedent suggesting a compromise for the SWHCLS IG Best Practices
HST == Henry S Thompson [EMAIL PROTECTED] writes: HST With respect to the upcoming W3C Semantic Web Health Care and HST Life Sciences Interest Group f2f discussion of LSIDs, I wonder HST if you might think seriously about adopting an approach similar HST to that used by the ARK (Archival Resource Key) naming scheme HST [1]. HST _Very_ roughly, this would involve Semantic Web uses of LSIDs HST to use an http-scheme version of LSIDs, along the following HST lines: HST URN:LSID:rcsb.org:PDB:1D4X:22 -- HST http://lsids.org/lsid:rcsb.org:PDB:1D4X:22 HST or, alternatively, as per my recent suggestion to Sean HST http://rcsb.org.lsids.org/lsid:PDB:1D4X:22 HST I strongly recommend studying the ARK approach in any case, as HST it seems to me that although starting from a different subject HST area, its requirements are very close to your own. I don't want to get domainist about this, but if it is broadly similar, can you give a quick outline as to why ARK is better than LSIDs? I am starting to think that the main difficulty with LSIDs is that it has the phrase Life Sciences in the title, which makes it seem domain dependent. My proposal is that we rename LSID to ARID for Archival Resource ID. Would this solve the difficulties? Phil
Re: [BioRDF] All about the LSID URI/URN
HST == Henry S Thompson [EMAIL PROTECTED] writes: HST Sean Martin writes: HST So, register one of lsids.org, lsids.net, lsids.name or HST lsids.info, and use e.g. http://lsids.org/xxx instead of HST URN:LSID:xxx. Bingo -- no new tools required, works in all HST modern browsers :-). But if the file you are referencing is, say, 5Tb, then it doesn't work in a browser at all. With LSIDs, on the other hand, you may get back a choice of methods to access the data, including one which can cope with 5Tb of data. Incidentally, the approach that you are suggesting demonstrates that LSIDs could be used in concert with URIs. In which case, putting "http://" into the LSID adds nothing. This is, in fact, exactly how DOIs work. But the resolution through the doi.org proxy is a convention which can be changed without changing the DOIs. Sean is entirely correct that encoding the protocol for the transport layer and a DNS-based resolution host into the identifiers is a recipe for instability; maybe not a problem for many things, but a disaster for many parts of bioinformatics. I do not still want to be doing synonym resolution when I am 60. What I have totally failed to understand about this discussion is why it has been couched in terms of whether we should use LSIDs or URIs. Of course, LSIDs' non-standard protocol is a pain, and the two-step resolution adds latency. But this is the cost of circumventing URIs' protocol dependency, which is also difficult. We should just circumvent this entire discussion about which identifier is perfect for bioinformatics, because I can tell you the answer straight out. None of them. LSIDs answer a need. So, people use them. Cheers Phil
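The DOI-style arrangement described above, a protocol-free identifier resolved through a proxy by convention, is a one-line string transformation. A sketch in Python; the proxy host below is invented for illustration, not a real service:

```python
# Resolving an LSID through an HTTP proxy by convention (as DOIs are
# resolved via doi.org): the "http://" lives in the convention, not in
# the identifier, so the proxy can change without the LSIDs changing.

def lsid_via_proxy(lsid, proxy="lsid-proxy.example.org"):
    """Map URN:LSID:authority:ns:id to a proxy URL, by convention."""
    prefix = "URN:LSID:"
    if not lsid.upper().startswith(prefix):
        raise ValueError("not an LSID: " + lsid)
    return "http://%s/%s" % (proxy, lsid[len(prefix):])

url = lsid_via_proxy("URN:LSID:rcsb.org:PDB:1D4X:22")
assert url == "http://lsid-proxy.example.org/rcsb.org:PDB:1D4X:22"
```

The point of the sketch is the direction of dependency: the identifier stays stable while the resolution mechanism (the proxy, the transport) remains a replaceable convention.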
Re: LSIDs and ontology segmentation
Mark I know that others (e.g. Damian Gessler and collaborators at Mark NCGR, but I don't have the reference to his submitted Mark manuscript at hand right now... sorry Damian!) are also Mark working on the problem of segmentation by passing a Mark self-inflating flattened ontology fragment. I believe that the work you are referring to has been submitted to the Bio-Ontologies SIG at this year's ISMB, the programme for which is available online. http://www.jbb06.org/programme.html This year we are holding a joint conference with BioLink, so this should be a great workshop for anyone interested in semantic web, ontologies or text mining and technologies in the life sciences. Tickets still available! Get them while they are hot! Phil
Re: ontology specs for self-publishing experiment
cm == chris mungall [EMAIL PROTECTED] writes: Converting between one syntax and another is fairly simple, and there are some reasonable tools for it. XSLT would work for converting XML into RDF. I wouldn't like to use it for converting the other way (actually, I wouldn't like to use it at all, but this is personal prejudice!). This is assuming, however, that the semantics of the two representations are compatible. To give an example, syntactically it is possible to convert between the GO DAG and an OWL representation of GO. However, the GO part_of relationship doesn't distinguish universal and existential, while OWL forces you to make this distinction; you can't sit on the fence. cm Hi Phillip cm Actually GO uses the definition of part_of from the OBO relation cm ontology cm http://obo.sourceforge.net/relationship/#OBO_REL:part_of cm You can see from the definition that the use of this relation cm suggests an existential relation. The GO OWL transform encodes cm this, so you don't need to sit on the fence, the decision has cm been made for you. Chris My apologies. You are, of course, correct in saying that GO defines an existential relationship, and I'm rather out of date in saying that it doesn't (didn't!). cm There are definitely some issues here in mapping the cm defined OBO relations (which involve time) to OWL (which has no cm explicit account of time) Yeah, this is true. Time is a difficult one to model anyway, and OWL doesn't help here. Phil
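The commitment the GO-to-OWL transform makes can be shown concretely: each part_of edge in the DAG becomes an explicitly existential restriction (subClassOf part_of some Whole). A sketch in plain Python with triples as tuples; the term names are illustrative, and this is the general shape of the encoding rather than the exact output of any particular transform:

```python
# Encoding a GO part_of edge as an OWL existential restriction:
#   Part rdfs:subClassOf [ a owl:Restriction ;
#                          owl:onProperty part_of ;
#                          owl:someValuesFrom Whole ] .
# OWL forces the universal-vs-existential choice; the transform
# commits to existential.

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def part_of_edge_to_owl(part, whole, bnode):
    """Triples for: part rdfs:subClassOf [ part_of some whole ]."""
    return [
        (bnode, RDF + "type", OWL + "Restriction"),
        (bnode, OWL + "onProperty", "obo:part_of"),
        (bnode, OWL + "someValuesFrom", whole),
        (part, RDFS + "subClassOf", bnode),
    ]

triples = part_of_edge_to_owl("go:nucleolus", "go:nucleus", "_:r1")
assert ("_:r1", OWL + "someValuesFrom", "go:nucleus") in triples
```

The syntactic conversion really is this mechanical; the hard part, as the email says, is agreeing what the relation means before you commit it to one of OWL's quantifiers.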
Re: ontology specs for self-publishing experiment
AR == Alan Rector [EMAIL PROTECTED] writes: AR All AR Just catching up. AR Could I strongly support the following. If there is one AR repeatedly confirmed lesson from the medical community's AR experience with large terminologies/ontologies it is to AR separate the terms from the entities. There are always AR linguistic artefacts, and language changes more fluidly in both AR time and space than the underlying entities. (In medical AR informatics this is sometimes quaintly phrased as using AR nonsemantic identifiers). Not that I wish to disagree with Alan, of course, but it is worth mentioning the reason that so many identifiers are semantically meaningful in biology; they look better in papers. Moreover, because they have some meaning associated with them, they are likely to be used correctly in papers, as biologists will notice when they have the wrong one. My own feeling is that the fly people got it right years ago. Their gene identifiers had meaning, but not too much. So, for example, sevenless is a mutant lacking the 7th cell in the eye. Clear, straightforward and memorable. And if the world changes under you, the name can be left the same, because it doesn't really matter that much. Also, some of the names were quite amusing, although the sonic hedgehog gag ran out years ago. Cheers Phil
Re: scientific publishing task force update
SC == Steve Chervitz [EMAIL PROTECTED] writes:

They also wrote an interesting paper on the state of bio-ontologies:

Soldatova LN, King RD. Are the current ontologies in biology good ontologies? Nature Biotechnology 23, 1095-1098 (2005). doi:10.1038/nbt0905-1095

SC> Also worth seeing: The MGED ontologies folks wrote a response to
SC> this article that comments on the bio-ontology development
SC> process, and addresses some statements Soldatova and King make
SC> about MO which the MO folks feel are inaccurate or misleading:
SC> Stoeckert C et al. Wrestling with SUMO and bio-ontologies.
SC> Nature Biotechnology 24, 21-22 (2006). doi:10.1038/nbt0106-21b
SC> http://www.nature.com/nbt/journal/v24/n1/full/nbt0106-21b.html

Their paper did cause, how shall I say, somewhat of a stir.

SC> The reliance on and choice of upper level ontology seems to be a
SC> big bone of contention. Are there any good reviews on these,
SC> discussing things like why there are so many of them and why
SC> they can't be combined? Seems like the current trend is to
SC> accept their existence and work towards making them
SC> interoperable:

If I were being cynical (those of you who know me will know how rare this is), I would suggest that it's a case of "standards are so good that we need one each".

The issue is a slightly deeper one in bio-ontologies. It's not clear that an upper ontology actually brings significant value to the table. The claimed advantage of interoperability between ontologies is, to my mind, somewhat bogus; they only really allow interoperability when you are querying over the concepts in the upper ontology itself. Much more important is that they help to ease the design of an ontology: you have more idea where concepts should go, so you can spend more time worrying about the details of whatever you are modelling and less about the big picture.
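The interoperability caveat can be made concrete with a toy sketch: a shared upper ontology lets two ontologies answer queries phrased in the upper ontology's own terms, but domain-level queries still need pairwise mappings. All the names below are invented, not from any real ontology.

```python
# Two hypothetical ontologies, A and B, each mapped to one invented
# upper-level ontology ("UPPER"). Mapping a term to an upper-level
# parent is the only integration step performed here.
upper_parent = {
    "A:EnzymeAssay": "UPPER:Process",   # from ontology A
    "A:Protein":     "UPPER:Object",
    "B:Kinetics":    "UPPER:Process",   # from ontology B
    "B:Enzyme":      "UPPER:Object",
}

def query_upper(upper_concept):
    """Terms from either ontology filed under one upper-level concept."""
    return sorted(t for t, p in upper_parent.items() if p == upper_concept)

print(query_upper("UPPER:Process"))  # both ontologies can answer this
# But a domain query ("find all enzyme assays") gets no help from
# UPPER: it still needs an A-to-B mapping the upper ontology lacks.
```

This is the sense in which the interoperability claim only holds for queries over the upper ontology's concepts.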
On the flip side, they tend to complicate some stages of ontology development, most notably the first month, when you have lots of biologists tearing their hair out trying to work out what a perdurant, continuant, sortal, or self-standing kind is. The jury's still out, in my opinion.

Phil
CFP: The Joint BioLINK and 9th Bio-Ontologies Meeting
Due to difficulties some people have had with the submission system, we have extended the deadline until the end of the week (5th May). Full details are at http://www.jbb06.org/

--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
University of Newcastle, NE1 7RU
Re: Ontology editor + why RDF?
Anita == deWaard, Anita (ELS) [EMAIL PROTECTED] writes:

Anita> I am reminded of a saying on a Dutch proverb calendar: "If
Anita> love is the answer, could you please repeat the question?" If
Anita> semantics are the answer, what is the problem that is being
Anita> solved, in a way no other technology lets you?

To be honest, I think that this is a recipe for despair; I don't think that there is any one thing that the SW enables you to do that you could not do in another way. It's a question of whether you can do things more conveniently, or with more commonality, than otherwise; after all, XML is just an extensible syntax and, indeed, could do exactly nothing that SGML could not do (when it came out -- XML standards exceed SGML ones now). XML has still been successful.

It's more a question of whether RDF or OWL provides a combination of things that we would not get otherwise. With OWL (DL and Lite), I rather like the ability to check my model with a reasoner, and to be able to apply the ontology automatically in some circumstances. With RDF, you have a convenient technology for building a hyperlinked resource, but with added link types. Of course, you could do the latter with straight XML (well, since RDF is XML, you are doing so). And the former could be done without OWL, just with a raw DL; of course, then you wouldn't get some of the additional features of OWL (such as multi-lingual support, which derives directly from the XML).

Anita> Perhaps if we can find a way to nail this down (I also
Anita> believe the use cases of this working group, and the group as
Anita> a whole, is certainly working towards that aim!) we could try
Anita> to not just preach the semantic gospel, but
Anita> actually sell it (forgive the mixed metaphor)...

Having said all that went before, I agree with this; having a set of RDF/OWL life sciences success stories which explained why the technology was appropriate (if not uniquely appropriate) would be a good thing, if it has not been done before.

Cheers

Phil
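The "hyperlinked resource with added link types" point about RDF can be sketched without any RDF library at all, by treating triples as plain tuples. The URIs below are invented for illustration.

```python
# RDF's basic move: a link is not just "A points at B" (an untyped
# href) but "A points at B *via predicate P*". URIs are invented.
triples = [
    ("http://ex.org/geneA", "http://ex.org/vocab#regulates",
     "http://ex.org/geneB"),
    ("http://ex.org/geneA", "http://ex.org/vocab#partOf",
     "http://ex.org/pathway1"),
]

# Because each link carries a type, we can filter on *how* things
# are connected, not merely *whether* they are:
regulated = [o for (s, p, o) in triples if p.endswith("#regulates")]
print(regulated)
```

A plain hyperlink graph would collapse both links into the same edge; the predicate is what makes the second query expressible.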