UK Ontology Network (UKON) 2016 - Last Call for Participation [Deadline: 31st March, 2016]

2016-03-29 Thread Phillip Lord



The Fifth UK Ontology Network meeting (#ukon2016) will take place on Thursday
April 14th, 2016 at Newcastle University, Newcastle upon Tyne. The aims of
this meeting are as follows:

  To enable dissemination of ontology relevant work from across multiple
  disciplines

  To encourage collaboration and cooperation between different members of UK
  organisations working in this area

  To help establish a research agenda in ontology, and to improve
  communication with funding councils and industry about their needs


The full programme is now available, and we have a fascinating series of
talks, with demo and poster sessions. The meeting will also offer plenty of
opportunities for networking.

http://www.ukontology.org/programme


Registration for UKON 2016 is now available. Please register before 31st of
March, using the link below:

http://www.ukontology.org/registration

Some hotels close to the venue offer reduced rates for UKON delegates. You can
use the link below to take advantage of special rates:

http://www.newcastlegateshead.com/UKON2016

Best wishes, and we look forward to seeing you in Newcastle.

Phillip Lord
James Malone
Goksel Misirli
Jennifer Warrender
Claire Smith


UKON 2016 Organisers



Re: Good, up-to-date tutorial on OWL 2 and Protege for Biomedical domain?

2015-05-06 Thread Phillip Lord

Oliver Ruebenacker cur...@gmail.com writes:
 The challenge of building ontologies is not technical, but
 socio-political.

I think this very much depends on the ontology that you are creating and
what its purpose is.

When we created the karyotype ontology, there was no socio-political
challenge at all; the knowledge was all there in the first place. The
problem was to build a complex and repetitive ontology consistently, so
that it behaved correctly. This is a technical challenge, which I think
we solved.

Another problem we are trying to address is linking the axiomatisation
through to the documentation and provenance for that axiomatisation;
again, a largely technical challenge.

There are challenges associated with getting agreement and coming to a
consensus, of course, but these are hardly unique to ontology building.

Phil







Re: Good, up-to-date tutorial on OWL 2 and Protege for Biomedical domain?

2015-05-05 Thread Phillip Lord


Entirely as an excuse to plug my own work, there is my own tutorial.

http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html

It doesn't use Protege, but my own tool, and it's not finished, so it's
not an exact replacement.

Also worth looking at is

http://ontogenesis.knowledgeblog.org

It is not a tutorial, but it has lots of tutorial information in it.

Phil

Matthias Samwald matthias.samw...@meduniwien.ac.at writes:

 Dear all,

 I'm about to teach a course to medical informatics students that have never
 used OWL before. Are there any good, up-to-date tutorials or even course
 materials on OWL 2, biomedical ontology building and Protege that you could
 recommend? I was surprised to find that most publicly available resources have
 gathered quite a bit of dust (focused on OWL 1, old versions of Protege), or
 are not very accessible. I'd be especially interested in materials that avoid
 using the Pizza ontology ;)

 Thanks,
 Matthias

-- 
Phillip Lord,   Phone: +44 (0) 191 208 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



[ANN] Tawny-OWL 1.3.0

2014-11-14 Thread Phillip Lord
I am pleased to announce the 1.3.0 release of Tawny-OWL, now available on
clojars and github (http://github.com/phillord/tawny-owl).

What is Tawny-OWL
=

Tawny-OWL allows the construction of OWL ontologies in an evaluative, functional
and fully programmatic environment. Think of it as the ontology engineering
equivalent of [R](http://www.r-project.org/). It has many advantages over
traditional ontology engineering tools, also described in a
[video introduction](https://vimeo.com/89782389).

- An interactive shell or REPL to explore and create ontologies.
- Source code, with comments, editable using any of a range of IDEs.
- Fully extensible -- new syntaxes, new data sources can be added by users
- Patterns can be created for individual ontologies; related classes can be
  built easily, accurately and maintainably.
- A unit test framework with full reasoning support.
- A clean syntax for versioning with any VCS, integrated with the IDE
- Support for packaging, dependency resolution and publication
- Enables continuous integration with both ontology and software dependencies

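To give a flavour of what this looks like in practice, here is a minimal,
hedged sketch of ontology source in Tawny-OWL; the entity names are invented
for illustration, and the frame keywords are those I believe the 1.x series
uses, so check the manual for the precise forms.

    ;; A minimal sketch; entity names are illustrative only.
    (ns example.pizza
      (:use [tawny.owl]))

    (defontology example
      :iri "http://www.example.com/example")

    (defclass Topping)
    (defclass Pizza)

    (defoproperty hasTopping
      :domain Pizza
      :range Topping)

    (defclass CheeseTopping
      :super Topping)

    ;; A pizza with at least one cheese topping.
    (defclass CheesePizza
      :super Pizza
      (owl-some hasTopping CheeseTopping))

Because this is ordinary Clojure source, it can be evaluated form by form at
the REPL, kept under version control, and tested like any other code.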

For the Clojure developer
=

Tawny-OWL is predominantly designed as an application for programmatic
ontology development, but it can also be used as an API. OWL ontologies are a
set of statements about things and their relationships; underneath, these
statements map to a subset of first-order logic, which makes it possible to
answer questions about them using highly optimised reasoners.
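
As a hedged sketch (not taken from this announcement), reasoner queries over
an ontology built this way might look roughly as follows; it assumes the
tawny.reasoner namespace with a HermiT reasoner on the classpath, and the
exact function names and argument orders should be checked against the
documentation.

    ;; Sketch only: function names and arities are assumptions to be
    ;; checked against the tawny.reasoner documentation.
    (require '[tawny.reasoner :as r])

    ;; Select a reasoner implementation.
    (r/reasoner-factory :hermit)

    ;; Ask whether the current ontology is consistent.
    (r/consistent?)

    ;; Ask an inferred subsumption question about two classes.
    (r/isuperclass? CheesePizza Pizza)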


Take Wing
=

Although still in its early stages, a rich manual is now being written for Tawny-OWL:
https://github.com/phillord/take-wing
http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html

Changes
===

This release adds two new features. First, it is now possible to annotate
axioms as well as entities. Second, new functions have been added, in both
function and macro form, to make the development of patterns easier; a rough
illustration of the pattern idea follows below.
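
As a hand-rolled illustration of the general idea (using only plain Clojure
and the long-standing owl-class function, not the new pattern helpers
themselves, and assuming a Topping class like the one in the sketch above), a
pattern is just code that generates a family of related classes:

    ;; Illustrative only: a hand-rolled pattern in plain Clojure,
    ;; generating one subclass of Topping per ingredient name.
    (doseq [ingredient ["Cheese" "Tomato" "Olive"]]
      (owl-class (str ingredient "Topping")
                 :super Topping))

The new functions aim to make this kind of repetitive construction easier to
write and to read back.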

The full change log is available:

https://github.com/phillord/tawny-owl/blob/master/docs/releases.md




[ANN] Tawny-OWL 1.2

2014-10-10 Thread Phillip Lord


I am pleased to announce the 1.2.0 release of Tawny-OWL, now available on
clojars and github (http://github.com/phillord/tawny-owl).

What is Tawny-OWL
=

Tawny-OWL allows the construction of OWL ontologies in an evaluative, functional
and fully programmatic environment. Think of it as the ontology engineering
equivalent of [R](http://www.r-project.org/). It has many advantages over
traditional ontology engineering tools, also described in a
[video introduction](https://vimeo.com/89782389).

- An interactive shell or REPL to explore and create ontologies.
- Source code, with comments, editable using any of a range of IDEs.
- Fully extensible -- new syntaxes, new data sources can be added by users
- Patterns can be created for individual ontologies; related classes can be
  built easily, accurately and maintainably.
- A unit test framework with full reasoning support.
- A clean syntax for versioning with any VCS, integrated with the IDE
- Support for packaging, dependency resolution and publication
- Enables continuous integration with both ontology and software dependencies


For the Clojure developer
=

Tawny-OWL is predominantly designed as an application for programmatic
ontology development, but it can also be used as an API. OWL ontologies are a
set of statements about things and their relationships; underneath, these
statements map to a subset of first-order logic, which makes it possible to
answer questions about them using highly optimised reasoners.


Take Wing
=

Although still in its early stages, a rich manual is now being written for Tawny-OWL:
https://github.com/phillord/take-wing
http://homepages.cs.ncl.ac.uk/phillip.lord/take-wing/take_wing.html

Changes
===
The main feature of the 1.2 release has been the incorporation of core.logic,
through (ab)use of Tawny's querying facilities. A tighter integration should
be possible, with core.logic working directly over the OWL API, but this
approach was relatively simple to implement. It is performant enough for most
uses (the Gene Ontology renders to Clojure data structures in 1-2 seconds on
my machine).

One other substantial change is an aggressive micro-optimisation of
default-ontology and broadcast-ontology functionality. This functionality is
used in many parts of Tawny-OWL, so this results in a significant performance
enhancement.

The full change log is available:

https://github.com/phillord/tawny-owl/blob/master/docs/releases.md



-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



Re: Ontology for Somatic Mutations?

2013-07-26 Thread Phillip Lord

In what sense? See if we can generate the description of the karyotype
from a genome sequence? Or at least compare the two?

I agree that this would be interesting. At the moment, the problem that
we have is that the ISCN string is computationally relatively intractable. In
most cases, though, the ISCN is all we have: there is no sequence, and
no biological material.

Phil


Karen Eilbeck keilb...@genetics.utah.edu writes:

 Hi Phil
 Nice model of ISCN. We currently allow ISCN strings to be annotated in GVF to
 describe a genome structure. It may be interesting to validate against whole
 genome sequence.
 --K

 On Jul 23, 2013, at 5:24 AM, Phillip Lord wrote:


 To the extent that it helps to answer our use case, co-ordination might
 be useful; our work on the karyotype is reasonably tightly scoped, and I
 wish to maintain this.

 Phil


 Melissa Haendel haen...@ohsu.edu writes:
 Hi all, It would be great if we could coordinate these efforts - The
 genotype work we are doing that Chris Baker mentioned earlier on this
 thread (see
 http://www.unbsj.ca/sase/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf
 )
 is already being integrated into the sequence ontology.

 Cheers,
 Melissa

 On Jul 22, 2013, at 10:22 AM, Suzanna Lewis s...@berkeleybop.org wrote:

 Check out the  Sequence Ontology. It is well-established in the genomics 
 community.
 http://sequenceontology.org/

 On Jul 22, 2013, at 4:53 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote:


 We are working on a karyotype ontology which describes chromosome 
 abnormalities.

 The first paper is available here which also includes links to the ontology.

 http://arxiv.org/abs/1305.3758




 Oliver Ruebenacker cur...@gmail.com writes:

Hello,

 Does any one know of an ontology for somatic mutations (including SNPs,
 chromosomal abnormalities, etc.)?

Take care
Oliver

 --
 Phillip Lord,   Phone: +44 (0) 191 222 7827
 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
 School of Computing Science,
 http://homepages.cs.ncl.ac.uk/phillip.lord
 Room 914 Claremont Tower,   skype: russet_apples
 Newcastle University,   twitter: phillord
 NE1 7RU



 Dr. Melissa Haendel

 Assistant Professor
 Ontology Development Group, OHSU Library
 http://www.ohsu.edu/library/
 Department of Medical Informatics and Epidemiology
 Oregon Health & Science University
 haen...@ohsu.edu
 skype: melissa.haendel
 503-407-5970




 --
 Phillip Lord,   Phone: +44 (0) 191 222 7827
 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
 School of Computing Science,
 http://homepages.cs.ncl.ac.uk/phillip.lord
 Room 914 Claremont Tower,   skype: russet_apples
 Newcastle University,   twitter: phillord
 NE1 7RU

 Karen Eilbeck
 Associate Professor
 Department of Biomedical Informatics, University of Utah


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



Re: Ontology for Somatic Mutations?

2013-07-23 Thread Phillip Lord

To the extent that it helps to answer our use case, co-ordination might
be useful; our work on the karyotype is reasonably tightly scoped, and I
wish to maintain this.

Phil


Melissa Haendel haen...@ohsu.edu writes:
 Hi all, It would be great if we could coordinate these efforts - The
 genotype work we are doing that Chris Baker mentioned earlier on this
 thread (see
 http://www.unbsj.ca/sase/csas/data/ws/icbo2013/papers/ec/icbo2013_submission_60.pdf
 )
 is already being integrated into the sequence ontology.

 Cheers,
 Melissa

 On Jul 22, 2013, at 10:22 AM, Suzanna Lewis s...@berkeleybop.org wrote:

 Check out the  Sequence Ontology. It is well-established in the genomics 
 community.
 http://sequenceontology.org/

 On Jul 22, 2013, at 4:53 PM, Phillip Lord phillip.l...@newcastle.ac.uk wrote:


 We are working on a karyotype ontology which describes chromosome 
 abnormalities.

 The first paper is available here which also includes links to the ontology.

 http://arxiv.org/abs/1305.3758




 Oliver Ruebenacker cur...@gmail.com writes:

 Hello,

  Does any one know of an ontology for somatic mutations (including SNPs,
 chromosomal abnormalities, etc.)?

 Take care
 Oliver

 --
 Phillip Lord,   Phone: +44 (0) 191 222 7827
 Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
 School of Computing Science,
 http://homepages.cs.ncl.ac.uk/phillip.lord
 Room 914 Claremont Tower,   skype: russet_apples
 Newcastle University,   twitter: phillord
 NE1 7RU



 Dr. Melissa Haendel

 Assistant Professor
 Ontology Development Group, OHSU Library
 http://www.ohsu.edu/library/
 Department of Medical Informatics and Epidemiology
 Oregon Health & Science University
 haen...@ohsu.edu
 skype: melissa.haendel
 503-407-5970




-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



Re: Ontology for Somatic Mutations?

2013-07-22 Thread Phillip Lord

We are working on a karyotype ontology which describes chromosome abnormalities.

The first paper is available here which also includes links to the ontology.

http://arxiv.org/abs/1305.3758




Oliver Ruebenacker cur...@gmail.com writes:

  Hello,

   Does any one know of an ontology for somatic mutations (including SNPs,
 chromosomal abnormalities, etc.)?

  Take care
  Oliver

-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



[ANN] tawny-owl 0.11

2013-05-23 Thread Phillip Lord

I'm pleased to announce the release of tawny-owl 0.11. 

What is it?
==

This package allows users to construct OWL ontologies in a fully programmatic
environment, namely Clojure. This means the user can take advantage of a
programming language to automate and abstract ontology development; also,
rather than requiring the creation of ontology-specific development
environments, a normal programming IDE can be used; finally, a human-readable
text format means that we can integrate with the standard tooling for
versioning and distributed development.

Changes
===

# 0.11

## New features

- facts on individuals are now supported (a hedged sketch follows below)
- documentation has been greatly extended
- OWL API 3.4.4

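As a hedged sketch of the new individual support, an object property
assertion (a "fact") on an individual might be written roughly as follows;
the :fact frame and the fact helper are my reading of the release notes
rather than verbatim from them, and all entity names are invented:

    ;; Sketch only: the :fact frame and fact helper are assumptions
    ;; based on the release notes; entity names are illustrative.
    (defoproperty hasTopping)
    (defindividual CheeseSlice)
    (defindividual MyPizza
      :fact (fact hasTopping CheeseSlice))
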

A new paper on the motivation and use cases for tawny-owl is also
available at http://www.russet.org.uk/blog/2366

https://github.com/phillord/tawny-owl

Feedback welcome!


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   twitter: phillord
NE1 7RU 



Re: owl:sameAs - Harmful to provenance?

2013-04-09 Thread Phillip Lord
Oliver Ruebenacker cur...@gmail.com writes:

 Hello Philip,


Phillip :-)


 Apparently, you are confusing two different cases. I talked about the
 same reference meaning two different things. You are talking about
 different references talking about the same thing.

No. dc:creator means many different things. It's the same reference.


 Confusion is the enemy of understanding.


Confusion is the start point for all knowledge. 



Re: owl:sameAs - Harmful to provenance?

2013-04-09 Thread Phillip Lord


Compare all you like. RDF is just another technology; it's not going to
let me do anything that I cannot do in another way. I'm interested in
using it because it is there, not for any other reason. 

The surface syntax problem; yeah, it is and remains a pain, more so in
some areas than others.

Phil


Alan Ruttenberg alanruttenb...@gmail.com writes:
 Thinking about metadata as some other category of data is usually a bad
 sign. I've often found it to mean, in practice, data I care less about.

 Phil, to make the case that RDF helps here, we would want to compare how
 easy it is to do significant work using the ill-represented examples you
 find versus raw text, versus xml, versus tab-delimited files. While there
 is some limited benefit to getting rid of the surface syntax problem, it's
 not clear how much of a problem that ever was.

 -Alan


 On Mon, Apr 8, 2013 at 1:16 PM, Bhat, Talapady N. 
 talapady.b...@nist.govwrote:

 Hi,
 -

 Introduction -Dublin Core:
 The Dublin Core Metadata Element Set is a vocabulary of fifteen properties
 for use in resource description. The name Dublin is due to its origin at
 a 1995 invitational workshop in Dublin, Ohio; core because its elements
 are broad and generic, usable for describing a wide range of resources.

 The fifteen element Dublin Core described in this standard is part of a
 larger set of metadata vocabularies
 --
 As per the introduction (given above) section of Dublin Core (
 http://dublincore.org/documents/dces/) its focus is primarily metadata,
 whereas the actual author names mentioned below probably need to be
 considered as 'data'. I do not think Dublin Core has really focused on
 building a standard re-usable vocabulary for 'data'. That is the real
 problem. That is why we have been focusing on re-usable terms for 'data'

 http://www.biomedcentral.com/1471-2105/12/487  and
 http://xpdb.nist.gov/chemblast/pdb.pl  and
 http://www.nature.com/nmeth/journal/v9/n7/abs/nmeth.2084.html

 T N Bhat

 -Original Message-
 From: Phillip Lord [mailto:phillip.l...@newcastle.ac.uk]
 Sent: Monday, April 08, 2013 12:53 PM
 To: Oliver Ruebenacker
 Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg;
 public-semweb-lifesci
 Subject: Re: owl:sameAs - Harmful to provenance?


 And it is this bit -- "before we can do anything useful" -- that is
 utterly wrong.

 Recently I have spent a lot of time looking at Dublin Core creator fields.
 You could not believe how many different ways they are used. String
 literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs
 ("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with titles
 ("Dr Phillip Lord") and so on.

 So, is everyone using Dublin Core wrong? Is it useless until everyone uses
 it the same way? Emphatically no, it is not useless.

 Would it be better if everybody did use it the same way? The answer is
 probably not. Names are incredibly complex, and representing them is, in
 turn, difficult and hard. Any specification which did full justice to all
 the different name forms in existence would be incredibly long-winded. Many
 people using the specification would get it wrong; or you could have a
 mechanism for ensuring people always used it correctly.
 Then I am sure that both people who ended up using this form of spec would
 have great fun integrating their tiny datasets.

 In the example, we have a number of sets of assertions which individually
 fulfil their creators' use cases. Then, when they are brought together, the
 assertions become inconsistent, telling you up front that there is work to
 be done. And you ask "in what way is this useful?"

 Perfection is the enemy of Good.



 Oliver Ruebenacker cur...@gmail.com writes:
So what most people here are saying is that before we can do
  anything useful, we need to make sure that if two assertions use the
  same reference, they mean the same thing.
 
To which you respond that you will accept assertions without
  assuming that same references mean same things. You will just keep them
 separate.
  There is no rule against that.
 
But in what way is this useful?
 
   Take care
   Oliver
 
  On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote:
 
  Hi Pat,
 
 
  On 04/04/2013 02:03 AM, Pat Hayes wrote:
 
 
  On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
 
   On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On
  04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2,
  2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote:
  On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
 
  If only owl:sameAs were used correctly...
 
  Well, I agree that is a problem, but don't draw the conclusion that
  there is something wrong with sameAs, just because people keep
  using it wrong.
 
  Agreed.  And furthermore, don't draw the conclusion that someone
  has used owl:sameAs wrong just because you get garbage when you
  merge two graphs that individually worked just

Re: owl:sameAs - Harmful to provenance?

2013-04-09 Thread Phillip Lord

I think that your name is unique. There is no other string of letters
which is the same as yours which is not identical to your name. Agreed,
there is not a one-to-one mapping between your name and people in
existence. But this is not necessarily a problem; it depends on your use
case.

Phil


Michael Miller michael.mil...@systemsbiology.org writes:
 phillip,  not to mention a name (like mine!) is not particularly unique.

 cheers,
 michael

 Michael Miller
 Software Engineer
 Institute for Systems Biology

 -Original Message-
 From: Phillip Lord [mailto:phillip.l...@newcastle.ac.uk]
 Sent: Monday, April 08, 2013 9:53 AM
 To: Oliver Ruebenacker
 Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg;
 public-semweb-
 lifesci
 Subject: Re: owl:sameAs - Harmful to provenance?


 And it is this bit -- "before we can do anything useful" -- that is
 utterly wrong.

 Recently I have spent a lot of time looking at Dublin Core creator fields.
 You could not believe how many different ways they are used. String
 literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs
 ("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with
 titles ("Dr Phillip Lord") and so on.

 So, is everyone using Dublin Core wrong? Is it useless until everyone
 uses it the same way? Emphatically no, it is not useless.

 Would it be better if everybody did use it the same way? The answer is
 probably not. Names are incredibly complex, and representing them is, in
 turn, difficult and hard. Any specification which did full justice to
 all the different name forms in existence would be incredibly
 long-winded. Many people using the specification would get it wrong; or
 you could have a mechanism for ensuring people always used it correctly.
 Then I am sure that both people who ended up using this form of spec
 would have great fun integrating their tiny datasets.

 In the example, we have a number of sets of assertions which
 individually fulfil their creators' use cases. Then, when they are brought
 together, the assertions become inconsistent, telling you up front that
 there is work to be done. And you ask "in what way is this useful?"

 Perfection is the enemy of Good.



 Oliver Ruebenacker cur...@gmail.com writes:
So what most people here are saying is that before we can do
 anything
  useful, we need to make sure that if two assertions use the same
 reference,
  they mean the same thing.
 
To which you respond that you will accept assertions without
 assuming
  that same references mean same things. You will just keep them
 separate.
  There is no rule against that.
 
But in what way is this useful?
 
   Take care
   Oliver
 
  On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote:
 
  Hi Pat,
 
 
  On 04/04/2013 02:03 AM, Pat Hayes wrote:
 
 
  On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
 
   On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On
  04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2,
  2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote:
  On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
 
  If only owl:sameAs were used correctly...
 
  Well, I agree that is a problem, but don't draw the conclusion
  that there is something wrong with sameAs, just because people keep
  using it wrong.
 
  Agreed.  And furthermore, don't draw the conclusion that someone
  has used owl:sameAs wrong just because you get garbage when you
  merge two graphs that individually worked just fine.  Those two
  graphs may have been written assuming different sets of
  interpretations.
 
  In that case I would certainly conclude that they have used it
  wrong. Have you not been reading what Pat and I have been writing?
 
  I've read lots of what you and Pat have written.  And I've learned
  a lot from it -- particularly in learning about ambiguity from Pat.
  And I'm in full agreement that owl:sameAs is *often* misused.
 
  But I don't believe that getting garbage when merging two graphs
  that individually worked fine *necessarily* indicates that
  owl:sameAs was misused -- even when it appears on the surface to be
  causing the problem.
 
 
  I agree, but not with your example and your analysis of it.
 
   Here's a simple example to illustrate.
 
  Using the following prefixes throughout, for brevity:
 
  @prefix : <http://example/owen/> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .
 
  Suppose that Owen is the URI owner of :x, :y and :z, and Owen
  defines them as follows:
 
  # Owen's URI definition for :x, :y and :z
  :x a :Something .  :y a :Something .  :z a :Something .
 
  That's all.  That's Owen's entire definition of those URIs.
  Obviously this definition is ambiguous in some sense.  But as we
  know, ambiguity is ultimately inescapable anyway, so I have merely
  chosen an example that makes the ambiguity obvious. As the RDF
  Semantics spec puts it: It is usually impossible to assert enough
  in any language

Re: owl:sameAs - Harmful to provenance?

2013-04-09 Thread Phillip Lord
Kingsley Idehen kide...@openlinksw.com writes:

 On 4/9/13 11:31 AM, Phillip Lord wrote:
 Compare all you like. RDF is just another technology; it's not going to
 let me do anything that I cannot do in another way.
 So you are questioning its unique selling points, I assume? 

No. I don't care. I just care whether it's useful. Who cares whether
it's uniquely useful?

 If so, can you point us to a technology that addresses the issue of
 grounding logic in data
 -- in a manner that's totally platform independent?

It's a data representation technology. Lots of things do this. Totally
platform independent. I don't know what platform means these days. 

 We want to be able to leverage logic in the process of actual data
 representation, access, integration, and management. I know of no technology
 that addresses the problem like RDF i.e., in a platform agnostic manner that
 echoes the essence of the Web itself.

RDF is nice. It's useful. It will remain useful, at least if people are
allowed to use it without being told that they are doing it all wrong.

I am not attacking RDF; I am attacking the notion that everything has to
be perfect, to work in every circumstance, for it to be useful at all. 

Phil



Re: owl:sameAs - Harmful to provenance?

2013-04-08 Thread Phillip Lord

And it is this bit -- "before we can do anything useful" -- that is
utterly wrong.

Recently I have spent a lot of time looking at Dublin Core creator fields.
You could not believe how many different ways they are used. String
literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs
("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with
titles ("Dr Phillip Lord") and so on.

So, is everyone using Dublin Core wrong? Is it useless until everyone
uses it the same way? Emphatically no, it is not useless.

Would it be better if everybody did use it the same way? The answer is
probably not. Names are incredibly complex, and representing them is, in
turn, difficult and hard. Any specification which did full justice to
all the different name forms in existence would be incredibly
long-winded. Many people using the specification would get it wrong; or
you could have a mechanism for ensuring people always used it correctly.
Then I am sure that both people who ended up using this form of spec
would have great fun integrating their tiny datasets.

In the example, we have a number of sets of assertions which
individually fulfil their creators' use cases. Then, when they are
brought together, the assertions become inconsistent, telling you up
front that there is work to be done. And you ask "in what way is this
useful?"

Perfection is the enemy of Good.



Oliver Ruebenacker cur...@gmail.com writes:
   So what most people here are saying is that before we can do anything
 useful, we need to make sure that if two assertions use the same reference,
 they mean the same thing.

   To which you respond that you will accept assertions without assuming
 that same references mean same things. You will just keep them separate.
 There is no rule against that.

   But in what way is this useful?

  Take care
  Oliver

 On Mon, Apr 8, 2013 at 10:07 AM, David Booth da...@dbooth.org wrote:

 Hi Pat,


 On 04/04/2013 02:03 AM, Pat Hayes wrote:


 On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:

  On 4 April 2013 11:58, David Booth da...@dbooth.org wrote: On
 04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2,
 2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote:
 On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:

 If only owl:sameAs were used correctly...

 Well, I agree that is a problem, but don't draw the conclusion
 that there is something wrong with sameAs, just because people keep
 using it wrong.

 Agreed.  And furthermore, don't draw the conclusion that someone
 has used owl:sameAs wrong just because you get garbage when you
 merge two graphs that individually worked just fine.  Those two
 graphs may have been written assuming different sets of
 interpretations.

 In that case I would certainly conclude that they have used it
 wrong. Have you not been reading what Pat and I have been writing?

 I've read lots of what you and Pat have written.  And I've learned
 a lot from it -- particularly in learning about ambiguity from Pat.
 And I'm in full agreement that owl:sameAs is *often* misused.

 But I don't believe that getting garbage when merging two graphs
 that individually worked fine *necessarily* indicates that
 owl:sameAs was misused -- even when it appears on the surface to be
 causing the problem.


 I agree, but not with your example and your analysis of it.

  Here's a simple example to illustrate.

 Using the following prefixes throughout, for brevity:

 @prefix : <http://example/owen/> .
 @prefix owl: <http://www.w3.org/2002/07/owl#> .

 Suppose that Owen is the URI owner of :x, :y and :z, and Owen
 defines them as follows:

 # Owen's URI definition for :x, :y and :z
 :x a :Something .  :y a :Something .  :z a :Something .

 That's all.  That's Owen's entire definition of those URIs.
 Obviously this definition is ambiguous in some sense.  But as we
 know, ambiguity is ultimately inescapable anyway, so I have merely
 chosen an example that makes the ambiguity obvious. As the RDF
 Semantics spec puts it: It is usually impossible to assert enough
 in any language to completely constrain the interpretations to a
 single possible world.


 Yes, but by making the ambiguity this obvious, you have rendered
 the example pointless. There is *no* content here *at all*, so Owen
 has not really published anything. This is not typical of published
 content, even in RDF. Typically, in fact, there is, as well as some
 nontrivial actual RDF content, some kind of explanation, perhaps in
 natural language, of what the *intended* content of the formal RDF is
 supposed to be. While an RDF engine cannot of course make use of such
 intuitive explanations, other authors of RDF can, and should, make
 use of it to try to ensure that they do not make assertions which
 would be counter to the referential intentions of the original
 authors. For example, the Dublin Core URIs were published with almost
 no formal RDF axioms, but quite elaborate natural language glosses
 which

Re: owl:sameAs - Harmful to provenance?

2013-04-04 Thread Phillip Lord
David Booth da...@dbooth.org writes:
 Maybe someone can see a way to avoid this dilemma.  Maybe
 someone can figure out a way to distinguish between the
 essential properties that serve to identify a resource, and
 other inessential properties that the resource might have.
 If so, and the number of essential properties is finite,
 then indeed this problem could be avoided by requiring every
 URI owner to define all of the essential properties of the
 URI's denoted resource, or by prohibiting anyone but the URI
 owner from asserting any new essential properties of the
 resource (beyond those the URI owner had defined).  Or maybe
 there is another way around this dilemma.

 Unless some way around this dilemma is found, it seems
 unreasonably judgemental to accuse Arthur of misusing
 owl:sameAs in this case, since he didn't assert anything
 that was inconsistent with Owen's URI definition.


I think your analysis is good. My solution to avoiding the horns of the
dilemma is to take a different tack entirely, and to think about the
social aspects of how these graphs came to be produced. 

Owen has produced some data. Then Arthur, Aster and Alfred have all
extended it in ways which turn out to be incompatible, and yet they all
seem to be doing things that fulfil their respective use cases. So, in
one sense, there is no problem here at all. In each case, Arthur, Aster
and Alfred get everything to work, and everybody is happy.

The problem comes when you try to integrate their work; now it breaks.
So, how to avoid this? There are two key ways: the first is to say, well,
okay, so now the graphs break, so let's get together and sort the problem
out. There are lots of ways you could change the graphs here so that the
problem goes away.

The other solution is to argue that if everybody follows a standard
rigidly, then this problem won't happen in the first place. The
difficulty here is not, for example, understanding the set theoretic
interpretation, but how to apply this to whatever it is that you are
trying to model. 

My experience has been that the former has significant costs, that
integrating post-hoc is expensive and time-consuming. However, my
experience of the latter approach is that it is highly unscalable, and
results in very long and obscure philosophical debates. Essentially,
with the former you pay the cost of integration as you need it; in the
latter you pay the cost of integration all the time, whether you need it
or not. 

So, are the people in your example misusing owl:sameAs? Not if they are
answering the questions they need. Should they fix the problem with
integration? If they need to, to get better answers. But not until then. 


 But by that logic, Arthur would not be able to assert *anything*
 new about :x.  I.e., Arthur would not be allowed to assert
 any property whose value was not already entailed by Owen's
 definition!  And that would render RDF rather pointless.


Absolutely; the whole point of integrating data is that you want to say
things about knowledge that comes from other people. Otherwise, you
don't have integration, you just have a bunch of triples in the same
bucket. 

Phil



Re: Observations about facts in genomics

2013-03-22 Thread Phillip Lord


Yeah, I have heard this argument before. As soon as you give me an
assayable and testable definition of reality, I'm right with you.

Phil


Jerven Bolleman m...@jerven.eu writes:

 On Thu, Mar 21, 2013 at 2:55 PM, Phillip Lord
 phillip.l...@newcastle.ac.uk wrote:
 This is a broken definition of good to my mind. It suggests that we
 should make all the distinctions that we can make, all the time.
 Unfortunately, this means that everyone bears the cost of the complexity
 all the time also.
 True but the other option is the current situation where we all bear
 the complexity of not knowing what someone is really talking about.
 Leading to merging of information that should never have been merged
 and conclusions that are not worth the pixels they are displayed on.
 Sure there is a cost to ever more complex representations of
 information to match reality and this is not what I am advocating. I
 am advocating give reality a different IRI than the model.



Re: Observations about facts in genomics

2013-03-22 Thread Phillip Lord


No, they don't. They have a responsibility to do what they are being paid
to do (or want to achieve for their own purposes) in a rapid and
efficient manner. The point of standards is to make it easier to do this
in the same way as others than not to. People write URLs correctly
because otherwise they don't work, not because they have a
responsibility.




Pat Hayes pha...@ihmc.us writes:
 All citizens have certain responsibilities if they are going to use a global
 interchange format of any kind, which is to find a way to encode their domain
 in that format in a way that conforms to the published rules of the format. Or
 if that is not possible, then at least to publish the ways in which they are
 failing to conform, and to ensure that readers of their data have adequate
 warning of the ways they are failing to conform, and what the consequences
 are. 



Re: Observations about facts in genomics

2013-03-21 Thread Phillip Lord



This is a broken definition of "good" to my mind. It suggests that we
should make all the distinctions that we can make, all the time.
Unfortunately, this means that everyone bears the cost of the complexity
all the time also. 

A good data model should be an accurate reflection of biology. But it
should also be a convenient model of biology. And the distinction that
you are making is relevant to only a subset of use cases. Make the
distinctions you care about. Let others make the distinctions that they
care about. 

Phil




Jerven Bolleman m...@jerven.eu writes:
 This is fine in RDF, the important thing to separate is the concept of
 a Chromsome/Patient sequence and a set of observations and hypothesis
 about that Chromosome sequence.

 So instead of chromosome M you are really talking about assembly X of
 a set of reads R mapped via some (variant calling) processes to
 reference chromosome C that is also really an assembly of a different
 set of reads. Subtly different and not always made explicit in
 conversation, but for good RDF representations you should.

 In RDF here you need to be careful about what you are identifying. As
 long as you are correct in what you identified (in this case an
 variant called, mapped assembly) instead of what you are discussing in
 english (a patients chromosome)  you will end up fine. If you do this
 you don't need anything as exotic as frames etc...

 Regards,
 Jerven

 On Wed, Mar 20, 2013 at 9:23 PM, Graham Klyne graham.kl...@zoo.ox.ac.uk 
 wrote:
 Hi Jeremy,


 On 20/03/2013 16:04, Jeremy J Carroll wrote:
 One of the things I am learning about genetic sequencing is this process,
 which is meant to tell you about the patient's DNA, is in fact somewhat
 problematic, resulting in facts which are disputable.


 It gets worse... the association between sequence fragments and genes
 changes over time as knowledge is improved, I understand in ways that isn't
 always reflected in published information.  GMOD/CHADO
 (http://gmod.org/wiki/Introduction_to_Chado) keeps all the concepts very
 separate to allow for this, but the translation to RDF can get very
 convoluted (Al Miles did some work on a mapping, a few years ago).

 I also understand that there's emerging research that shows that non-coding
 regions, which were previously thought to be meaningless/irrelevant, do
 actually have relevant roles in the overall genetic machinery (something to
 do with regulation?).

 One of the many reasons I'd like RDF to have some flexibility to deal with
 contexts, or differing worldviews, is to allow representation of evolving
 information without having to make explicit all those things that
 researchers sometimes don't bother to make explicit (e.g. genes vs proteins,
 sequence vs gene, etc.).  And then there all the stuff we don't yet know to
 make explicit. (frame problem, anyone?)

 #g
 --



 On 20/03/2013 16:04, Jeremy J Carroll wrote:

 Pat Hayes wrote:

 [RDF] is intended for recording data, and most data is pretty mundane
 stuff about which there is not a lot of factual disagreement.

 One of the things I am learning about genetic sequencing is this process,
 which is meant to tell you about the patient's DNA, is in fact somewhat
 problematic, resulting in facts which are disputable.

 So, a data file that I am trying to get my head around at the moment
 contains a line like:

 chrM942 rs28579222  A   G   .   .
 ASP;HD;OTHERKG;RSPOS=942;SAO=0;SF=0;SSR=0;VC=SNV;VP=0505000402000100;WGT=1;dbSNPBuildID=125


 So far, I have understood the first five fields, as saying that in a
 particular position in the DNA (the 942nd base in the mitochondrial DNA, aka
 rs28579222), when one might have expected to see an A a sample had a G.
 But that last part a sample had a G is in fact open to doubt … There is
 a complex piece of chemistry, physics and computing that guesses that there
 is a G in that position. It is possible to see some of the less processed
 data that fed into that guess, and to see levels of confidence that the
 different algorithms had with the results; but it is not a slam dunk by any
 means. So, some more skeptical people want to be able to see the 'raw read
 data' prior to the decision that this is a G. Usually one would expect to
 see some of the raw read data agree with the G, and some disagree.


 Since this assertion (that this position is a G) is made with a few
 million similar assertions, all of which have some element of doubt - it
 would be highly surprising if every single call were correct: yet within the
 logic of RDF we probably end up asserting the truth of the whole graph …
 which leads us onto the dangerous path of ex contradictione quodlibet











-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower

Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)

2012-10-15 Thread Phillip Lord

Referencing and linking are not the same thing; say I want to produce a
table of contents of resources, including yours and others, or an index.
What would you think if you opened a book and at the front it said
"pg 8, pg 32, pg 53", where you expected a table of contents?

If you want to know more, I'd suggest that you read

https://www.cambridgesemantics.com/semantic-university/introduction-to-linked-data

Rule 3 is: 
 When someone looks up a URI, provide useful information, using the
 standards such as RDF* and SPARQL.

Anyway, I think I have said enough here; it's your website!

Phil



Lee Feigenbaum l...@thefigtrees.net writes:
 I don't really find the use cases you suggest particularly compelling. Perhaps
 you could explain them in a bit more detail?

 Searching -- we do some degree of traditional SEO, and the lessons generally
 show up very well on major search engines

 Sorting -- I'm not sure what would be sorted? The lessons are presented in a
 particular order designed to help the understanding of readers who go through
 the material as presented

 Mashing up -- Can you give me an example?

 Referencing -- Generally speaking, we think that the URLs of the individual
 lessons are perfectly adequate for referencing


 thanks,
 Lee

 On 10/11/2012 5:24 AM, Phillip Lord wrote:

 I am a little surprised that you can't see use cases for adding
 computationally extractable metadata to your articles. Searching,
 sorting, mashing up, referencing and so on.

 RSS is a different point; ignoring its "what's new" role, it happens to
 be a reasonable source for computational metadata where there is nothing
 else.

 Phil





 Lee Feigenbaum l...@thefigtrees.net writes:
 Thanks for the feedback. We didn't pursue an RSS feed for the site because
 it's intended to be relatively timeless educational content, rather than 
 dated
 material. That said, I can look into adding one.

 Can you help me understand the use cases for using some of the other
 approaches you mention and what would be involved? I didn't really have any
 compelling use cases in mind off the top of my head to mark up these 
 lessons.

 thanks,
 Lee

 On 10/10/2012 7:20 AM, Phillip Lord wrote:
 This is an interesting set of pages.

 One thing that confuses me about this web site is that, as far as I can
 see, it appears to use no semantic web technology; certainly trying to
 mine the web pages shows no metadata describing what the document is
 about. We tried searching for OGP, various forms of metatags, prism,
 COINs and so on, using our Greycite (http://greycite.knowledgeblog.org)
 tool, and found nothing. We've tried visual inspection as well -- not
 easy as all the HTML is on one line -- and again can see nothing. Tried
 content negotiation for RDF, but this returns HTML. Even the normally
 reliable RSS feed fails because there isn't one.

 Phil



 Lee Feigenbaum l...@thefigtrees.net writes:

 Hi everyone,

 Many of you may already have come across Semantic University
 http://www.cambridgesemantics.com/semantic-university, but I'd like to
 announce it to this community.

 Semantic University is a free, online resource for learning Semantic Web
 technologies. We've gotten some great feedback over the past few months, 
 and
 we feel that it's one of the most accessible ways for both technical and
 non-technical people to start learning about semantics and the Semantic 
 Web.

 For those of you who have seen Semantic University before, we've 
 re-organized
 the content into general Semantic Web Landscape content and into specific
 technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web
 Design Patterns. I hope you'll check it out as we think it's now much 
 easier
 to use to learn about the Semantic Web.

 Semantic University currently includes over 30 lessons, and we're 
 continually
 preparing new content. We're also looking for additional writers to 
 contribute
 new lessons, so please contact me if you'd be interested. I'd especially 
 like
 to start including content specific to particular verticals, and HCLS 
 would be
 a great starting place. Please let me know if you'd be interested in
 contributing!

 Current lessons include:

* An Introduction to the Semantic Web
  
 https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web
* Semantic Web Misconceptions
  
 https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions
* Semantic Web vs. Semantic Technologies
  
 https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies
* RDF 101 
 https://www.cambridgesemantics.com/semantic-university/rdf-101
* SPARQL Nuts and Bolts
  
 https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts

 ...and many more.

 Please enjoy & we welcome all feedback & suggestions.

 best,
 Lee







-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics

Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)

2012-10-11 Thread Phillip Lord


I am a little surprised that you can't see use cases for adding
computationally extractable metadata to your articles. Searching,
sorting, mashing up, referencing and so on. 

RSS is a different point; ignoring its "what's new" role, it happens to
be a reasonable source for computational metadata where there is nothing
else.

Phil





Lee Feigenbaum l...@thefigtrees.net writes:
 Thanks for the feedback. We didn't pursue an RSS feed for the site because
 it's intended to be relatively timeless educational content, rather than dated
 material. That said, I can look into adding one.

 Can you help me understand the use cases for using some of the other
 approaches you mention and what would be involved? I didn't really have any
 compelling use cases in mind off the top of my head to mark up these lessons.

 thanks,
 Lee

 On 10/10/2012 7:20 AM, Phillip Lord wrote:
 This is an interesting set of pages.

 One thing that confuses me about this web site is that, as far as I can
 see, it appears to use no semantic web technology; certainly trying to
 mine the web pages shows no metadata describing what the document is
 about. We tried searching for OGP, various forms of metatags, prism,
 COINs and so on, using our Greycite (http://greycite.knowledgeblog.org)
 tool, and found nothing. We've tried visual inspection as well -- not
 easy as all the HTML is on one line -- and again can see nothing. Tried
 content negotiation for RDF, but this returns HTML. Even the normally
 reliable RSS feed fails because there isn't one.

 Phil



 Lee Feigenbaum l...@thefigtrees.net writes:

 Hi everyone,

 Many of you may already have come across Semantic University
 http://www.cambridgesemantics.com/semantic-university, but I'd like to
 announce it to this community.

 Semantic University is a free, online resource for learning Semantic Web
 technologies. We've gotten some great feedback over the past few months, and
 we feel that it's one of the most accessible ways for both technical and
 non-technical people to start learning about semantics and the Semantic Web.

 For those of you who have seen Semantic University before, we've 
 re-organized
 the content into general Semantic Web Landscape content and into specific
 technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web
 Design Patterns. I hope you'll check it out as we think it's now much easier
 to use to learn about the Semantic Web.

 Semantic University currently includes over 30 lessons, and we're 
 continually
 preparing new content. We're also looking for additional writers to 
 contribute
 new lessons, so please contact me if you'd be interested. I'd especially 
 like
 to start including content specific to particular verticals, and HCLS would 
 be
 a great starting place. Please let me know if you'd be interested in
 contributing!

 Current lessons include:

   * An Introduction to the Semantic Web
 
 https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web
   * Semantic Web Misconceptions
 
 https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions
   * Semantic Web vs. Semantic Technologies
 
 https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies
   * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101
   * SPARQL Nuts and Bolts
 
 https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts

 ...and many more.

 Please enjoy & we welcome all feedback & suggestions.

 best,
 Lee





-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   msn: m...@russet.org.uk
NE1 7RU twitter: phillord



Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)

2012-10-11 Thread Phillip Lord

I didn't ask for semantic web technologies (other than RSS, which *might*
be RDF). Adding any computational metadata would, surely, be a good
thing. I do not understand why people deal with the web so unseriously
as a publication medium.

Although, curiously, you do have minimal metatags on your own website;
not for search engines, it would appear, as you've robots.txt'd them
away. Why did you use them if they are meaningless?

Phil


Eric Miller e...@squishymedia.com writes:

 Just chiming in quickly here -- our web dev shop wouldn't automatically
 include any of the semantic web technologies in a standard site template. Like
 Lee, we haven't seen a compelling use case to do anything beyond well-formed
 code. For example, Meta tags haven't been part of our code standards for years
 since the search engines tend to ignore them and they were prone to abuse or
 meaninglessness.

 I acknowledge that the newer semantically aware technologies might benefit
 from a push towards critical mass through community adoption but it is a hard
 sell to justify spending client dollars to implement something just on
 principle.

 A chicken and egg problem I guess. Though I'd welcome any other perspectives
 on this.

 Eric

 Squishymedia Web Development 503.780.1847

 On Oct 11, 2012, at 2:24 AM, phillip.l...@newcastle.ac.uk (Phillip Lord)
 wrote:

 
 
 I am a little surprised that you can't see use cases for adding
 computationally extractable metadata to your articles. Searching, sorting,
 mashing up, referencing and so on.
 
 RSS is a different point; ignoring its "what's new" role, it happens to be
 a reasonable source for computational metadata where there is nothing else.
 
 Phil
 
 
 
 
 
 Lee Feigenbaum l...@thefigtrees.net writes:
 Thanks for the feedback. We didn't pursue an RSS feed for the site because
 it's intended to be relatively timeless educational content, rather than
 dated material. That said, I can look into adding one.
 
 Can you help me understand the use cases for using some of the other
 approaches you mention and what would be involved? I didn't really have any
 compelling use cases in mind off the top of my head to mark up these
 lessons.
 
 thanks, Lee
 
 On 10/10/2012 7:20 AM, Phillip Lord wrote:
 This is an interesting set of pages.
 
 One thing that confuses me about this web site is that, as far as I can
 see, it appears to use no semantic web technology; certainly trying to
 mine the web pages shows no metadata describing what the document is
 about. We tried searching for OGP, various forms of metatags, prism, COINs
 and so on, using our Greycite (http://greycite.knowledgeblog.org) tool,
 and found nothing. We've tried visual inspection as well -- not easy as
 all the HTML is on one line -- and again can see nothing. Tried content
 negotiation for RDF, but this returns HTML. Even the normally reliable RSS
 feed fails because there isn't one.
 
 Phil
 
 
 
 Lee Feigenbaum l...@thefigtrees.net writes:
 
 Hi everyone,
 
 Many of you may already have come across Semantic University
 http://www.cambridgesemantics.com/semantic-university, but I'd like to
 announce it to this community.
 
 Semantic University is a free, online resource for learning Semantic Web
 technologies. We've gotten some great feedback over the past few months,
 and we feel that it's one of the most accessible ways for both technical
 and non-technical people to start learning about semantics and the
 Semantic Web.
 
 For those of you who have seen Semantic University before, we've
 re-organized the content into general Semantic Web Landscape content and
 into specific technical tracks oriented around RDF, OWL/RDFS, SPARQL, and
 Semantic Web Design Patterns. I hope you'll check it out as we think it's
 now much easier to use to learn about the Semantic Web.
 
 Semantic University currently includes over 30 lessons, and we're
 continually preparing new content. We're also looking for additional
 writers to contribute new lessons, so please contact me if you'd be
 interested. I'd especially like to start including content specific to
 particular verticals, and HCLS would be a great starting place. Please
 let me know if you'd be interested in contributing!
 
 Current lessons include:
 
  * An Introduction to the Semantic Web
 https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web
 * Semantic Web Misconceptions
 https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions
 * Semantic Web vs. Semantic Technologies
 https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies
 * RDF 101
 https://www.cambridgesemantics.com/semantic-university/rdf-101 * SPARQL
 Nuts and Bolts
 https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts
 
 ...and many more.
 
 Please enjoy & we welcome all feedback & suggestions.
 
 best, Lee
  -- Phillip Lord, Phone: +44 (0) 191 222 7827 Lecturer in Bioinformatics,
 Email: phillip.l

Re: ANN: Semantic University - for learning the Semantic Web (now easier to use)

2012-10-10 Thread Phillip Lord

This is an interesting set of pages. 

One thing that confuses me about this web site is that, as far as I can
see, it appears to use no semantic web technology; certainly trying to
mine the web pages shows no metadata describing what the document is
about. We tried searching for OGP, various forms of metatags, prism,
COINs and so on, using our Greycite (http://greycite.knowledgeblog.org)
tool, and found nothing. We've tried visual inspection as well -- not
easy as all the HTML is on one line -- and again can see nothing. Tried
content negotiation for RDF, but this returns HTML. Even the normally
reliable RSS feed fails because there isn't one.

Phil



Lee Feigenbaum l...@thefigtrees.net writes:

 Hi everyone,

 Many of you may already have come across Semantic University
 http://www.cambridgesemantics.com/semantic-university, but I'd like to
 announce it to this community.

 Semantic University is a free, online resource for learning Semantic Web
 technologies. We've gotten some great feedback over the past few months, and
 we feel that it's one of the most accessible ways for both technical and
 non-technical people to start learning about semantics and the Semantic Web.

 For those of you who have seen Semantic University before, we've re-organized
 the content into general Semantic Web Landscape content and into specific
 technical tracks oriented around RDF, OWL/RDFS, SPARQL, and Semantic Web
 Design Patterns. I hope you'll check it out as we think it's now much easier
 to use to learn about the Semantic Web.

 Semantic University currently includes over 30 lessons, and we're continually
 preparing new content. We're also looking for additional writers to contribute
 new lessons, so please contact me if you'd be interested. I'd especially like
 to start including content specific to particular verticals, and HCLS would be
 a great starting place. Please let me know if you'd be interested in
 contributing!

 Current lessons include:

  * An Introduction to the Semantic Web

 https://www.cambridgesemantics.com/semantic-university/introduction-to-the-semantic-web
  * Semantic Web Misconceptions

 https://www.cambridgesemantics.com/semantic-university/semantic-web-misconceptions
  * Semantic Web vs. Semantic Technologies

 https://www.cambridgesemantics.com/semantic-university/semantic-web-vs-semantic-technologies
  * RDF 101 https://www.cambridgesemantics.com/semantic-university/rdf-101
  * SPARQL Nuts and Bolts

 https://www.cambridgesemantics.com/semantic-university/sparql-nuts-and-bolts

 ...and many more.

 Please enjoy & we welcome all feedback & suggestions.

 best,
 Lee


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   msn: m...@russet.org.uk
NE1 7RU twitter: phillord



Re: HPO and Gene Ontology Licenses

2012-08-10 Thread Phillip Lord
Alan Ruttenberg alanruttenb...@gmail.com writes:

 As you know, we and others have demonstrated that alternative
 representations and reformulation of knowledge is desirable for certain
 kinds of scientific inquiry.

 Sorry, I'm unaware of such demonstration. Could you cite some references?


http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0012258

A few examples of where multiple representations of the same knowledge
have been used for good reasons:

 - multiple syntaxes for RDF
 - multiple syntaxes for OWL
 - two APIs for XML (DOM and SAX). 
 - multiple computer languages which are reducible to lambda calculus
 - lambda calculus and a Turing Machine
 - continued use of Newtonian mechanics, although it's an approximation
   of relativistic mechanics
 - multiple statistical techniques for expression of central tendency
 - PDFs are still better for reading in the bath than HTML

And so on. Any model is a compromise between accuracy, usability,
convenience and so on. Sometimes having more than one compromise is a
better solution than trying to shoe-horn everything into one bucket.
This is a compromise too. 

Phil  



Re: ismb sig call for proposals is online

2009-10-20 Thread Phillip Lord

It somewhat overlaps with bio-ontologies, though. If people are keen to
do this, I'd suggest talking to the bio-ontologies organisers (Nigam is
best point of contact) to see whether something can be done in common. 

Phil

Andrea Splendiani andrea.splendi...@bbsrc.ac.uk writes:
 A SIG on Semantic Web applications in Life Sciences ?
 That would be a good thing.

 I would vote for 1 day, prior to ismb and not overlapping with bio-
 ontologies, and with a call for submission.
 You can list swatls as a related event with attendance (2008) of around 80pp.

 Let me know if I can help.

 ciao,
 Andrea

 P.S.: I'll be at ISWC, and you at OWLed, but I'll arrive on the 23rd...

 On 20 Oct 2009, at 19:09, Joanne Luciano wrote:

 let's propose one for ISMB...

 Begin forwarded message:

 From: Hershel Safer hsa...@alum.mit.edu
 Date: October 20, 2009 1:04:44 PM EDT
 To: Fran Lewitter lewit...@wi.mit.edu, Rafi Najmanovich
 rafael.najmanov...@ebi.ac.uk
 ,  Iddo Friedberg ido...@gmail.com, Eduardo Eyras eduardo.ey...@upf.edu
 , E Eyras eey...@imim.es,  Hagit Shatkay shat...@cs.queensu.ca, Nigam
 Shah ni...@stanford.edu,  Vitor  Martins dos Santos
 v...@helmholtz-hzi.de, Vitor Martins dos Santos  vdsm...@gmail.com,  Kam
 Dahlquist kdahlqu...@lmu.edu, Michael  Brudno bru...@gmail.com,
 Francisco M De La Vega francisco.delav...@appliedbiosystems.com ,  Laura
 Elnitski elnit...@mail.nih.gov, James Taylor james.tay...@emory.edu ,
 Anton Nekrutenko an...@bx.psu.edu, Dawn Field  dfi...@ceh.ac.uk,  Phil
 Lord phillip.l...@newcastle.ac.uk,  Susanna-Assunta Sansone
 sans...@ebi.ac.uk,  Susie Stephens susie.steph...@gmail.com , Larisa
 Soldatova l...@aber.ac.uk,  Jens Stoye st...@techfak.uni-bielefeld.de ,
 Dirk Holste hol...@imp.ac.at,  Lonnie Welch we...@ohio.edu,  Joanne
 Luciano jluci...@genetics.med.harvard.edu,  Christian  Blaschke
 blasc...@bioalma.com, Alfonso Valencia  valen...@cnio.es,  Lynette
 Hirschman lyne...@mitre.org, Scott  Markel smar...@accelrys.com,  James
 Procter  j.proc...@dundee.ac.uk, Sean O'Donoghue
 sean.odonog...@embl.de,  Jean-Christophe Nebel j.ne...@kingston.ac.uk ,
 Steffen Moller moel...@inb.uni-luebeck.de,  Paul Flicek fli...@ebi.ac.uk
 
 Cc: Steven Leard ste...@marketwhys.ca
 Subject: ismb sig call for proposals is online

 Hi everybody,

 I want to let you know that the ISMB SIG call for proposals is now online
 at
 http://www.iscb.org/ismb2010-submission-details/ismb2010-special-interest-groups
 .

 The CFP is largely the same as last year's; I expect that it will be
 updated in a few days to include the names of the committee  members. The
 submission deadline is Dec 1st.

 I look forward to another great collection of SIG meetings! Thanks,
 Hershel

 --
 Hershel Safer
 e: hsa...@alum.mit.edu | m: +972-54-463-1977 | skype: hsafer



 ---
 Andrea Splendiani
 Senior Bioinformatics Scientist
 Rothamsted Research, Harpenden, UK
 andrea.splendi...@bbsrc.ac.uk
 +44(0)1582 763133 ext 2004


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   msn: m...@russet.org.uk
NE1 7RU



Re: Any meeting at ISMB ?

2009-06-18 Thread Phillip Lord

I'd also be happy to advertise a meeting at bio-ontologies, where they
will be quite a few people with SW interests. 

Phil

Hilmar Lapp hl...@duke.edu writes:
 On Jun 18, 2009, at 12:36 PM, eric neumann wrote:

 I'm pretty sure BOFs are open to ISMB registered folks only. That does
 not prevent us from having additional meet ups.


 That depends on where you hold them ... BTW Alan Ruttenberg will be giving a
 keynote address at BOSC, and as a result some of the BOSC attendees might be
 interested in a life sciences semweb BOF, so holding it on one of the two days
 of BOSC is maybe worth considering.

   -hilmar
 --
 ===
 : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
 ===








-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   msn: m...@russet.org.uk
NE1 7RU



Bio-Ontologies programme released

2009-06-04 Thread Phillip Lord

** Call for Participation

Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the
latest and most cutting-edge research in ontologies and more generally the
organisation, presentation and dissemination of knowledge in biology.

We are pleased to announce the programme for Bio-Ontologies 2009: Knowledge in
Biology, a SIG at Intelligent Systems for Molecular Biology, with papers and
posters on a wide range of topics.


** Programme

This year's keynote speaker will be Barend Mons
(http://en.wikipedia.org/wiki/Barend_Mons).


This year's panel will be

 - Barend Mons
 - Andrew Su
 - Dawn Field

The full programme is now available at:

http://bio-ontologies.org.uk/programme.html

*** Organisers

Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth



** Registration

All registration will be handled by ISCB.

Further details are available

http://www.iscb.org/ismbeccb2009/registration.php



** Bio-Ontologies 2008

Papers from last year's SIG have now been published in BMC Bioinformatics.

http://www.biomedcentral.com/1471-2105/10?issue=S5




-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,   skype: russet_apples
Newcastle University,   msn: m...@russet.org.uk
NE1 7RU



CFP: Bio-Ontologies

2009-04-07 Thread Phillip Lord


***DEADLINE FRIDAY FOR SUBMISSIONS***


** Call for Papers

Submissions are now invited for Bio-Ontologies 2009: Knowledge in Biology, a SIG
at Intelligent Systems for Molecular Biology 2009.


*** Key Dates

 - Submissions Due: April 10th (Friday)
 - Notifications: May 1st (Friday)
 - Final Version Due: May 8th (Friday)
 - Workshop: June 28th (Sunday)


*** Introduction

Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the
latest and most cutting-edge research in ontologies and more generally the
organisation, presentation and dissemination of knowledge in biology. It has
existed as a SIG at ISMB (http://www.iscb.org/ismbeccb2009) for 11 years now,
making it one of the longest running.

We are interested in any formal or informal approach to organising, presenting
and disseminating knowledge in biology.

We invite submissions on a wide range of topics including, but not limited to:

 - Semantic and/or Scientific Wikis.
 - Multimedia Blogs
 - Folksonomies
 - Tag Clouds
 - Collaborative Curation Platforms
 - Collaborative Ontology Authoring and Peer-Review Mechanisms
 - Biological Applications of Ontologies
 - Reports on Newly Developed or Existing Bio-Ontologies
 - Tools for Developing Ontologies
 - Use of Ontologies in Data Communication Standards
 - Use of Semantic Web technologies in Bioinformatics
 - Implications of Bio-Ontologies or the Semantic Web for Drug Discovery
 - Research in Ontology Languages and its Effect on Bio-Ontologies



** Programme

This year's keynote speaker will be Barend Mons
(http://en.wikipedia.org/wiki/Barend_Mons).


This year's panel will be

 - Barend Mons
 - Andrew Su
 - Dawn Field

*** Submissions

Submissions are now open and can be submitted through easychair
(http://www.easychair.org/conferences/?conf=bioontologies2009).


*** Instructions to Authors

We are inviting two types of submissions.

 - Short papers, up to 4 pages.
 - Poster abstracts, up to 1/2 page.

Following review, successful papers will be presented at the Bio-Ontologies
SIG. Poster abstracts will be allocated poster space, and time will be set
aside during the day for at least one poster session.

Unsuccessful papers will automatically be considered for poster presentation;
there is no need to submit both on the same topic.





*** Organisers

Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth



*** Programme Committee

The programme committee, organised alphabetically is:

Michael Bada, University of Colorado Denver
Olivier Bodenreider, National Library of Medicine
Kei Cheung, Yale Center for Medical Informatics
Paolo Ciccarese, Harvard
Sudeshna Das, Harvard
Michel Dumontier, Carleton University
Wacek Kusnierczyk, Norwegian University of Science and Technology
Cliff Joslyn, Pacific Northwest National Laboratory
Midori Harris, European Bioinformatics Institute
James Malone, European Bioinformatics Institute
Robin McEntire, Independent Consultant
Parsa Mirhaji, University of Texas
David Newman, ECS, University of Southampton
Chimezie Ogbuji, The Cleveland Clinic Foundation
Alexandre Passant, DERI
Alan Ruttenberg, Science Commons
Phillippe Rocca-Serra, European Bioinformatics Institute
Matthias Samwald, DERI
Robert Stevens, University of Manchester
Yimin Wang, Eli Lilly
Mark Wilkinson, Medical Genetics, U. of British Columbia
Jenna Zhou, Eli Lilly




and the conference organisers.


*** Templates

Submission templates are available from the website
(http://bio-ontologies.org.uk).














Re: Is OWL useful at all for Quantitative Science?

2009-03-31 Thread Phillip Lord
Matthias Samwald samw...@gmx.at writes:
  I have tried to come up with a simple example. Feel free to come up
 with a simpler one:

  Express in correct OWL: Washington DC is further away from Boston
 than New York City

  Use case: I want to fly with my helicopter from Boston to either DC
 or NYC, whichever is closer.

 Why should this be hard? If I take your example at its word and I am free to come
 up with arbitrary OWL DL, we could simply use an n-ary design pattern to SAY
 it in OWL. E.g., create a class "is farther away than", with three properties
 "reference place", "nearer place", "place that is farther away" -- and create
 an instance accordingly. Problem solved.

I think I would say problem represented rather than solved. This is a
good thing, of course, and does represent a way in which you can do
quantitative science where OWL is part of the solution. 
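
As a minimal sketch of that n-ary pattern (hypothetical names and
namespace, using rdflib in Python), it might look something like this:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/geo#")   # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)

    # One instance of the n-ary relation class "is farther away than",
    # with the three properties Matthias describes.
    stmt = EX.fartherAway1
    g.add((stmt, RDF.type, EX.IsFartherAwayThan))
    g.add((stmt, EX.referencePlace, EX.Boston))
    g.add((stmt, EX.nearerPlace, EX.NewYorkCity))
    g.add((stmt, EX.fartherPlace, EX.WashingtonDC))

    print(g.serialize(format="turtle"))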

Katy Wolstencroft did the same thing in this paper

doi:10.1093/bioinformatics/btl208 

In this case, the results of a quantitative analysis (how much
similarity) were translated into statements in OWL, then we applied
reasoning over the top. 

At Newcastle, Keith Flanagan did a similar thing looking for genomic
rearrangements, which you can read about here:

http://homepages.cs.ncl.ac.uk/phillip.lord/download/publications/iswc2005_bayesian_poster.pdf

In this case, we were using OWL to help plug resources into a Bayesian
stats engine, and later interpret the results. 

The bottom line with all of these is that you can interact between OWL and
quantitative results. In each of these cases, the degree of interaction
between the numerical reasoning and logical reasoning was through a
fairly thin pipe. 

 But I guess what we really would want to do is to describe each city with
 geo-tags (latitude and longitude). Then we can use SPARQL to query for cities
 and calculate their distance from Boston.

Didn't know you could do that; is the mathematics integrated into SPARQL,
or are you doing some kind of call out? The reason that I ask is that,
ultimately, your question is "which is closer as the helicopter flies",
which you can only really get from a database, what with no-fly zones and
the like.
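
(For what it's worth, SPARQL 1.1 does allow basic arithmetic in BIND and
FILTER, though no trigonometry, so a rough planar which-is-closer ranking
can be done in the query itself. A minimal sketch with rdflib, made-up
coordinates and a hypothetical namespace, ignoring no-fly zones entirely:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/geo#")   # hypothetical namespace
    g = Graph()
    for name, lat, lon in [("Boston", "42.36", "-71.06"),
                           ("NewYorkCity", "40.71", "-74.01"),
                           ("WashingtonDC", "38.91", "-77.04")]:
        g.add((EX[name], EX.lat, Literal(lat, datatype=XSD.decimal)))
        g.add((EX[name], EX.long, Literal(lon, datatype=XSD.decimal)))

    # Squared planar "distance" from Boston, smallest first; crude, but it
    # ranks the cities without any call out to other code.
    q = """
    PREFIX ex: <http://example.org/geo#>
    SELECT ?city ?d2 WHERE {
      ex:Boston ex:lat ?blat ; ex:long ?blong .
      ?city ex:lat ?lat ; ex:long ?long .
      FILTER (?city != ex:Boston)
      BIND ((?lat - ?blat) * (?lat - ?blat) +
            (?long - ?blong) * (?long - ?blong) AS ?d2)
    }
    ORDER BY ?d2
    """
    for row in g.query(q):
        print(row.city, row.d2)

Whether the helicopter can actually fly that way is, of course, still a
database question.)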

Phil



Re: blog: semantic dissonance in uniprot

2009-03-26 Thread Phillip Lord
Oliver Ruebenacker cur...@gmail.com writes:
   Besides, how do we know it's wrong? Two species can have the same
 protein for different functions, right?


Depends how you define same. This is the crux of the problem. 

Phil



Re: Less strong equivalences

2009-03-26 Thread Phillip Lord
Pat Hayes pha...@ihmc.us writes:
 From your descriptions, I can't tell which one would best handle the
 following situation:

 Object 1 refers to exactly the same molecule (exemplar) as object 2 refers
 to

 That sure sounds like sameAs, applied to molecules. Why isn't sameAs good
 enough here? What goes wrong?

I can think of very few occasions when we want to talk about a molecule;
we need to talk about classes of molecules. We can consider this as
problematic even with a very simple example. 

Let's assume we have two databases with information about Carbon. Do we
use sameAs to describe the atoms that they are talking about? Maybe,
but what happens if one is talking about the structure of Carbon and
its location in the periodic table, while the other is talking about
Carbon with the isotopic mix that we have in living organisms on earth?

In biology, we have the same problem. Is porcine insulin the same as
human insulin? Is real human insulin the same as recombinant
human insulin? Well, the answer to all of these is no, even though most
biologists will tell you that real and recombinant insulin are the same
because they have the same primary sequence; a medic will tell you
otherwise, because they have different effects. Why? Don't know. 

If you make the distinctions that you might need some of the time, all
of the time, then you are going to end up with a very complicated model.
Hence the evolutionary biologist says all the insulins are the same. The
medic says that they are different. And neither of them care about
different types of carbon (unless they are C14-dating). 

I don't think that there is a generic solution here which is not too
complicated to use. The only solution (which is too complicated) I can
think of is to do what we do when we have this problem in programming;
you use a pluggable notion of equality, by using some sort of comparator
function or object. I don't think that this is an issue for OWL myself;
I think it's something we may need to build on top of OWL. 
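
To make the pluggable idea concrete, a minimal sketch in Python (the
record fields and comparators are hypothetical, just to show the shape):

    from dataclasses import dataclass

    @dataclass
    class ProteinRecord:
        accession: str
        sequence: str
        source: str    # e.g. "native" or "recombinant" -- hypothetical field

    # The evolutionary biologist's comparator: same primary sequence,
    # same protein.
    def same_by_sequence(a, b):
        return a.sequence == b.sequence

    # The medic's comparator: native and recombinant forms do not count
    # as the same.
    def same_by_sequence_and_source(a, b):
        return a.sequence == b.sequence and a.source == b.source

    def group_records(records, same):
        """Group records using whichever notion of equality is plugged in."""
        groups = []
        for r in records:
            for group in groups:
                if same(group[0], r):
                    group.append(r)
                    break
            else:
                groups.append([r])
        return groups

Neither comparator is wrong; they just serve different purposes, which is
rather the point.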

Phil




Re: blog: semantic dissonance in uniprot

2009-03-26 Thread Phillip Lord
Oliver Ruebenacker cur...@gmail.com writes:
  Hello Philip, All,

 On Wed, Mar 25, 2009 at 1:05 PM, Phillip Lord
 phillip.l...@newcastle.ac.uk wrote:
 My own feeling is that it's biology which wove the web; we're just
 caught in the middle. What role for the web and semantics? Well, I think
 we need a coordinated, controlled and defined way of expressing our
 mutual confusion. I'd love to have a clear definition of gene (or
 protein). In its absence, a good way of expressing err... is probably
 the best we can do.

   I don't know whether the BioPAX Level 2 definition of protein is the
 most useful one, but at least it sounds clear to me:

   protein = anything containing exactly one polypeptide chain

   Clear enough?


So insulin is not a protein, whereas a dipeptide is?

Besides which, the issue being discussed here is one of equality. When
are two proteins the same protein? 

Phil




Re: blog: semantic dissonance in uniprot

2009-03-26 Thread Phillip Lord
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no writes:
   I don't know whether the BioPAX Level 2 definition of protein is the
 most useful one, but at least it sounds clear to me:

   protein = anything containing exactly one polypeptide chain

   Clear enough?
 


 So insulin is not a protein, whereas a dipeptide is?
   

 indeed;  insulin is a protein complex, and a dipeptide, following this
 and other similar definitions, is a protein.


Insulin is two polypeptide chains, so following this definition it is not a
protein.

Phil



Re: Less strong equivalences

2009-03-26 Thread Phillip Lord
Pat Hayes pha...@ihmc.us writes:
 We can consider this as
 problematic even with a very simple example.

 Let's assume we have two databases with information about Carbon.

 meaning, I presume, the element with atomic number 14.

I was thinking of the carbon with atomic number 6. 



 Maybe,
 but what happens if one is talking about the structure of Carbon and
 it's location in the periodic table, while the other is talking about
 Carbon with the isotopic mix that we have in living organisms on earth?

 So what? They can be saying different things about the same element. Any
 isotopic mix of carbon is still carbon. 

Different isotopic mixes have different properties. Atomic masses,
melting points and so on. 


 In biology, we have the same problem. Is porcine insulin the same as
 human insulin? Is real human insulin the same as recombinant
 human insulin? Well, the answer to all of these is no

 Fine, you just answered the basic ontological question.

 , even though most
 biologists will tell you that real and recombinant insulin are the same
 because they have the same primary sequence; a medic will tell you
 otherwise, because they have different effects. Why? Don't know.

 A deep question, but not a killer for ontology use.


It's not a deep question, just one to which we don't have an answer. 



 If you make the distinctions that you might need some of the time, all
 of the time, then you are going to end up with a very complicated model.

 Yes, you no doubt are. Tough. Its a complicated world.

Yes. And one of those complications is that we have to engineer for
usability as well as accuracy. 



 Formal ontologies are
 often, perhaps always, more complicated than the  informal 'knowledge' they
 set out to formalize. They are obliged to  make finer, more persnickety,
 distinctions between things.

 Hence the evolutionary biologist says all the insulins are the same.

 I don't care what the anyone says, that is wrong. They are indistinguishable
 for certain purposes, but if anyone can distinguish  them at all, they are not
 the _same_.

I think that position is defensible, but unusable. 


 All these examples can be handled by making fussy distinctions between kinds
 of thing at different granularities: carbon molecules, carbon isotopes,
 carbon the element; and then having mappings between them. I don't know much
 about insulin, but it sounds from the above that the same trick would work.
 It is tedious and hair-splitting to set this up, but once in place its fairly
 easy to use: you just choose the terminology corresponding to the 'level' you
 wish to be talking about. sameAs works OK at each level, but you can't be
 careless in using it across levels.

 If this makes you want to groan, I'm sorry. But ontology engineering is rather
 like programming. 

Actually, I quite like programming. I also know how to split things out
in the way you describe. 


 It requires an unusual attention to detail and a willingness to write
 a lot of boring stuff, because its for computers to use, and they are
 as dumb as dirt and have to have every little thing explained to them
 carefully. And yup, its complicated. Until AI succeeds, it will always
 be complicated.

I'd quite enjoy it if you could patronise me a little more please. 


 The only solution (which is too complicated) I can
 think of is to do what we do when we have this problem in programming;
 you use a pluggable notion of equality, by using some sort of comparitor
 function or object. I don't think that this is an issue for OWL myself;
 I think it's something we may need to build on top of OWL.

 It belongs in your ontology for carbon and insulin, not in OWL.

Is that not what my last sentence says?

Phil



Re: blog: semantic dissonance in uniprot

2009-03-26 Thread Phillip Lord
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no writes:
 So insulin is not a protein, whereas a dipeptide is?
   
   
 indeed;  insulin is a protein complex, and a dipeptide, following this
 and other similar definitions, is a protein.

 

 Insulin is two polypeptide chains, so following this definition it is not a
 protein.
   

 that's what i was saying:  that it is a protein complex, specifically,
 an aggregate of two polypeptide chains. 

My apologies, Wacek. I misread your reply. I thought you were disagreeing
with my interpretation of the definition, which you were not. 

 it may sound revolutionary to you that insulin is not a protein, since
 insulin is typically called a 'protein'.  but, provided one accepts a
 definition like the one above, there is nothing wrong in saying that
 insulin is not a protein.


I agree with all of this. As it happens, I would say that insulin is a
protein (and not a complex) because it's disulphide bonded; so it's a
single molecule, but has two polypeptide chains. I'd tweak the
definition. But as you say, if we all agreed on the definition, then
either way, the end result would be clear. 

Phil



Re: blog: semantic dissonance in uniprot

2009-03-25 Thread Phillip Lord

Michel_Dumontier michel_dumont...@carleton.ca writes:
 And I'm trying to explain that there is no pragmatic reason to make
 explicit the distinction between a biomolecule (and what we know about
 it) and a database record (and what we know about the biomolecule)
 unless they are actually different.  It just complicates things in a
 wholly unnecessary way. 


I've given a clear example: where two databases exist, with two records,
which appear to be referring to the same (class of) molecules.

The problem remains, however, that we have no clear and unambiguous way
of defining what we mean by the same molecule. So, we refer to a database
which brings along with it an (often ad hoc) definition of what the
same molecule means.

We could, of course, produce a resource which gives identifiers to, say,
all the classes of proteins in the world. But this would not solve the
problem; it would just introduce yet another resource and another
methodology for defining what we mean by an individual protein. 

If I remember correctly, Ben's original post that started this has it
about right. We need some tags which say "these two database records
are about the same protein, well, sort of, at least in this case, for
the purposes of what I am doing".

This argument reminds me of when the genome sequences were being
completed and people were arguing about how many genes there are in
humans. Different groups had different pipelines and came up with
different answers; ultimately, you had to conclude that they were all
pretty close and that working out which was best was nearly impossible
in the absence of an exact answer, an exact definition of a gene. We
don't have one; let's get over it and deal with this as is. 

Phil



Re: blog: semantic dissonance in uniprot

2009-03-24 Thread Phillip Lord


Oliver Ruebenacker cur...@gmail.com writes:
 2009/3/23 Michel_Dumontier michel_dumont...@carleton.ca:
 I do not think this would be a wise simplification.  This is only a
 simplification from one perspective: because it avoids having to mint
 and maintain pairs of URIs instead of a single URI.  But the downstream
 cost is that it creates an ambiguity (or URI collision)
 http://www.w3.org/TR/webarch/#URI-collision
 that may cause trouble and be difficult to untangle later as the data is
 used in more and more ways.  For example, if any of the same predicates
 need to be used on both the record and the molecular entity, they will
 become hopelessly confused.  Also, if disjointness assertions are
 included then this overloading may cause logical contraditions.

   Can anyone name a real-world example of where confusion between an
 entity and its record was an issue?


Yes, sure. All proteins have a Uniprot ID (conflating protein and
uniprot records). Then we integrate this with drugbank; this represents
many things including proteins which are not in Uniprot, or represents
several proteins where Uniprot has one. Consider insulin for instance.
We now have a problem because not all proteins have a Uniprot ID. 

The flip side is that if you always say

Protein Record -- contains knowledge about -- protein

it's much more complicated. You are making your data model more
difficult to work with all of the time, to cope with edge cases which
occur only some of the time. 
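
As a rough illustration (hypothetical URIs and properties, rdflib in
Python), the two compromises look like this in triples:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/bio#")   # hypothetical namespace
    g = Graph()

    # Compromise 1: conflated -- one URI does duty for record and protein.
    g.add((EX.P01308, RDF.type, EX.Protein))
    g.add((EX.P01308, EX.annotatedFunction, EX.GlucoseHomeostasis))

    # Compromise 2: separated -- the record is one node, the protein another.
    g.add((EX.P01308_record, RDF.type, EX.ProteinRecord))
    g.add((EX.P01308_record, EX.containsKnowledgeAbout, EX.HumanInsulin))
    g.add((EX.HumanInsulin, RDF.type, EX.Protein))

The second never collides, but every query pays for the extra hop.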

There's no way around this; either way it's a compromise and what is
good in one context may not be good in another. 

Phil



Re: blog: semantic dissonance in uniprot

2009-03-24 Thread Phillip Lord


Oliver Ruebenacker cur...@gmail.com writes:
   Is it possible that referring to records instead of things is not
 the result of confusion, but rather of cost-benefit considerations -
 that records are cheap and identification is costly and open-ended?
 What is it that can not be achieved by having better records instead?


I never said confusing, I said conflating. I also gave a reason why this
conflation was, at times, a good thing. 


   And what does it take to identify something? We may have thought we
 know what a couch is, until we realize that we have no consensus over
 whether the pillows are part of the couch or not, and that it would be
 more accurate to distinguish between bare couches (without pillows)
 and fully featured couches (with pillows). How far are we going to go?

I don't know, but I do think that we have a reasonable handle on the
engineering decisions that we need to make; the problem is that these
are application dependent. This is a problem if you want to integrate
data for a purpose for which it was not originally intended.

My own feeling is that identifying the underlying biology is
attractive, but not that plausible, because we have no good way of
understanding identity at this level without a record. So, when talking
about a protein, how do we know when we have one and when we have two?
It's not obvious. But uniprot have a mechanism for making this
judgement. 

So when referring to a uniprot record, we mostly mean the record, and
the extensional set of proteins that are defined by it. 

Phil




CFP: Bio-Ontologies: Knowledge in Biology 2009

2009-03-24 Thread Phillip Lord


** Call for Papers

Submissions are now invited for Bio-Ontologies 2009: Knowledge in Biology, a SIG
at Intelligent Systems for Molecular Biology 2009.


*** Key Dates

 - Submissions Due: April 10th (Friday)
 - Notifications: May 1st (Friday)
 - Final Version Due: May 8th (Friday)
 - Workshop: June 28th (Sunday)


*** Introduction

Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the
latest and most cutting-edge research in ontologies and more generally the
organisation, presentation and dissemination of knowledge in biology. It has
existed as a SIG at ISMB (http://www.iscb.org/ismbeccb2009) for 11 years now,
making it one of the longest running.

We are interested in any formal or informal approach to organising, presenting
and disseminating knowledge in biology.

We invite submissions on a wide range of topics including, but not limited to:

 - Semantic and/or Scientific Wikis.
 - Multimedia Blogs
 - Folksonomies
 - Tag Clouds
 - Collaborative Curation Platforms
 - Collaborative Ontology Authoring and Peer-Review Mechanisms
 - Biological Applications of Ontologies
 - Reports on Newly Developed or Existing Bio-Ontologies
 - Tools for Developing Ontologies
 - Use of Ontologies in Data Communication Standards
 - Use of Semantic Web technologies in Bioinformatics
 - Implications of Bio-Ontologies or the Semantic Web for Drug Discovery
 - Research in Ontology Languages and its Effect on Bio-Ontologies



** Programme

This year's keynote speaker will be Barend Mons
(http://en.wikipedia.org/wiki/Barend_Mons).

*** Submissions

Submissions are now open and can be submitted through easychair
(http://www.easychair.org/conferences/?conf=bioontologies2009).


*** Instructions to Authors

We are inviting two types of submissions.

 - Short papers, up to 4 pages.
 - Poster abstracts, up to 1/2 page.

Following review, successful papers will be presented at the Bio-Ontologies
SIG. Poster abstracts will be allocated poster space, and time will be set
aside during the day for at least one poster session.

Unsuccessful papers will automatically be considered for poster presentation;
there is no need to submit both on the same topic.





*** Organisers

Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth



*** Programme Committee

The programme committee, organised alphabetically is:

Michael Bada, University of Colorado Denver
Olivier Bodenreider, National Library of Medicine
Kei Cheung, Yale Center for Medical Informatics
Paolo Ciccarese, Harvard
Sudeshna Das, Harvard
Michel Dumontier, Carleton University
Wacek Kusnierczyk, Norwegian University of Science and Technology
Cliff Joslyn, Pacific Northwest National Laboratory
Midori Harris, European Bioinformatics Institute
James Malone, European Bioinformatics Institute
Robin McEntire, Independent Consultant
Parsa Mirhaji, University of Texas
David Newman, ECS, University of Southampton
Chimezie Ogbuji, The Cleveland Clinic Foundation
Alexandre Passant, DERI
Alan Ruttenberg, Science Commons
Phillippe Rocca-Serra, European Bioinformatics Institute
Matthias Samwald, DERI
Robert Stevens, University of Manchester
Yimin Wang, Eli Lilly
Mark Wilkinson, Medical Genetics, U. of British Columbia
Jenna Zhou, Eli Lilly




and the conference organisers.


*** Templates

Submission templates are available from the website
(http://bio-ontologies.org.uk).








CFP Bio-Ontologies

2009-02-25 Thread Phillip Lord



** Call for Papers

Submissions are now invited for Bio-Ontologies 2009: Knowledge in Biology, a SIG
at Intelligent Systems for Molecular Biology 2009.


*** Key Dates

 - Submissions Due: April 10th (Friday)
 - Notifications: May 1st (Friday)
 - Final Version Due: May 8th (Friday)
 - Workshop: June 28th (Sunday)


*** Introduction

Bio-Ontologies: Knowledge in Biology provides a forum for discussion of the
latest and most cutting-edge research in ontologies and more generally the
organisation, presentation and dissemination of knowledge in biology. It has
existed as a SIG at ISMB (http://www.iscb.org/ismbeccb2009) for 11 years now,
making it one of the longest running.

We are interested in any formal or informal approach to organising, presenting
and disseminating knowledge in biology.

We invite submissions on a wide range of topics including, but not limited to:

 - Semantic and/or Scientific Wikis.
 - Multimedia Blogs
 - Folksonomies
 - Tag Clouds
 - Collaborative Curation Platforms
 - Collaborative Ontology Authoring and Peer-Review Mechanisms
 - Biological Applications of Ontologies
 - Reports on Newly Developed or Existing Bio-Ontologies
 - Tools for Developing Ontologies
 - Use of Ontologies in Data Communication Standards
 - Use of Semantic Web technologies in Bioinformatics
 - Implications of Bio-Ontologies or the Semantic Web for Drug Discovery
 - Research in Ontology Languages and its Effect on Bio-Ontologies



** Programme

This year's keynote speaker will be Barend Mons
(http://en.wikipedia.org/wiki/Barend_Mons).

*** Submissions

Submissions are now open and can be submitted through easychair
(http://www.easychair.org/conferences/?conf=bioontologies2009).


*** Instructions to Authors

We are inviting two types of submissions.

 - Short papers, up to 4 pages.
 - Poster abstracts, up to 1/2 page.

Following review, successful papers will be presented at the Bio-Ontologies
SIG. Poster abstracts will be allocated poster space, and time will be set
aside during the day for at least one poster session.

Unsuccessful papers will automatically be considered for poster presentation;
there is no need to submit both on the same topic.





*** Organisers

Phillip Lord, Newcastle University
Susanna-Assunta Sansone, EBI
Nigam Shah, Stanford
Susie Stephens, Eli Lilly
Larisa Soldatova, University of Wales, Aberystwyth



*** Programme Committee

The programme committee, organised alphabetically is:

Michael Bada, University of Colorado Denver
Olivier Bodenreider, National Library of Medicine
Kei Cheung, Yale Center for Medical Informatics
Paolo Ciccarese, Harvard
Sudeshna Das, Harvard
Michel Dumontier, Carleton University
Wacek Kusnierczyk, Norwegian University of Science and Technology
Cliff Joslyn, Pacific Northwest National Laboratory
Midori Harris, European Bioinformatics Institute
James Malone, European Bioinformatics Institute
Robin McEntire, Independent Consultant
Parsa Mirhaji, University of Texas
David Newman, ECS, University of Southampton
Chimezie Ogbuji, The Cleveland Clinic Foundation
Alexandre Passant, DERI
Alan Ruttenberg, Science Commons
Phillippe Rocca-Serra, European Bioinformatics Institute
Matthias Samwald, DERI
Robert Stevens, University of Manchester
Yimin Wang, Eli Lilly
Mark Wilkinson, Medical Genetics, U. of British Columbia
Jenna Zhou, Eli Lilly




and the conference organisers.


*** Templates

Submission templates are available from the website
(http://bio-ontologies.org.uk).














Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

2008-08-28 Thread Phillip Lord



Carole

I don't confuse the concepts, although I sometimes get the names mixed up. 

In this case, uploading a workflow (Taverna or otherwise) is not going to
guarantee either. I would not expect that the workflow you gave me last year
would necessarily either run now or give me the same results for the same
input.

Of course, this is true in general for any computational artifact; in the case
of something like Java (with its forward compatibility), if it doesn't, then
this is defined to be a bug. In the case of other languages, it may not be. In
the case of workflows, I guess, we have to take the W3C line on 404 and say
it's a feature, not a bug.

Not that this means that I think that submission of workflows is a bad idea.
I just think that they are going to be affected by the ravages of time even
more quickly than raw data is. 

Phil


 Carole == Carole Goble [EMAIL PROTECTED] writes:

  Carole Phil

  Carole yes - do not confuse Reproducibility with Repeatability or
  Carole Reusability

  Carole Carole

  Carole Carole Goble University of Manchester. UK
   KC == Kei Cheung [EMAIL PROTECTED] writes:
   
   
  KC Peter Ansell wrote:
Wiki's explicitly allow for a permanent link to a particular version
of something. Hopefully an implementation of a wiki-like workflow
editor online, will have similar characteristics so that you can still
use a particular version to reproduce a past result if you need to,
provided the web services still exist and haven't changed their
interface ;-) It would also be nice to be able to get corrected
versions via the wiki mechanism though and that would suit the Web 2.0
way, as opposed to publications to which corrections are hard to make.
KC If some journals are requiring raw data (e.g.,
   microarray data) to be
  KC submitted to a public data repository, I wonder if workflows that are
  KC used to analyze the data should also be submitted to a public workflow
  KC repository.
   
   
   
   It's a nice idea but doesn't quite allow the same level of repeatability.
   Most taverna workflows need updating periodically, as the services go
   offline or change their interfaces. Even if they don't, they return
   different results as the implementation changes.
   
   Ultimately, you need to store more than the workflow to allow any degree
   of repeatability. Still, it would be a good step forward which is no bad
   thing.
   
   Phil
   
   
   




-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   msn: [EMAIL PROTECTED]
NE1 7RU



Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

2008-08-28 Thread Phillip Lord

 Carole == Carole Goble [EMAIL PROTECTED] writes:

  Carole Phil

  Carole er which bit of I agree with you don't you get? :-) :-)

  Carole I agree with you! 

The bit after it:-)

We are in furious agreement anyway, which is the main thing. 

Phil



Re: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges

2008-08-21 Thread Phillip Lord

 KC == Kei Cheung [EMAIL PROTECTED] writes:

  KC Peter Ansell wrote:
   Wiki's explicitly allow for a permanent link to a particular version of
   something. Hopefully an implementation of a wiki-like workflow editor
   online, will have similar characteristics so that you can still use a
   particular version to reproduce a past result if you need to, provided
   the web services still exist and haven't changed their interface ;-) It
   would also be nice to be able to get corrected versions via the wiki
   mechanism though and that would suit the Web 2.0 way, as opposed to
   publications to which corrections are hard to make.
   
   
   
  KC If some journals are requiring raw data (e.g., microarray data) to be
  KC submitted to a public data repository, I wonder if workflows that are
  KC used to analyze the data should also be submitted to a public workflow
  KC repository.



It's a nice idea but doesn't quite allow the same level of repeatability. Most
taverna workflows need updating periodically, as the services go offline or
change their interfaces. Even if they don't, they return different results as
the implementation changes. 

Ultimately, you need to store more than the workflow to allow any degree of
repeatability. Still, it would be a good step forward which is no bad thing. 

Phil




Re: The W3C mailing lists will be limited to interest group participants.

2008-06-25 Thread Phillip Lord

 MS == Matthias Samwald [EMAIL PROTECTED] writes:

  MS Jonathan wrote:
   The W3C mailing lists will be limited to interest group
   participants.
   
   You mean public-semweb-lifesci@w3.org, for example?

  MS According to the last conference call, this might also apply to
  MS this mailing list. How many people are subscribed to this mailing
  MS list at the moment, and how many of these will be 'kicked out'
  MS when the membership policy is enforced? 

It also depends on whether limited means reading or posting or both.
I've been a highly active lurker (erm...) on this list for years and
find it very useful for this. 

I wouldn't read it through public archives. Email or bust. 

Phil



Re: [semweb-lifesci]

2007-11-14 Thread Phillip Lord

 TG == Ted Guild [EMAIL PROTECTED] writes:

  TG I am surprised and sorry that anyone found the page [1] explaining why
  TG we run our lists, and for that matter most of our infrastructure,
  TG according to standards offensive. That was certainly not the intent of
  TG the page, merely to give a thorough response to a request that has come
  TG up a few times and also provide some pointers.  We are a standards body
  TG and feel strongly about promoting and adhering to standards in addition
  TG to creating them but we do not try to be condescending in doing so. 



Let's be clear. This is NOT a standards issue. As far as I can tell, both RFCs
mentioned tell you WHAT to do. This is good. They do not tell you what you
should not do; I see no mention of subject lines in either. 



  TG Also I thought it would be helpful to start a Wiki [2] on configuring
  TG filtering for various mail clients, as most are capable of filtering on
  TG List-Id and other headers. 


And here is the problem. You assume that you know how I or others use subject
line tags. Actually, I don't filter on them at all; I use To: or Cc:
addresses, as this captures emails cc'd to me personally, which your List-Id
technique fails for. 
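
(Purely for illustration, the same To:/Cc: idea in a few lines of Python,
over a hypothetical local mbox file:

    import mailbox

    LIST_ADDR = "public-semweb-lifesci@w3.org"

    def for_this_list(msg):
        # look at To: and Cc:, not the subject line or List-Id
        recipients = " ".join(filter(None, (msg.get("To"), msg.get("Cc"))))
        return LIST_ADDR in recipients.lower()

    inbox = mailbox.mbox("inbox.mbox")    # hypothetical mailbox file
    for message in inbox:
        if for_this_list(message):
            print(message.get("Subject"))

Most mail clients can do the equivalent with a single filter rule.)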

I use them because I filter several mailing lists into one folder and like the
visual cue this gives. I identify semweb-lifesci mails by the lack of a tag,
which works because it's the only list in that folder that doesn't. So, you
see, on this basis ALL of the points on your subject-tagging page are, well,
either irrelevant or wrong.

If you are going to provide a service, then listen to your users. If you are
not going to listen to your users, then don't provide a service. There are
others who can do it better.

Sorry for sounding so irritable on this; it's a bad time of year for
me. Normally, I'd take this with more of a sense of humour, or a slightly
raised eyebrow. 

Cheers

Phil











Re: [semweb-lifesci]

2007-11-13 Thread Phillip Lord

 PH == Pat Hayes [EMAIL PROTECTED] writes:

   Our Systems Team has fielded this request many times.

  PH Its about time it bloody well listened, then.

Yes. 

  PH By the way, the tone of the document [1] is extremely annoying. If the
  PH W3C were a company taking this attitude, it would have lost its customer
  PH base years ago. Of course, the W3C isn't a company: but y'all might give
  PH some thought to the fact the great bulk of the W3C's work is done by
  PH volunteers, who are the people getting screwed over by the Systems
  PH Team's almost palpable arrogance.


The document suggests that if W3C sticks with its silly policy, then perhaps
mail client developers will fix their clients.

I think that the opposite is also true; if W3C is incapable of producing a
mailing list which can be configured to its owners' wishes, rather than
W3C's own dogma, we should perhaps move the mailing list elsewhere. I would
rather see the effort invested in getting W3C to fix their broken policy than
use workarounds which give them no incentive. 

Incidentally, I don't filter on subject line. 

Phil



Re: identifier to use

2007-08-24 Thread Phillip Lord

 EJ == Eric Jain [EMAIL PROTECTED] writes:

  EJ Phillip Lord wrote:
   I don't understand the desire to implement everything using HTTP.

  EJ Likewise, I don't understand the desire to implement everything using
  EJ anything but HTTP :-) If there is an existing system that is
  EJ (incredibly) widely adopted and that can be built upon, surely that's
  EJ the way to go?

Actually, LSIDs are built on top of HTTP. The initial resolution step is
delivered via web services and HTTP. The second stage is multi-protocol,
which includes HTTP.


   Why call lots of things, which are actually several protocols by a name
   which suggests that they are all one. How to distinguish between an HTTP
   URI which allows you to do location independent, two step resolution and
   one which doesn't. Well, one solution would be, perhaps, to call it
   something different, say, perhaps, LSID?

  EJ You could have the concept of LS HTTP URIs that follow certain
  EJ conventions, may be useful for some, but I don't quite see the problem
  EJ with the fact that you will be able to resolve some HTTP URIs, but not
  EJ others: The only way to know whether a URI can be resolved or not, in
  EJ the end, is to try; some systems just seem to make doing so harder...


The other way to know whether a URI can be resolved is to use a different name
for those which are not meant to be resolved.

To me it makes no sense to layer multiple different protocols over a single
identifier. Imagine I get a URI like http://uniprot.org/P4543, it could
be

1) a meaningless concept identifier in an ontology
2) a URL which resolves to a pretty web page, via a single step process
3) a URL which always resolves to the same data
4) A URL which resolves to the current version of some spec like the W3C
   recommendation pages. 
5) A URL which is meant to be considered to be a location independent ID. 
6) What ever else we have decided to layer onto the same identifier scheme.

To me, it doesn't make any sense. 

Phil



Re: identifier to use

2007-08-24 Thread Phillip Lord

 DS == Booth, David (HP Software - Boston) [EMAIL PROTECTED] 
 writes:

   From: Phillip Lord [ . . . ] I don't understand the desire to implement
   everything using HTTP. Why call lots of things, which are actually
   several protocols by a name which suggests that they are all one. How to
   distinguish between an HTTP URI which allows you to do location
   independent, two step resolution and one which doesn't. Well, one
   solution would be, perhaps, to call it something different, say, perhaps,
   LSID?

  DS But that's like asking Why call everything URNs?. 

No it isn't. http:// based URIs carry the assumption that they are potentially
resolvable by a defined protocol. URNs do not. 


  DS LSIDs are layered on top of URNs.  Certainly conventions layered on top
  DS of HTTP URIs can have names too, just as conventions layered on top of
  DS URNs can.  For example, the LSID conventions layered on top of HTTP
  DS could be named HLSID and published in a specification just as the
  DS existing LSID conventions are.

LSID conventions are layered on top of HTTP already. They just use a different
convention for naming, to indicate that they are different. 

Phil





Re: identifier to use

2007-08-24 Thread Phillip Lord

 XW == Xiaoshu Wang [EMAIL PROTECTED] writes:

  XW Phillip Lord wrote:
   To me it makes no sense to layer multi different protocols over a single
   identifier. Imagine I get an URI like http://uniprot.org/P4543, it could
   be
   
   1) a meaningless concept identifier in an ontology
   2) a URL which resolves to a pretty web page, via a single step process
   3) a URL which always resolve to the same data 4) A URL which resolves to
   the current version of some spec like the W3C recommendation pages. 5) A
   URL which is meant to be considered to be a location independent ID. 6)
   What ever else we have decided to layer onto the same identifier scheme.
   
   To me, it doesn't make any sense.
  XW Does it make sense to you if our personal name is put like Xiaoshu,
  XW male, dark hair, 5'8, email=..., address, etc., etc., Wang? Because if
  XW so, I think we would be required to name ourself with our DNA string,
  XW which is still not enough since it doesn't have my birth time, place,
  XW alive-status

  XW Don't mistaken name/identifier as information. Then ask yourself what
  XW you want from a name? Then, a lot of sense will start coming to you.

Unlike you, I have stated what I think the requirements are from an identifier
in life sciences. 

I want an identifier that I can do one thing with and, preferably, one thing
only. Not 5. I am not suggesting that we put semantics into the identifiers
other than those semantics that we need for using the ID. So, your analogy is
wrong. 

I have a better analogy. My name is Dr Phillip Lord. This is already
overloaded, as I'm a yeast geneticist (or was), not a medic. Why don't we give
another 4 or 5 meanings to Dr on the grounds that, as people have seen Dr
before, they will be happier with that than a new title?

Phil



Re: identifier to use

2007-08-24 Thread Phillip Lord

 MS == Matthias Samwald [EMAIL PROTECTED] writes:

  MS So you want to advertise what can be expected (or NOT expected) before
  MS the web client starts the retrieval process? 

If we are to use http: based URIs for things which are never meant to be
retrieved (like ontology concepts), then it has to be before the retrieval
process, no? 


  MS If this is desirable, why could we not, in theory, agree on different
  MS syntactic hints in normal HTTP URIs?

  MS For example: http://uniprot.org/P4543_concept
  MS http://uniprot.org/P4543_web_resource
  MS http://uniprot.org/P4543_immutable_data
  MS http://uniprot.org/P4543_location_independent_id_(whatever_that_means)

Yes, we could invent some naming conventions and layer them over the top of
http, at least where http is capable of supporting the requirements; for the
last one, it isn't.

  MS This way, we could also give Semantic Web clients a message like you
  MS probably don't really need to resolve this and you can probably not
  MS expect something when you try, but if you really want to, you
  MS can. Trying to resolve a URI does not have zero costs for a client
  MS application, so they would probably try to follow this recommendation to
  MS avoid unnecessary HTTP GET requests (which, in turn, helps to avoid
  MS unnecessary net/server load).

  MS I do not really think that something like this would find widespread
  MS adoption, but it is certainly still more realistic than inventing and
  MS agreeing on a wholly new protocol for each.

Well, again, little of LSID is wholly new. In fact, the LSID protocol uses
other, standard protocols.

But inventing a new protocol is precisely what happened when HTTP was
created. Why did we not just reuse ftp? 

I think we are going around in circles here, so I'll leave the conversation
there. 

Phil




Re: identifier to use

2007-08-23 Thread Phillip Lord

 EJ == Eric Jain [EMAIL PROTECTED] writes:

   These archives will all need to use opaque identifiers to track
   relationships, provenance, versions, and other metadata.

  EJ The only digital archive project I'm vaguely familiar with is the
  EJ Internet Archive project, and that seems to relies on URLs. If someone
  EJ has some insight into any of the other projects (especially how they can
  EJ or can't handle different identifier schemes), that would be *really*
  EJ interesting!



Eric, try putting "digital preservation" into Google. There are many projects
out there working in this area.

Phil



Re: identifier to use

2007-08-23 Thread Phillip Lord

 EJ == Eric Jain [EMAIL PROTECTED] writes:

  EJ Do you mean fail over at run time, so when an identifier can't be
  EJ resolved, the resolver retries with a backup service?

Hilmar described the mechanism in his last email. Again, perhaps I am wrong. 

  EJ In general, my feeling is that there are lots of special mechanisms that
  EJ may be useful for some application (but overkill for others), but I
  EJ don't see any strong arguments why these couldn't be implemented with
  EJ HTTP URIs (which have the benefit that they can also be made usable in
  EJ simple ways).

I don't understand the desire to implement everything using HTTP. Why call
lots of things, which are actually several protocols, by a name which suggests
that they are all one? How do you distinguish between an HTTP URI which allows
you to do location independent, two-step resolution and one which doesn't?
Well, one solution would be, perhaps, to call it something different, say,
perhaps, LSID?


   As far as I can see, LSIDs are basically location independent. The only
   hole I can see is if someone else buys uniprot.org, sets up an LSID
   resolution service and then returns crap. purls have the same issue I
   think.

  EJ Yes, I guess that's a problem with all solutions that make use of the
  EJ domain name system in some way. (But I still think the benefits of doing
  EJ so outweigh the problems that are introduced by not using it.)

 I don't think LSID can cope with this, although a small extension would allow
 you to; you just need to blacklist domains where you should automatically use
 the fail over mechanism. 

  EJ Note that any other name-based registration system could run into
  EJ trouble, too: Let's say UniProt lost a trademark suite and was forced to
  EJ change its name to something else, I assume that wouldn't be good for
  EJ location independent identifiers such as urn:bm:uniprot:P12345...



If you lose a trademark and have to stop using identifiers with a "uniprot"
in them, then any system which uses an alphabetical ID is stuffed. Numbers
would be okay, because you can't trademark numbers. The law is an ass.

Phil



Re: identifier to use

2007-08-22 Thread Phillip Lord

 EJ == Eric Jain [EMAIL PROTECTED] writes:

  EJ I guess that could happen... Do you have some examples of
  EJ domain-specific standards that became de-facto standards, supported by
  EJ generic tools etc?

The web leaps to mind. Remember that? 


   As for being limited to a domain or not, would the LSID mechanism be more
   appealing if it read urn:guid:foo.org:Foo:12345? There's nothing in the
   LSID spec that makes it LS-specific, or due to which it make no sense
   outside of the LS.

  EJ You're right, from a technical point of view, it's not
  EJ domain-specific. But if no one else is using it, doesn't that make it
  EJ de-facto domain-specific?

Actually, LSIDs are domain specific, or rather they were designed to support
the needs of the Life Sciences; this is not to say that different domains do
not have the same needs. 

Look at DOIs and LSIDs. They are different, they emphasise different
things. LSIDs are based around a set of objects which potentially might be
very large and which might exist in many versions. So LSIDs have two-step
multi-protocol resolution. They have version numbers integrated. They exist
in a world where services disappear. So LSIDs have a fail over mechanism. 

DOIs are based around the stable, large organisations giving out the DOIs,
hence they have a heavyweight assignment process (possibly involving cash). They
are based around small resources, mostly of the size that people can read,
without the necessity for many, many versions.

Conclusion: LSID and DOI are NOT domain specific at all, they are requirement
specific. They exist because different domains have different
requirements. 

No generic ID is going to fulfil all of these requirements, is my thought.


   Do you mean you would prefer if each journal set up URIs based on its
   self-chosen domain-name and we reference articles through that instead of
   DOIs? Or did you want to say something else?

  EJ If instead of doi:10.1038/nrg2158 an official URI looked something like
  EJ http://dx.doi.org/10.1038/nrg2158, would this make the system less
  EJ popular?

  EJ In fact, I suspect that the lack of such a transformation mechanism
  EJ turned away many people from the LSID system (that, and the ugly syntax
  EJ :-)

DOIs worked because there are actually relatively few publishers and they
moved en masse. In biology there are many more service providers, and most will
not adopt something till it looks stable and until people really bitch about
wanting it.

Phil



Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-21 Thread Phillip Lord

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Alan Well, if I am restricted to using such Uniprot classes I will have
  Alan trouble representing important scientific findings. If Uniprot only
  Alan has one name for the two molecules, one of which has a snp that leads
  Alan to a loss of function that is the initiating factor of a disease, then
  Alan we have a problem, no? How do we say things about the disease related
  Alan form?


Make statements against an isoform of P38228.  


   
   If you create identifiers to describe proteins rather than protein
   records (like uniprot) then you have created a whole new set of IDs. When
   anyone wants to talk about a protein, they will have to look up the ID.

  Alan As they will when they want to talk about a record. Of course perhaps
  Alan we all will add some links of the sort that say the record is about
  Alan some set of classes of proteins, and that aspects of the protein in a
  Alan class can be described by pieces of the record.

  Alan But at least we'll know what we are talking about.

The question here is whether you will add to the confusion or decrease it. You
need to put an entire infrastructure in place for providing sane, consistent,
clearly defined names to proteins. I just use swissprot. 


  Alan But I'm open to discussing suggestions for representing these
  Alan statements by only making use of the Uniprot records ids, if you have
  Alan any.

Well, swissprot refers to isoforms I think. Push comes to shove, just use the
sequence. 

Phil







-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: IDs + 5; everybody - 10

2007-07-19 Thread Phillip Lord


The LSID specification defines a protocol for resolving LSIDs, part of which uses
web services. If this were a big problem, the WS definition of the API could
be replaced with something else, such as REST.

This is true of any API specification. In this case, the LSID API is not
that complex, so the translation wouldn't be too hard.

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Alan I'm intrigued by this remark. Phil, would it be possible to sketch out
  Alan how one could graft REST style services into LSID space?

  Alan -Alan


  Alan On Jul 16, 2007, at 8:22 AM, Phillip Lord wrote:
   The LSID use of web services should not really be seen as a problem. Push
   comes to shove, even this part could be replaced or made optional if a REST
   style solution were desired.





-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Thread Phillip Lord

 MS == Matthias Samwald [EMAIL PROTECTED] writes:

    It would be more satisfying for us to know intensionally what we mean
   by protein. It would be good to have a clear set of definitions. But,
   ultimately, I think it would be mistaken. If we have the ability to
   express the class of protein molecules defined by the swissprot record
   OPSD_HUMAN, then I think we have all we need.

  MS OWL is very open towards incomplete information. If all we know about
  MS the protein is the sequence of amino acids, than this is what we add to
  MS the protein class through a 'some-values-from, necessary' property
  MS restriction (and not 'necessary and sufficient', since we are still
  MS unsure if this information alone is enough to DEFINE the protein
  MS class). If we know that proteins of this class can have some
  MS polymorphisms, we can enumerate the different possible sequences as best
  MS as we can. If we are unable to enumerate all of them at the moment, or
  MS are unsure about something, we just leave it out and maybe add it later.


This is my worry. Effectively, I think you are saying: why not take all the
knowledge in swissprot and duplicate it in our class definitions? I don't see
what this adds. All I see is that it will add confusion and the potential for
data to get out of date.

This is an important issue and will raise its head repeatedly. Should we
define Homo sapiens? Should we determine all the necessary and sufficient
conditions? Or should we just point to a pre-existing taxonomy and a
pre-existing process?

I think that there are many clear reasons for keeping statements about the
informatics entities -- the database entries for example. To do otherwise,
runs the risk of enormous mission creep (always a problem with data modelling
and ontologies).

Phil



Re: Rules

2007-07-19 Thread Phillip Lord

 CM == Chris Mungall [EMAIL PROTECTED] writes:

  CM Definitely - although I don't think OWL/SWRL is quite the right tool for
  CM this job yet

  CM although perhaps getting closer:

  CM Putting OWL in Order: Patterns for Sequences in OWL Nick Drummond1, Alan
  CM Rector1, Robert Stevens1, Georgina M2Matthew Horridge1, Hai H. Wang1,
  CM Julian Seidenberg1
  CM http://owl-workshop.man.ac.uk/acceptedLong/submission_12.pdf

  CM (I'm not a big fan of the modeling biological sequences as lists
  CM approach, but I think the results could be replicated using a realist
  CM representation)


There is a much more relevant paper by some of the same authors.

  Wolstencroft, K., Lord, P., Tabernero, L., Brass, A. and Stevens, R. (2006)
  Protein classification using ontology classification.
  Bioinformatics 22(14):e530-538. doi:10.1093/bioinformatics/btl208

We can define proteins and classes of proteins in OWL and in some
circumstances this can be useful. 
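
To make that concrete, here is roughly the kind of defined class I mean, in
Turtle; the names and namespace are invented for illustration and are not
taken from the paper:

  @prefix :     <http://example.org/protein-classification#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .

  :Protein a owl:Class .
  :PhosphataseCatalyticDomain a owl:Class .
  :hasDomain a owl:ObjectProperty .

  # Necessary and sufficient conditions: any protein with a phosphatase
  # catalytic domain gets classified under ProteinPhosphatase by the reasoner.
  :ProteinPhosphatase owl:equivalentClass
      [ a owl:Class ;
        owl:intersectionOf ( :Protein
                             [ a owl:Restriction ;
                               owl:onProperty :hasDomain ;
                               owl:someValuesFrom :PhosphataseCatalyticDomain ] ) ] .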

I think, though, that it's a danger for us to say that because we can define
things, we should. At times, it's better to allow things to be self-standing
kinds, and leave the details of the definition to authors. 

Uniprot's rules for distinguishing between one protein and another are complex,
and I can't see the point of codifying them in OWL (even if you could).

Of course, there may be times when you need to make separations that uniprot
doesn't -- two isoforms may have the same uniprot ID -- but you could still
make this distinction with reference to uniprot. 

Phil



Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Thread Phillip Lord

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Alan Summary: Answering Phil's questions, and clarifying one thing he
  Alan asserts about what I said.

   What if they have a polymorphism?
  Alan No.
   Are two isoforms from an alternate splice the same protein?
  Alan No.


In both of these you differ from uniprot. 

   Unsatisfying, maybe. Clear definitions are important. But
   interoperability, and the lack of duplication are more so.

  Alan Forgive my confusion, but how exactly will we achieve interoperability
  Alan and lack of duplication if we don't have definitions? How would we
  Alan know that we don't have duplication, for example?


If you create identifiers to describe proteins rather than protein records
(like uniprot) then you have created a whole new set of IDs. When anyone wants
to talk about a protein, they will have to look up the ID.

   snip

   And, yet, you just told me that you could buy a antibody with just a
   swissprot ID. So, let me restate the question, what are you going to do
   with a protein ID that you are not going to do with a swissprot ID, or
   the protein formally known as OPSD_HUMAN.

  Alan I did not say that. I've said some people have identified antibodies
  Alan by such ids. Unfortunately this information is of limited use when
  Alan actually ordering an antibody, where I am interested in much more
  Alan information, such as how specific it is, how it has been validated,
  Alan and other properties related to how it behaves in certain experimental
  Alan settings. I *want* to be able to have identifiers(URIs) that are up to
  Alan the job of ordering reagents.

Well, I am not sure that you are going to achieve this with an identifier. You
need significant extra amounts of metadata. 

My point here is simple. Separating out the informatics and the biology conforms
better to our notion of reality, sure. But you are talking about modelling
what makes a protein and, more, a type of protein. Work through your scenarios
and see whether you need a protein ID for this. If not, you are introducing a
layer of abstraction that you don't need.

Phil





-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: IDs + 5; everybody - 10

2007-07-19 Thread Phillip Lord


 JR == Jonathan Rees [EMAIL PROTECTED] writes:

  JR Well, to do a fair comparison of LSID URIs and HTTP URIs, you would have
  JR to take all the features you need, see how to best implement them in
  JR both contexts, and then make an overall assessment. 

  JR What is your worry, by the way? Would bringing the benefits of LSIDs to
  JR other parts of the URI space be a bad thing?

Well, 3 or 4 years ago I sat in a meeting running over exactly this
ground. That time the comparison was to handles. And, now, several years
later, we are comparing to pURLs.


  JR There is the criticism of HTTP URIs that they cannot be used as
  JR identifiers, and I admit that the pun with URLs can be misleading.

It is misleading, period.


  JR To repeat, I'm just trying to be objective. I am not in a position to
  JR make decisions; I am just trying to elucidate the comparison between the
  JR two naming schemes so that HCLS can make a rational decision. I was the
  JR one at the Amsterdam HCLS meeting, at which there were no LSID
  JR defenders, saying that we ought to listen to what LSID users have to
  JR say, and in many private conversations I have been coming to the defense
  JR of benefits that LSIDs have that HTTP URIs so far lack. And I've finally
  JR gotten around to reading the darned spec. So I hope you LSIDers don't
  JR think you're being dissed.

My suspicion is that you won't find any LSID defenders for the reason that the
people who designed the spec do not want to listen to essentially the same
arguments again. 

My problem with this whole process is that you are missing the most important
criterion for comparing identifier schemes. They all basically work; as far as
I can see, they all basically do the same job. There are technical differences
between them but, frankly, they are not that great.

So what I want to know is, what is the difference between DOIs and blog
permalinks? Both of these have been taken up, in a way that LSIDs or anything
in life sciences have not. Perhaps it is because the library and publishing
community have had something like this (the ISBN and associated identifiers)
for a long time already. 

My question, then, is not what the identifiers do, but what people will use.

Phil



IDs + 5; everybody - 10

2007-07-16 Thread Phillip Lord

 Mark == Mark Wilkinson [EMAIL PROTECTED] writes:

  Mark WSDL is a widely accepted W3C spec that is becoming increasingly accepted
  Mark worldwide (and is, generally, automatically generated based on your
  Mark interface, so requires little or no manual construction), and which solves a
  Mark problem that we *know without any doubt* URLs cannot solve.  I really don't
  Mark see an advantage in trying to ignore them, circumvent them, or otherwise
  Mark relegate them to a secondary lookup, in the base spec for the Semantic Web,
  Mark when we know that we are going to have to deal with them at some point


I think that I agree. The LSID use of web services should not really be seen as
a problem. Push comes to shove, even this part could be replaced or made
optional if a REST style solution were desired.

From my perspective, the thing that worries me about this whole discussion is
that we seem to be retreading old paths. The LSID standard first raised its head
some 5 or so years ago, and we still appear to be talking about basic technology
here; ultimately, there are still some really big biological issues to be dealt
with wrt identifiers.

The biggest barrier, however, is that of community uptake. Ultimately the
differences between LSIDs (a two step resolution to provide persistence,
location independence and some other stuff) and PURLs (a two step resolution,
etc...) are not that important. Technology churn will prevent community uptake
faster than almost anything, however.

I have a solution: I am going to use the wonders of the semantic web to
describe all of these different identifiers that we have invented so far and
all those we will invent in the future. Now, what identifiers should I use in
the ontology?

Phil



Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Thread Phillip Lord

 MK == Marijke Keet [EMAIL PROTECTED] writes:

  MK Lack of sufficient knowledge about a particular (biological) entity is
  MK a sideshow, not an argument, to the issue of distinguishing real proteins
  MK from their records.


I agree. The argument is that it's very hard to describe what you mean by a
protein. We almost certainly don't mean a protein molecule. We might mean a
type of protein. But then we don't know whether two protein molecules are
actually of a given type.

My questions are: how often do we want to refer to a protein, rather than a
record about a protein? And who is responsible for ascribing an ID to a
specific type of protein? In practice, in bioinformatics, the answers to these
are a) we don't and b) uniprot.

So, while distinguishing between a uniprot record and a protein seems like a
good idea, I'm not convinced it brings you anything. What are you going to do
with your protein ID?

Phil 


-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: IDs + 5; everybody - 10

2007-07-16 Thread Phillip Lord

 JR == Jonathan Rees [EMAIL PROTECTED] writes:

  JR It may look like unnecessary replication, but it's not really, since
  JR we're already committed to the http: space and all the issues that LSID
  JR addressed are issues there as well.

  JR The same remarks apply to handles, DOIs in particular.


Are you suggesting that DOIs shouldn't be used either? 

Phil



Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Thread Phillip Lord

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

   I agree. The argument is that it's very hard to describe what you mean by
   a protein. We almost certainly don't mean a protein molecule. We might
   mean a type of protein. But then we don't know whether two protein
   molecule are actually of a given type.

  Alan I'm confused. I think we all would agree that there are instances of
  Alan proteins and we have a good idea of what they are. We also know that
  Alan there are groups of proteins that are built off the same template and
  Alan share certain properties. 

Take these rhetorical questions:

Is Red Opsin in human the same as Red Opsin in Cattle? 

Is Red Opsin in me, necessarily the same as Red Opsin in you? 
What if they have a polymorphism? 

Are two isoforms from an alternate splice the same protein? 

If a protein has been partly digested, is it still the same? 

Are haemoglobin alpha and beta the same? 


   My questions are how often do we want to refer to a protein, rather than
   a record about a protein?

  Alan Any time we want to make a scientific statement about proteins.  In my
  Alan work, that means virtually all the time. For example, I have a body of
  Alan work that is the target of text mining at the moment - If the text
  Alan mining worked well enough to understand the articles, what should it
  Alan generate for semantic web consumption?

The point is that you can't deal with a protein computationally. You can't
resolve it or analyse it computationally. It's always second-hand information
that you are dealing with.


   And who is responsible for ascribing a ID to a specific type of
   protein. In practice, in bioinformatics, the answer to this is a) we
   don't and b) uniprot.

  Alan I agree with a) - we mostly don't and when we do we do it in an
  Alan unclear and nonstandard way. I disagree with b) Exactly what the class
  Alan of proteins described by a uniprot record is not clear (though Eric
  Alan started to make a theory of what it could be). I have seen uniprot ids
  Alan used even to identify antibodies to a protein.

Yes, exactly. A uniprot record defines a class of proteins extensionally. That
usage means antibodies to the proteins described by OPSD_HUMAN (for example).


  Alan As for who is responsible, I would say that our community is
  Alan responsible. I expect that there will be efforts along this line in
  Alan the OBO Foundry and I would hope that there would be broad
  Alan participation from the people who are interested in following this
  Alan list.

And I would say not. Uniprot are the people who understand proteins; they are
the people who already have defined procedures for determining whether one
protein is the same as another, who have answered the questions above and who
will go back through the resource and update it as biological knowledge
changes. And it's a big job: there are 100 annotators working at this. Moreover,
Uniprot are the people who are trusted to make the right decisions, not us.

It would be more satisfying for us to know intensionally what we mean by
protein. It would be good to have a clear set of definitions. But,
ultimately, I think it would be mistaken. If we have the ability to express
the class of protein molecules defined by the swissprot record OPSD_HUMAN,
then I think we have all we need.
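
To be clear about how little machinery this needs, something like the
following Turtle sketch would do; the definedByRecord property and the URIs
are invented for illustration:

  @prefix :    <http://example.org/protein#> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .

  :definedByRecord a owl:AnnotationProperty .

  # The informatics entity: the swissprot record itself.
  :OPSD_HUMAN_record a :SwissProtRecord .

  # The class of protein molecules, picked out only by pointing at the
  # record; what falls under it is deferred to uniprot.
  :OPSD_HUMAN_protein a owl:Class ;
      :definedByRecord :OPSD_HUMAN_record .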

If we make our own definitions, all that we have done is duplicate what the
uniprot team are already doing. And we will, almost inevitably, do it somewhat
differently. All we would do is create confusion. The only way to ensure that
we do the same thing as uniprot is to say: yeah, what they said.

Unsatisfying, maybe. Clear definitions are important. But interoperability
and the lack of duplication are more so.

   So, while distinguishing between a uniprot record and a protein seems
   like a good idea, I'm not convinced it brings you anything. What are you
   going to do with your protein ID?

  Alan I would like to be able to have Invitrogen be able to say that product
  Alan xxxyyy is an antibody to some specific class of phosphoproteins in a
  Alan way that a semantic web agent could do some shopping for me if I
  Alan needed such a reagent.

And, yet, you just told me that you could buy an antibody with just a swissprot
ID. So, let me restate the question: what are you going to do with a protein
ID that you are not going to do with a swissprot ID, or the protein formally
known as OPSD_HUMAN?

Phil



Re: IDs + 5; everybody - 10

2007-07-16 Thread Phillip Lord



My apologies. I wasn't sure, which is why I asked. I just found your idea of
reproducing LSIDs' advantages (and implicitly DOIs') in http a little worrying.
I may have misread your email.

Phil



 JR == Jonathan Rees [EMAIL PROTECTED] writes:

  JR I never said LSID or DOIs shouldn't be used, and I don't see how my
  JR message can be construed as saying this. I'm trying to be fair to all
  JR solutions by talking about real technical requirements. If the W3C HCLS
  JR SIG wants to recommend the use - even minting - of LSIDs, that's fine
  JR with me. But I don't think any decisions have been reached.

  JR LSID users are committed to using HTTP URIs. For example, anyone who
  JR uses both LSID and RDF is committed to using the HTTP URI
  JR http://www.w3.org/1999/02/22-rdf-syntax-ns#type.

  JR Jonathan

  JR On 7/16/07, Phillip Lord [EMAIL PROTECTED] wrote:
JR == Jonathan Rees [EMAIL PROTECTED] writes:
   
  JR It may look like unnecessary replication, but it's not really, since
  JR we're already committed to the http: space and all the issues that LSID
  JR addressed are issues there as well.
   
  JR The same remarks apply to handles, DOIs in particular.
   
   
   Are you suggesting that DOIs shouldn't be used either?
   
   Phil
   




-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: Evidence

2007-06-20 Thread Phillip Lord

 MM == Mark Montgomery [EMAIL PROTECTED] writes:

  MM Also, for those who use generic email addresses without links to
  MM web sites, it would be very useful to occasionally inform folks
  MM on our backgrounds and relationships, like a link to a web page
  MM and/or bio for example.


You might want to try...

http://www.google.co.uk/search?q=Alan+Ruttenberg

which will reveal much about the mysterious entity known as Alan
Ruttenberg. 

Phil



Re: Advancing translational research with the Semantic Web

2007-05-22 Thread Phillip Lord

 PH == Pat Hayes [EMAIL PROTECTED] writes:

  CM In all the examples given, the lifted[*] n-ary relation was
  CM never truly a relation in the first place and always better
  CM modeled as a class. It's kind of cheating.
   
   Well, it is kind of cheating, yes, although if it works...

  PH No, really, its not cheating. This reduction of n-ary relations
  PH to binary+unary relations is quite general and quite sound, and
  PH has been known and thoroughly understood for over a century. It
  PH can always be done, and it often makes perfectly good intuitive
  PH sense. 

No, Chris is right. It's cheating. I have to decide before starting
which of my relations are n-ary and which are not. Moving between the
two is not necessarily a trivial thing to do. And having some
relations being relations and some being classes is less than clear. 
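
To show what I mean, here is a rough Turtle sketch along the lines of the W3C
n-ary relations note, with invented names; the binary case stays a property,
while the ternary case gets lifted to a class, and the two end up looking
quite different:

  @prefix :    <http://example.org/nary#> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .

  # Binary: partOf stays an ordinary property between two individuals.
  :partOf a owl:ObjectProperty .
  :finger1 :partOf :hand1 .

  # Ternary (part-of holding over a time interval): the relation is
  # "lifted" to a class, with one individual per relationship instance.
  :PartOfAtTime a owl:Class .
  :rel1 a :PartOfAtTime ;
      :hasPart        :finger1 ;
      :hasWhole       :hand1 ;
      :duringInterval :gestation .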

   Well, this I would agree with. Folding design patterns in, would
   be nice.

  PH Agreed. We made this a central feature of our COE graphic OWL
  PH editor, in that a user can design a 'template' (a chunk of OWL
  PH with gaps in it) and give it a name, then just drag-and-drop one
  PH into a new OWL concept map and fill in the missing
  PH parameters. Its a simple device and not perfect, but it does
  PH seem to be useful.


Yes. Protege does a similar thing. I'd like to see this at a language
level. 

Phil



Re: Advancing translational research with the Semantic Web

2007-05-21 Thread Phillip Lord

 CM == Chris Mungall [EMAIL PROTECTED] writes:

 
   Out of curiosity, can you describe how different or similar this
   is to the result that you can achieve in the N-ary relation
   design pattern for OWL?
   
   Obviously, building things into the DL is nice, but it's not
   currently representable in OWL, so would require tooling support,
   while the OWL N-ary relation pattern doesn't.

  CM I'm afraid I'm unclear how to state the OWL n-ary relation
  CM pattern (http://www.w3.org/TR/swbp-n-aryRelations) where I
  CM really need it. In all the examples given, the lifted[*] n-ary
  CM relation was never truly a relation in the first place and
  CM always better modeled as a class. It's kind of cheating. 

Well, it is kind of cheating, yes, although if it works...

  CM What if my n-ary relation is transitive or if the 3rd argument
  CM is a temporal interval over which the relation holds?

The former is hard because it's not clear what you do with transitive n-ary
relationships. I think that this is true for any
representation. Fundamentally, if you say a is part of b and I say
b is part of c, then is a part of c, and according to whom?

It is possible to build on top of the n-ary relationship, for
example a symmetric property. Perhaps you could do the same for
transitivity if you could work out exactly what the semantics should
be.


  CM I think the former is doable with property role chains. Updating
  CM the n-ary relations note with this - and all the other omitted
  CM details, such as how to re-represent domain/range, functional
  CM properties, n- ary relations in restrictions etc - would take a
  CM lot of work and would make it utterly terrifying to the naive
  CM user.

Yep, but I think that this reflects the underlying complexities of
life. 

  CM Nevertheless the results are clunky and will need special tool
  CM support [**] to avoid going insane. In general I am wary of
  CM design pattern type things - they are usually a sign that the
  CM language lacks the constructs required to express things
  CM unambiguously and concisely. It sounds like DLR could provide
  CM this, which would be great.

Well, this I would agree with. Folding design patterns in would be
nice.

Phil



Re: Advancing translational research with the Semantic Web

2007-05-18 Thread Phillip Lord

 MK == Marijke Keet [EMAIL PROTECTED] writes:

  MK Regarding “reification design patterns” and the reification 
  MK OWL (not the thorny logic-based representation of beliefs et
  MK al), permit me to mention that support for n-ary relations
  MK ---where n may also be 2--- in description logics is already
  MK possible with DLR [1] and implemented with reasoner-support in
  MK the iCOM tool (the tool may not live up to end-user-level
  MK expectations on userfriendliness, but it works) [2]. 


Out of curiosity, can you describe how different or similar this is to
the result that you can achieve in the N-ary relation design pattern
for OWL? 

Obviously, building things into the DL is nice, but it's not currently
representable in OWL, so would require tooling support, while the OWL
N-ary relation pattern doesn't. 

Phil



Re: Advancing translational research with the Semantic Web

2007-05-16 Thread Phillip Lord

 BP == Bijan Parsia [EMAIL PROTECTED] writes:

  EJ Reification?
   
   That's who, not why.

  BP No, you can do both with reification.

Well, you can do anything with anything:-)


   The Gene Ontologies evidence codes are and references are much
   closer.
   
   Also, I am not sure of the semantics of reification.

  BP RDF reification has very little to no built in semantics. What
  BP it provides is a standardized syntax.

Ok. I presume it provides a standardised syntax for something, even
if only implicitly.

Does it mean, then, when a triple is reified that the triple is in
some way associated with this other resource?


  BP However, all this *supports* your point. There *IS* no
  BP standardized way to represent this sort of information.  There
  BP is a more or less standard (and widely loathed) hook/technique
  BP upon which you could build a standard mechanism for representing
  BP this sort of information.


Yeah, that's my feeling. Reification is a start for doing this, and
might provide an underpinning.
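
For what it's worth, the sort of thing I have in mind looks roughly like this
in Turtle, using the standard rdf:Statement vocabulary; the evidence and
provenance properties are invented for illustration:

  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix :    <http://example.org/annotation#> .

  # The plain triple.
  :geneX :hasFunction :kinaseActivity .

  # The reified version of the same triple, so that evidence and
  # provenance can be attached to it.
  :stmt1 a rdf:Statement ;
      rdf:subject   :geneX ;
      rdf:predicate :hasFunction ;
      rdf:object    :kinaseActivity ;
      :evidenceCode :IEA ;
      :accordingTo  :curatorY .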

Phil



Re: ISMB Bio-Ontologies Meeting

2007-05-01 Thread Phillip Lord



It's something that we'd like to see:-)

Phil


 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Alan I forget, was someone submitting an abstract about our work to
  Alan this workshop?  -Alan


  Alan On Apr 26, 2007, at 1:18 PM, Susanna wrote:

   ** Apologies for cross posting **CALL FOR PAPERS and POSTER
   ABSTRACTS (Deadline May 1st) Proceedings in BMC Bioinformatics
   
   Bio-Ontologies SIG Workshop Vienna, Austria: July 20 2007
   
   “Bio-Ontologies: ten years past and looking to the future”
   



Re: [biont] Nice wikipedia page on ontology

2007-01-25 Thread Phillip Lord

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Phil Yeah, Robert has my main beef which is the distinction
   between the representation language and the representation
   itself.

  Alan Yup. Though there is too often a confusion between the
  Alan ontology and the representation. In some ways I think that it
  Alan is unfortunate that OWL has Ontology in it's name. 

OWL is just following on from common practice. The use of ontology
by computer science has added to its original philosophical
meaning, which has also resulted in confusion. I tend to use
ontologies as computational artifacts, which is my bias.

I don't think that there is much that can be done about this now. The
(many) uses of the term have got a bit fixed. 


  Phil The use of algorithms is clearly wrong and I don't think
   that an upper ontology provides consistency checks, nor that an
   ontology needs one to be formal.

  Alan Algorithm's not a good word.

Algorithm has a reasonably tightly defined meaning -- actually
wikipedia covers it well. I think that the article just uses it
wrong. 

Still, it's been replaced with axioms now. The article struggles
towards generating a common understanding. 

Phil



Re: [biont] Nice wikipedia page on ontology

2007-01-24 Thread Phillip Lord


Hmmm. Sure I wrote more than that in my original email. 

Yeah, Robert has my main beef which is the distinction between the
representation language and the representation itself. The use of
algorithms is clearly wrong and I don't think that an upper ontology
provides consistency checks, nor that an ontology needs one to be
formal. 

Still, it's an early wikipedia entry. These things often improve over
time. 

Phil

 Robert == Robert Stevens [EMAIL PROTECTED] writes:

  Robert I'd be inclined to agree with Phil. I don't know where the bit
  Robert about algorithms has come from. The other mistake, I
  Robert think, is not to make the distinction between formality of
  Robert language for representation and the formality of the
  Robert ontology itself. The latter is, I think, a matter of the
  Robert distinctions made. One can make an ontology in a formal
  Robert language like owl, but still be informal in the ontological
  Robert distinctions made.

  Robert Formal ontological distinctions can be encapsulated in an
  Robert upper level, but upper level ontologies are not necessarily
  Robert formal

  Robert Anyway, it is bad at almost any level

  Robert Robert.
  Robert At 13:55 24/01/2007, Phillip Lord wrote:

Alan == Alan Ruttenberg [EMAIL PROTECTED]
writes:
   
  Alan Start at http://en.wikipedia.org/wiki/Formal_Ontology
   
  Alan -Alan
   
   
   Well, it starts of with this
   
   A Formal ontology is an ontology modeled by algorithms. Formal
   ontologies are founded upon a specific Formal Upper Level
   Ontology, which provides consistency checks for the entire
   ontology and, if applied properly, allows the modeler to avoid
   possibly erroneous ontological assumptions encountered in
   modeling large-scale ontologies. 
   
   
   
   Almost none of which I would agree with.

-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,   skype: russet_apples
Newcastle University,   
NE1 7RU



Re: OWL vs RDF

2006-10-29 Thread Phillip Lord

 WB == William Bug [EMAIL PROTECTED] writes:

  WB This is a very important point.  Thanks, Phil.

  WB As is spelled out in the wonderful ProtegeOWL Tutorial PDF
  WB (which would be wonderful to have updated a bit), leaning on the
  WB reasoner during early phases of ontology construction is very
  WB helpful, but ultimately once you have more hardened
  WB components, you can save the inferred graph and distribute
  WB that for the user community.


Yes, and this is a good route in some cases. It's worth remembering,
however, that if you do this then it limits the uses you can make of
the ontology -- you can't, for instance, express queries against the
ontology using classes which are not mentioned in the ontology
already. The ability to combine the conceptual lego of an ontology at
any point is often very useful.

But it can make deployment architectures simpler. You win some, you
lose some.

Phil



Re: OWL vs RDF

2006-10-26 Thread Phillip Lord

 Alan == Alan Ruttenberg [EMAIL PROTECTED] writes:

  Alan Well it would be educational to get your view on what you can
  Alan you do with owl without a reasoner that's not easier to do
  Alan without owl?


You can do lots of things with OWL without a reasoner. The Gene
Ontology is representable in OWL, for example, and uses a simple
enough expressivity that you could do without a reasoner easily
enough. Of course, you need to use some kind of reasoning engine,
but something which understands transitive closure is enough. 

Whether it's easier to do without OWL depends on what the
alternatives are. You could also represent GO style semantics in RDF
(although, I think, the existential nature of part_of would not be
explicit), or indeed anything else capable of representing a
graph. 
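
To illustrate, GO-style semantics reduce to something like the following
Turtle sketch (invented term names rather than real GO identifiers):

  @prefix :     <http://example.org/go-like#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  :part_of a owl:ObjectProperty , owl:TransitiveProperty .

  # is_a links become plain subclass axioms...
  :MitochondrialMembrane rdfs:subClassOf :Membrane .

  # ...and part_of links become existential restrictions, which is where
  # OWL makes the existential reading explicit.
  :MitochondrialMembrane rdfs:subClassOf
      [ a owl:Restriction ;
        owl:onProperty :part_of ;
        owl:someValuesFrom :Mitochondrion ] .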

  Alan And how are you to know when you do need the reasoner and when
  Alan you don't?

When you use enough of the expressivity of OWL, where enough is
relatively undefined. 

Phil



Re: Modeling large scale ontologies in OWL: Unmet needs

2006-09-21 Thread Phillip Lord

 DD == David Decraene [EMAIL PROTECTED] writes:

  DD In large scale ontologies, one link should suffice,
  DD HasPart, and whether the part is a finger, toe, nail, muscle or
  DD anything else is not a task for the property to describe, but
  DD for the target


I'm not sure why this should be true for large ontologies. It seems to
me that this is just a question of modelling style. Either way should
actually work depending on what you are trying to achieve. 

Having multiple properties allows you to give the properties themselves
different characteristics, which can be useful. If, in your example, you have a
super property hasPart, then it seems to me that it would be relatively
straightforward to reduce the information content of the ontology so that the
subproperties are no longer represented. So "hand some hasDigit Finger" can be
represented as "hand some hasPart Finger".
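
In Turtle, the reduction amounts to little more than this (a sketch with
invented names):

  @prefix :     <http://example.org/anatomy#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  :hasPart  a owl:ObjectProperty .
  :hasDigit a owl:ObjectProperty ;
      rdfs:subPropertyOf :hasPart .

  # "hand some hasDigit Finger"; because hasDigit is a subproperty of
  # hasPart, the weaker "hand some hasPart Finger" follows.
  :Hand rdfs:subClassOf
      [ a owl:Restriction ;
        owl:onProperty :hasDigit ;
        owl:someValuesFrom :Finger ] .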


  DD In formal ontology you could express this relation
  DD on a general level of parthood: Hand HasPart 6thfinger,
  DD cardinality 0. This is not possible in OWL.

Many people have used subproperties to do something like this. It's a
poor hack for representing qualified cardinality (and doesn't capture
exactly the same semantics). But it is only a hack. As others have
said, the lack of qualified cardinality in OWL is generally regarded
as unfortunate, and it should be coming back in. 

Phil






-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Newcastle University,   Claremont Tower, Room 909
NE1 7RU



Re: Performance issues with OWL Reasoners

2006-09-19 Thread Phillip Lord

 KV == Kashyap, Vipul [EMAIL PROTECTED] writes:

   There are two different things in the technologies you mentioned;
   relational to X mapping tools, and metaschema approaches. They
   are quite different. For the instance store, the relational
   database is really an implementation detail. It's basically a
   reasoner with somewhat limited expressivity which is persistent
   and (hopefully) scalable.

  KV [VK] Is this true only for TBox reasoning, or ABox and TBox
  KV reasoning? I had the impression (possibly mistaken) that ABox
  KV reasoning takes advantage of the relational backend. At least
  KV this is implied by the following snippet from an earlier
  KV e-mail:

The instancestore (or at least the very early version that I
implemented) only does T-Box reasoning technically. The A-Box is
stored in a relational database. 

It works like this -- when an instance is asserted (described), the
reasoner is used to localise its description in the ontology. This
data is then denormalised and put into the database. So, for example,
if you have an entirely asserted hierarchy, you should be able to get
all instances of a given class without (very much) reasoning. In
other, more complex, cases individuals have to be reasoned over quite
a lot at both insert and query phases.

The reason that it works is because you can't assert relationships
between individuals, so you never need to re-reason things about
instances. For example, imagine these assertions

phil hasSibling martin


This makes me a member of the class of things which have
siblings. Next we assert

martin hasSex Male

Now, my own definition may have changed -- I am a member of the class
of things which have brothers. So the new insert potentially requires
updating our understanding of all instances. But with the instance
store you can't make the first assertion, only things of the form of
the second, so you are safe. 
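
Roughly, in Turtle, the two kinds of assertion look like this (a sketch; the
property and class names are invented):

  @prefix :    <http://example.org/family#> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .

  # A relationship between two individuals: the early instance store
  # rejects this, because later facts about martin could change what is
  # entailed about phil.
  :phil :hasSibling :martin .

  # A description of a single individual against the ontology: accepted,
  # because classifying phil never has to be redone when other
  # individuals are added.
  :phil a [ a owl:Restriction ;
            owl:onProperty :hasSibling ;
            owl:someValuesFrom :Male ] .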

As I said, this was true of the first instancestore. The current
version is cleverer and can make some assertions of the first
form. You'd have to ask others for details of this. 


  KV [VK] I guess I need to get out of my laziness and read the
  KV paper, but how different is the metaschema approach from the X
  KV to relational mapping approach?

It's a metaschema -- every ontology uses the same schema. For a
relational mapping, you'd expect different ontologies to use different
ones. 

  KV Even if it is a very different approach, is it able to leverage
  KV the scalability features of an RDBMS enumerated above? I would
  KV be interested in your responses.


The instancestore uses indexes, yes. This is actually easier -- you
have only one schema, so the appropriate indexes are the same for
every ontology. 

Phil



Re: Performance issues with OWL Reasoners = subclass vs instance-of

2006-09-18 Thread Phillip Lord

 CO == Chimezie Ogbuji [EMAIL PROTECTED] writes:

   ABox is more complex than TBox, although I believe the difference
   is not that profound (ie they are both really complex). For a DL
   as expressive as that which OWL is based on, the complexities are
   always really bad. In other words, no reasoner can ever guarantee
   to scale well in all circumstances.

  CO Once again: pure production/rule-oriented systems *are* built to
  CO scale well in *all* circumstances (this is the primary advantage
  CO they have over DL reasoners - i.e., reasoners tuned specifically
  CO to DL semantics).  This distinction is critical: not every
  CO reasoner is the same and this is the reason why there is
  CO interest in considerations of using translations to datalog and
  CO other logic programming systems (per Ian Horrocks suggestion
  CO below):

Well, as I am speaking at the limit of my knowledge I cannot be sure
about this, but I strongly suspect that what you say is wrong. 

Any computational system can only be guaranteed to work well in all
circumstances if it is of very low expressivity. If a system
implements expressivity equivalent to Turing/Lambda calculus, then no
such guarantees are ever possible, nor can you determine
algorithmically which code will perform well and which not.

Part of the problem with DL reasoners and their scalability is,
indeed, their relative immaturity. But part of the problem is because
that is just the way the universe is built. Ain't much that can be
done about this.



   Another interesting approach that has only recently been
   presented by Motik et al is to translate a DL terminology into a
   set of disjunctive datalog rules, and to use an efficient datalog
   engine to deal with large numbers of ground facts. This idea has
   been implemented in the Kaon2 system, early results with which
   have been quite encouraging (see
   http://kaon2.semanticweb.org/). It can deal with expressive
   languages (such as OWL), but it seems to work best in
   data-centric applications, i.e., where the terminology is not too
   large and complex.

  CO I'd go a step further and suggest that even large terminologies
  CO aren't a problem for such systems as their primary bottleneck is
  CO memory (very cheap) and the complexity of the rule set. The set
  CO of horn-like rules that express DL semantics are *very* small.


Memory is not cheap if the requirements scale non-polynomially. 
Besides, what is the point of suggesting that large terminologies 
are not a problem? Why not try it, and report the results?

Phil



Re: Performance issues with OWL Reasoners

2006-09-15 Thread Phillip Lord

 KV == Kashyap, Vipul [EMAIL PROTECTED] writes:

   I may be wrong here, but as far as I know the expressivity of
   OWL-DL, for example, is too different from that of RDBMS for this
   to work completely.

  KV However, this was done primarily for CLASSIC and other DLs which
  KV were possibly less expressive than OWL-DL and FACT. 

Yeah, it's straightforward enough if you just have, for example,
subsumption and existentials.


  KV I was wondering if the current implementations of DL reasoners
  KV such as Pellet, Racer, etc. adopt this strategy.

Not as far as I know. 


   Having said that there is a similar approach, which uses
   RDBMS. For example, the instance store
   (http://instancestore.man.ac.uk)

  KV [VK] Maybe the increased expressivity of OWL-DL leads to the
  KV above design choice of SQL + reasoning.

Yes. Why try to get an RDBMS to do DL reasoning, when a tableaux
reasoner can do it for you?

Phil



Re: Performance issues with OWL Reasoners

2006-09-15 Thread Phillip Lord

 KV == Kashyap, Vipul [EMAIL PROTECTED] writes:

    Yes. Why try to get an RDBMS to do DL reasoning, when a tableaux
    reasoner can do it for you?

  KV [VK] Scalability and Performance :)

Not at doing DL reasoning. Relational databases do relational stuff
well. For everything else, they are as likely to be rubbish as fast.

Phil




Re: Performance issues with OWL Reasoners = subclass vs instance-of

2006-09-15 Thread Phillip Lord




 WB == William Bug [EMAIL PROTECTED] writes:

  WB CLASSes represent UNIVERSALs or TYPEs.  The TBox is the set of
  WB CLASSes and the ASSERTIONs associated with CLASSes.

  WB INSTANCEs represent EXISTENTIALs or INDIVIDUALs instantiating a
  WB CLASS in the real world.  The ABox is the set of INSTANCEs and
  WB the ASSERTIONs associated with those INSTANCEs.



I'd take a slight step back from this. You can think of classes and
instances in this way. But in the OWL sense, a class is a logical
construct with a set of computational properties. Instance is a
more difficult term. OWL actually has individuals. The instance store
uses instances because they are not really OWL individuals.
There is also a philosophical concept of what a class is, what a
universal is and so on, which may be somewhat different, and is also
open to debate.

  WB Properly specified CLASSes are defined in the context of the
  WB INSTANCEs whose PROPERTIES and RELATIONs they formally
  WB represent.

  WB Properly specified INSTANCEs are defined via their reference to
  WB an appropriate set of CLASSes.

I think this would be circular. An OWL class is defined by the
individuals that it might have in any model which fits the
ontology, not just the individuals it has in a specific model.


  WB Reasoners (RacerPro, Pellet, FACT++) generally have
  WB optimizations specific to either reasoning on the TBox or
  WB reasoning on the ABox, but it's difficult (i.e., no existing
  WB examples experts such as Phil and others can cite) to optimize
  WB both for reasoning on the TBox, the ABox AND - most importantly
  WB - TBox + ABox (across these sets).

ABox is more complex than TBox, although I believe the difference is
not that profound (ie they are both really complex). For a DL as
expressive as that which OWL is based on, the complexities are always
really bad. In other words, no reasoner can ever guarantee to scale
well in all circumstances. This does not mean that you cannot build
reasoners which will scale well in practice. 


Make sense? 

Phil



Re: Performance issues with OWL Reasoners

2006-09-15 Thread Phillip Lord

 KV == Kashyap, Vipul [EMAIL PROTECTED] writes:

   Not at doing DL reasoning. Relational databases do relational
   stuff well. For everything else, they are as likely to be rubbish
   as fast.

  KV [VK] Agreed! But the hypothesis is that mapping into a proven
  KV scalable technology such as an RDBMS, even if as a component
  KV helps build a scalable DL reasoner. 


The hypothesis is only going to be true IF the mapping is scalable. 
Otherwise, it doesn't work. 



  KV This is what is exciting about Instance store that it uses RDBMS
  KV and SQL queries as a subcomponent (generating candidate answers)
  KV and then does post-processing after that.

  KV In all the above examples, each of them had to obviously
  KV implement something extra, but RDBMS+SQL was leveraged as an
  KV important component.


There are two different things in the technologies you mentioned:
relational to X mapping tools, and metaschema approaches. They are
quite different. For the instance store, the relational database is
really an implementation detail. It's basically a reasoner with
somewhat limited expressivity which is persistent and (hopefully)
scalable.

Because it's using a metaschema approach, you can't do things like use
RDBMS security, for example (beyond yes/no). 

  KV I think the SW community should seek to leverage known scalable
  KV technologies to reach industrial strength scalability and
  KV performance.


Sure, would agree. 

Phil



Re: Performance issues with OWL Reasoners

2006-09-14 Thread Phillip Lord



 KV == Kashyap, Vipul [EMAIL PROTECTED] writes:

  KV OWL reasoners support two types of reasoning:

  KV 1. ABox reasoning (reasoning about instance data). Scalability
  KV here is being achieved here by leveraging relational database
  KV technology (which is acknowledged to be scalable) and mapping
  KV OWL instance reasoning operations to appropriate SQL queries on
  KV the underlying data store.

I may be wrong here, but as far as I know the expressivity of OWL-DL,
for example, is too different from that of RDBMS for this to work
completely. I am not enough of an expert to know if this sort of
mapping is possible at all or whether it just cannot be done
efficiently. 

Having said that, there is a similar approach which uses RDBMS. For
example, the instance store (http://instancestore.man.ac.uk), which I
was briefly involved with (before the backend got too hard for my poor
brain!), uses a metaschema backend. Queries are not made by mapping to
SQL, but using SQL and reasoner queries together.


  KV 2. TBox reasoning scalability is a challenge, especially at the
  KV scale of 100s of thousands of classes found in medical ontologies.
  KV Would love to hear from DL experts on this issue.

Again, as far as I understand, the complexities of T-Box and A-Box
reasoning for logics such as that underlying OWL-DL are not that
different (i.e. they are both terrible!), so the issues are much the
same.

There is no general answer to the size of a T-Box you can reason
over. If the T-Box is a simple asserted hierarchy, you can build a
pretty large ontology (certainly 10's of thousands of terms) and reason over
it -- the reasoning in this case being simple. If you start using lots
of more complex expressions then you can limit yourself a lot more.

This paper, for example, managed to get the Gene Ontology and, I think,
all of GOA into a DL form and reason over it in a, er, reasonable
amount of time. So scalability to 10's of thousands of T-Box terms and
100's of thousands of A-Box individuals is possible.

http://www.cs.man.ac.uk/~dturi/papers/instancestore2.pdf

The DL reasoners are much better than they used to be -- in the good
old days, when the world was young, you could get most DL reasoners
to eat your CPU on a 10 term ontology. Nowadays, it's fairly hard to
do this. 

Phil




-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
Newcastle University,   Claremont Tower, Room 909
NE1 7RU



Re: A precedent suggesting a compromise for the SWHCLS IG Best Practices

2006-07-26 Thread Phillip Lord

 HST == Henry S Thompson [EMAIL PROTECTED] writes:

  HST With respect to the upcoming W3C Semantic Web Health Care and
  HST Life Sciences Interest Group f2f discussion of LSIDs, I wonder
  HST if you might think seriously about adopting an approach similar
  HST to that used by the ARK (Archival Resource Key) naming scheme
  HST [1].

  HST _Very_ roughly, this would involve Semantic Web uses of LSIDs
  HST to use an http-scheme version of LSIDs, along the following
  HST lines:

  HST  URN:LSID:rcsb.org:PDB:1D4X:22

  -- 

  HST  http://lsids.org/lsid:rcsb.org:PDB:1D4X:22

  HST or, alternatively, as per my recent suggestion to Sean

  HST  http://rcsb.org.lsids.org/lsid:PDB:1D4X:22

  HST I strongly recommend studying the ARK approach in any case, as
  HST it seems to me that although starting from a different subject
  HST area, its requirements are very close to your own.


I don't want to get domainist about this, but if it is broadly
similar, can you give a quick outline as to why ARK is better than
LSIDs?

I am starting to think that the main difficulty with LSIDs is that they
have the phrase Life Sciences in the name, which makes them domain
dependent.

My proposal is that we rename LSID to ARID for Archival Resource
ID. Would this solve the difficulties? 

Phil



Re: [BioRDF] All about the LSID URI/URN

2006-07-25 Thread Phillip Lord

 HST == Henry S Thompson [EMAIL PROTECTED] writes:

  HST Sean Martin writes:

  HST So, register one of lsids.org, lsids.net, lsids.name or
  HST lsids.info, and use e.g. http://lsids.org/xxx instead of
  HST URN:LSID:xxx.  Bingo -- no new tools required, works in all
  HST modern browsers :-).  

But if the file you are referencing is, say, 5Tb, then it doesn't work
in a browser at all. With LSIDs, on the other hand, you may get back a
choice of methods to access the data, including one which can cope
with 5Tb of data.

Incidentally, the approach that you are suggesting demonstrates that
LSIDs could be used in concert with URIs. In which case, putting
"http://" into the LSID adds nothing. This is, in fact, exactly how
DOIs work. But the resolution through the doi.org proxy is a
convention which can be changed without changing the DOIs.

Sean is entirely correct that encoding the protocol for the transport
layer and a DNS based resolution host into the identifiers is a recipe
for instability; maybe not a problem for many things, but a disaster
for many parts of bioinformatics. I do not still want to be doing
synonym resolution when I am 60.

What I have totally failed to understand about this discussion is why
it has been couched in terms of whether we should use LSIDs or
URIs. Of course, LSIDs' non-standard protocol is a pain, and the two
step resolution adds latency. But this is the cost of circumventing
URIs' protocol dependency, which is itself a real difficulty.

We should just circumvent this entire discussion about which
identifier is perfect for bioinformatics because I can tell you the
answer straight out. None of them. LSIDs answer a need. So, people use
them.

Cheers

Phil



Re: LSIDs and ontology segmentation

2006-07-14 Thread Phillip Lord


  Mark I know that others (e.g. Damian Gessler and collaborators at
  Mark NCGR, but I don't have the reference to his submitted
  Mark manuscript at hand right now... sorry Damian!) are also
  Mark working on the problem of segmentation by passing a
  Mark self-inflating flattened ontology fragment. 



I believe that the work you are referring to has been submitted to
the BioOntologies SIG at this year's ISMB, the programme for which is
available online.

http://www.jbb06.org/programme.html


This year we are holding a joint conference with BioLink, so this
should be a great workshop for anyone interested in semantic web,
ontologies or text mining and technologies in the life sciences. 


Tickets still available! Get them while they are hot!

Phil





Re: ontology specs for self-publishing experiment

2006-07-10 Thread Phillip Lord

 cm == chris mungall [EMAIL PROTECTED] writes:

   Converting between one syntax and another is fairly simple, and
   there are some reasonable tools for it. XSLT would work for
   converting XML into RDF. I wouldn't like to use it for converting
   the other way (actually I wouldn't like to use it at all, but
   this is personal prejudice!).
   
   This is assuming, however, that the semantics of the two
   representations are compatible. To give an example, syntactically
   it is possible to convert between the GO DAG and an OWL
   representation of GO. However, the GO part-of relationship
   doesn't distinguish universal and existential, while OWL forces
   you to make this distinction; you can't sit on the fence.

  cm Hi Phillip

  cm Actually GO uses the definition of part_of from the OBO relation
  cm ontology
  cm http://obo.sourceforge.net/relationship/#OBO_REL:part_of

  cm You can see from the definition that the use of this relation
  cm suggests an existential relation. The GO OWL transform encodes
  cm this, so you don't need to sit on the fence, the decision has
  cm been made for you.


Chris

My apologies. You are, of course, correct in saying that GO defines
an existential relationship, and I'm rather out of date in saying that
it doesn't (didn't!). 
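
For anyone following along, the distinction looks like this in OWL, sketched
in Turtle with invented terms; the existential reading says every instance of
the part is part of some whole, while the universal reading only constrains
what the whole may be:

  @prefix :     <http://example.org/partof#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # Existential: every replication fork is part of some cell
  # (the OBO REL / GO reading).
  :ReplicationFork rdfs:subClassOf
      [ a owl:Restriction ;
        owl:onProperty :part_of ;
        owl:someValuesFrom :Cell ] .

  # Universal: anything a replication fork is part of is a cell, but it
  # need not be part of anything at all.
  :ReplicationFork rdfs:subClassOf
      [ a owl:Restriction ;
        owl:onProperty :part_of ;
        owl:allValuesFrom :Cell ] .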


  cm There are definitely some issues here in using the
  cm defined OBO relations (which involve time) to OWL (which has no
  cm explicit account of time)

Yeah, this is true. Time is a difficult one to model anyway, and OWL
doesn't help here. 



Phil



Re: ontology specs for self-publishing experiment

2006-07-10 Thread Phillip Lord



 AR == Alan Rector [EMAIL PROTECTED] writes:

  AR All

  AR Just catching up.

  AR Could I strongly support the following.  If there is one
  AR repeatedly confirmed lesson from the medical communities
  AR experience with large terminologies/ontologies/ it is to
  AR separate the terms from the entities.  There are always
  AR linguistic artefacts, and language changes more fluidly in both
  AR time and space than the underlying entities.  (In medical
  AR informatics this is sometimes quaintly phrased as using
  AR nonsemantic identifiers).



Not that I wish to disagree with Alan, of course, but it is worth
mentioning the reason that so many identifiers are semantically
meaningful in biology: they look better in papers. Moreover, because
they have some meaning associated with them, they are likely to be
used correctly in papers, as biologists will notice when they have the
wrong one.

My own feeling is that the fly people got it right years ago. Their
gene identifiers had meaning, but not too much. So, for example,
sevenless is a mutant lacking the 7th cell in the eye. Clear,
straightforward and memorable. And if the world changes under you, the
name can be left the same because it doesn't really matter that much.

Also, some of the names were quite amusing, although the sonic
hedgehog gag ran out years ago. 

Cheers

Phil




Re: scientific publishing task force update

2006-06-13 Thread Phillip Lord


 SC == Steve Chervitz [EMAIL PROTECTED] writes:

   They also wrote an interesting paper on the state of
   bio-ontologies.
   
   Nature Biotechnology 23, 1095 - 1098 (2005)
   doi:10.1038/nbt0905-1095 Are the current ontologies in biology
   good ontologies?
   
   Larisa N Soldatova  Ross D King

  SC Also worth seeing: The MGED ontologies folks wrote a response to
  SC this article that comments on the bio-ontology development
  SC process, and addresses some statements Soldatova and King make
  SC about MO which the MO folks feel are inaccurate or misleading:

  SC Stoeckert C et al.  Nature Biotechnology 24, 21 - 22 (2006)
  SC doi:10.1038/nbt0106-21b Wrestling with SUMO and bio-ontologies
  SC http://www.nature.com/nbt/journal/v24/n1/full/nbt0106-21b.html

Their paper did cause, how shall I say, somewhat of a stir. 


  SC The reliance on and choice of upper level ontology seems to be a
  SC big bone of contention. Are there any good reviews on these
  SC discussing things like why there are so many of them and why
  SC can't they be combined? Seems like the current trend is to
  SC accept their existence and work towards making them
  SC interoperable:


If I were being cynical (those of you who know me will know how rare
this is), I would suggest that it's a case of standards being so good
that we need one each.

The issue is a slightly deeper one in bio-ontologies. It's not clear
that an upper ontology actually brings significant value to the
table. The claimed advantage of interoperability between ontologies
is, to my mind, somewhat bogus; they only really allow
interoperability when you are querying over the concepts in the upper
ontology. Much more important is that they help to ease the design of
an ontology; you have a better idea of where concepts should go, so
you can spend more time worrying about the details of whatever you are
modelling and less about the big picture.

On the flip side, they tend to complicate some stages of ontology
development, most notably the first month, when you have lots of
biologists tearing their hair out trying to work out what a perdurant,
continuant, sortal, or self-standing kind is.

The jury's still out, in my opinion.

Phil



CFP: The Joint BioLINK and 9th Bio-Ontologies Meeting

2006-05-02 Thread Phillip Lord



Due to difficulties some people have had with the submission system,
we have extended the deadline till the end of the week (5th May). 


Full details are at

http://www.jbb06.org/



-- 
Phillip Lord,   Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: [EMAIL PROTECTED]
School of Computing Science,
http://homepages.cs.ncl.ac.uk/phillip.lord
University of Newcastle,
NE1 7RU



Re: Ontology editor + why RDF?

2006-04-03 Thread Phillip Lord

 Anita == deWaard, Anita (ELS) [EMAIL PROTECTED] writes:
 
  Anita I am reminded of a saying on a Dutch proverb calendar: If
  Anita love is the answer, could you please repeat the question? If
  Anita semantics are the answer - what is the problem that is being
  Anita solved, in a way no other technology lets you?

To be honest, I think that this is a recipe for despair; I don't think
that there is any one thing that SW enables you to do that you could
not do in another way. It's a question of whether you can do things
more conveniently, or with more commonality, than otherwise; after
all, XML is just an extensible syntax and, indeed, could do exactly
nothing that SGML could not do (when it came out -- XML standards
exceed SGML ones now). XML has still been successful.

It's more a question of whether RDF or OWL provides a combination of
things that we would not get otherwise. With OWL (DL and Lite), I
rather like the ability to check my model with a reasoner, and to be
able to apply the ontology automatically in some circumstances. With
RDF, you have a convenient technology for building a hyperlinked
resource, but with added link types.
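
As a throw-away illustration of that last point (the names, URIs, and
the choice of rdflib in Python are mine, and purely hypothetical), a
typed link is just a labelled edge that you can later query on; a bare
hyperlink gives you the edge but not the label:

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/demo#")  # illustrative only
    g = Graph()

    # Two links from the same resource, distinguished by their type.
    g.add((EX.BRCA1_protein, EX.encodedBy, EX.BRCA1_gene))
    g.add((EX.BRCA1_protein, EX.describedIn, EX.some_paper))

    # Because the link type is part of the data, you can select on it.
    for subj, obj in g.subject_objects(EX.encodedBy):
        print(subj, "is encoded by", obj)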

Of course, you could do the latter with straight XML (well, in so far
as RDF is serialised as XML, you are doing so). And the former could
be done without OWL, just with a raw DL; of course, then you wouldn't
get some of the additional features of OWL (such as multi-lingual
support, which derives directly from the XML).
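
Similarly, the multi-lingual support mentioned above amounts to
language tags (xml:lang in the RDF/XML serialisation) carried on the
label literals. A sketch, again with invented names and again using
rdflib only for illustration:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/demo#")  # illustrative only
    g = Graph()

    # One class, labels in two languages, distinguished by language tag.
    g.add((EX.Mitochondrion, RDFS.label, Literal("mitochondrion", lang="en")))
    g.add((EX.Mitochondrion, RDFS.label, Literal("mitochondrie", lang="fr")))

    # Pick out the French label.
    for label in g.objects(EX.Mitochondrion, RDFS.label):
        if label.language == "fr":
            print(label)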

  Anita Perhaps if we can find a way to nail this down (I also
  Anita believe the use cases of this working group, and the group as
  Anita a whole is certainly working towards that aim!) we could try
  Anita to not just preach the semantic gospel, but
  Anita actually sell it (forgive the mixed metaphor)... 

Having said all that went before, I agree with this; having a set of
RDF/OWL life sciences success stories which explained why the
technology was appropriate (if not uniquely appropriate) would be a
good thing, if it has not been done before. 

Cheers

Phil