Note concerning the type hierarchy

At our recent meeting in Paris there was some discussion about the role of the 
type hierarchy, the numbering of types and the appropriate methodological 
principles to apply. These notes are intended to present my understanding of 
these issues.

The type hierarchy contains four sorts of types:

   As already noted elsewhere [Doerr & Crofts 1999], the type hierarchy 
implicitly contains a 'redundant' declaration of all types corresponding to the 
entities (classes) present under "E1 CIDOC Entity". These implicit type 
declarations also duplicate the hierarchical structure of the existing entity 
hierarchy. In Paris, I put forward the proposal that implicit types should be 
prefixed with the letter 'T', and use the same numbering as their corresponding 
entity. "E5 Event", for example, would correspond to "T5 Event". This 
proposal has not yet been adopted, but I shall use this form of notation in 
the remarks which follow.
   The type hierarchy may also contain additional sub types of the implicit 
types, thereby providing a higher degree of granularity than is expressed by 
the basic entity hierarchy. For example, a subtype "Coins" could be declared 
for "T24 Man-Made Object". I am not sure what rules should be adopted for 
numbering these sub types since they are assumed to be domain-oriented and 
specific to local systems. For present purposes I shall use a decimal notation 
such as "T24.1 Coin". 
   The third sort of types present in the type hierarchy are descendants of 
type hierarchies which do not correspond to any entity in the entity 
hierarchy, such as 'language' and 'material'. The head types of these type 
hierarchies are currently assigned an 'E' number. This effectively avoids any 
possible conflict with the numbering of the entity hierarchy. However, in order 
to highlight their nature as 'types' I propose to adopt the same 'T' prefix as 
for other types. (I would argue that this approach is consistent in that it 
suggests the existence of an 'implicit' entity in the entity hierarchy, one 
that could be declared explicitly at a later date if required.) I shall refer 
to these type hierarchies as 'floating types' since they are not assigned to any 
position in the entity hierarchy.
   The head type of the type hierarchy is E55 Type. This exists in the main 
entity hierarchy so it can correctly be referred to using the 'E' prefix. 
However, since it also represents the highest type in the hierarchy of implicit 
types, it could also be considered as equivalent to "E1 CIDOC Entity", and 
might therefore be numbered "T1 CIDOC Entity". Alternatively, T1 might be 
declared as a sub type of E55 Type. This point needs further clarification.

The distinction between the 'E' hierarchy and the 'T' hierarchy is essentially 
technical. The Entity hierarchy consists of classes and can be considered as a 
structural hierarchy. In a relational database it would naturally be 
implemented as a set of tables. The Type hierarchy consists of instances and 
can be considered as a set of data values. It would naturally be implemented as 
a single table containing values. Despite these differences, the two 
representations are intended to be logically equivalent. Each entity in the E 
hierarchy corresponds to a type in the T hierarchy. When no corresponding type 
exists for a given entity it is merely for reasons of economy; the undeclared 
type can be considered as implicit. Conversely, a type for which no 
corresponding entity exists in the E hierarchy corresponds to an implicit 
entity, which could be declared at a later stage if needed.
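
As a minimal sketch of the two representations, the Python below models the 
entity hierarchy as classes and the type hierarchy as rows in a single table. 
The class names, the row layout and both helper functions are assumptions made 
for illustration, and "T1" is only one of the numbering options under 
discussion.

```python
# Entity hierarchy: structural, one class per entity (a table per entity
# in a relational database). Intermediate entities are omitted here.
class E1_CIDOC_Entity:
    pass


class E24_Man_Made_Object(E1_CIDOC_Entity):
    pass


# Type hierarchy: data values held in one table.
# Row layout (an assumption): type id -> (label, parent type id).
TYPE_TABLE = {
    "T1": ("CIDOC Entity", None),      # "T1" is one of the options discussed
    "T24": ("Man-Made Object", "T1"),  # implicit type mirroring E24
    "T24.1": ("Coin", "T24"),          # domain subtype with no entity
}


def implicit_type_id(entity_class):
    """Return the id of the implicit type mirroring an entity class."""
    entity_id = entity_class.__name__.split("_", 1)[0]  # e.g. "E24"
    return "T" + entity_id[1:]


def type_ancestors(type_id):
    """List the ids of all super-types, nearest first."""
    chain = []
    parent = TYPE_TABLE[type_id][1]
    while parent is not None:
        chain.append(parent)
        parent = TYPE_TABLE[parent][1]
    return chain
```

Here implicit_type_id(E24_Man_Made_Object) gives "T24", and 
type_ancestors("T24.1") walks the table from "Coin" up to the head type.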

A consequence of the logical equivalence of the entity hierarchy and the type 
hierarchy is that the methodological rules which apply to the declaration of 
entities and sub entities also apply to types and subtypes. The 'isA' rule 
should be applicable to both, so it should be possible to say of any sub-entity 
that "sub-entity X isA(n) entity Y", where Y is a super-entity of X. Thus, a 
"Person (E21) is a Physical Entity (E18)". Similarly, it should be possible to 
say of any subtype that "subtype X isA(n) type Y", where Y is a super-type of 
X. Following my previous example: "a Coin (T24.1) is a Man-Made Object (T24)". 
It follows from this that we should also be able to make assertions of the form 
"subtype X isA(n) entity Y" where X is a subtype of type Y', corresponding to 
entity Y. e.g. "A coin (T24.1) is a Man-Made Object (E24)". We should bear in 
mind that the isA rule is intended to be inclusive: it is true iff any and all 
members of a sub-category are also members of the super-category. Pasta, for 
example, is not a good specialisation of 'Italian food', since some pastas are 
not Italian. 
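
The three forms of the isA assertion, and the inclusiveness condition, could be 
checked along the following lines. This is a sketch only: the parent tables are 
simplified fragments with intermediate entities omitted, the function names are 
hypothetical, and the member names in the usage note are placeholders rather 
than real data.

```python
# Child id -> parent id; None marks the top of each fragment.
ENTITY_PARENT = {"E1": None, "E18": "E1", "E21": "E18", "E24": "E1"}
TYPE_PARENT = {"T1": None, "T24": "T1", "T24.1": "T24"}


def is_a(child, ancestor, parent_of):
    """True if `ancestor` is `child` itself or one of its ancestors."""
    node = child
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of[node]
    return False


def type_is_a_entity(type_id, entity_id):
    """'subtype X isA entity Y': X descends from the type mirroring Y."""
    mirrored = "T" + entity_id[1:]
    return mirrored in TYPE_PARENT and is_a(type_id, mirrored, TYPE_PARENT)


def is_inclusive(sub_members, super_members):
    """The isA rule holds iff every member of the sub-category is also a
    member of the super-category."""
    return set(sub_members) <= set(super_members)
```

With these tables, is_a("E21", "E18", ENTITY_PARENT) and 
type_is_a_entity("T24.1", "E24") both hold, while 
is_inclusive({"pasta_a", "pasta_b"}, {"pasta_a"}) fails because one member of 
the sub-category falls outside the super-category.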

When do we declare subtypes without a corresponding entity? 

A subtype should be declared for an existing implicit type (i.e. one which 
corresponds to an existing entity) iff it is needed to register a domain 
specific notion which would otherwise not be recorded and if it does not 
require properties in addition to those it inherits from the existing entity. 
If additional properties are required, a sub-entity would have to be declared. 

A subtype should be declared either directly under E55 or as part of a type 
hierarchy so declared (i.e. there is no corresponding entity) iff it 
corresponds to an implicit entity for which no properties are required in the 
scope of the CRM. This is assumed to be the case for E56 (T56) language, for 
example. No entity is declared in the CRM to represent language since we have 
no properties to record about languages other than their identity: the CRM does 
not describe or talk about languages. If this situation changes in the future, 
we would need to declare a 'language' entity in the entity hierarchy, and 
attach the relevant properties. 

When to use the 'has type' property?

All entities declared in the CRM have a 'has type' property. This enables 
instances of entities to be declared as belonging to a given sub type. The 
logical equivalence of the types and entities means that assigning a sub type 
to an entity instance is logically equivalent to declaring it as an instance of 
an implicit sub-entity (a specialisation). It follows that one should only 
assign sub types which follow the 'isA' rule, i.e. where subtype Y isA type X 
and X is the type corresponding to entity X'. (This constraint may be 
represented in the property declarations using the appropriate T number.) 
Furthermore, we can say that types assigned to the has type property should not 
be 'floating types', since these are not yet declared in the entity hierarchy 
and consequently cannot follow the isA rule. No instance of an entity in the 
CRM should have type 'language', for example, for this would imply that 
language is a specialisation of the entity to which the instance belongs. (If 
it can be established that "language" is indeed a specialisation of some 
existing entity, then it should be reclassified.) 
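
The constraint on 'has type' can be sketched as a small validation routine. 
The tables are illustrative fragments, the function names are hypothetical, 
and the test for a floating type (no entity with the matching number is 
declared in the entity hierarchy) is an assumption about how the distinction 
would be recorded.

```python
# Entities declared in the entity hierarchy (fragment).
ENTITIES = {"E21", "E24", "E33"}

# Type id -> parent type id; None marks the head of a type hierarchy.
TYPE_PARENT = {
    "T24": None,     # Man-Made Object, implicit type mirroring E24
    "T24.1": "T24",  # Coin
    "T56": None,     # Language, a floating type
}


def head_of(type_id):
    """Walk up to the head type of the hierarchy containing `type_id`."""
    while TYPE_PARENT[type_id] is not None:
        type_id = TYPE_PARENT[type_id]
    return type_id


def is_floating(type_id):
    """A type is floating if its head type mirrors no declared entity."""
    return "E" + head_of(type_id)[1:] not in ENTITIES


def may_have_type(entity_id, type_id):
    """Allow 'has type' only for types that follow the isA rule.

    Assigning the implicit type itself is treated as allowed here,
    which is an assumption.
    """
    if is_floating(type_id):
        return False
    mirrored = "T" + entity_id[1:]
    node = type_id
    while node is not None:
        if node == mirrored:
            return True
        node = TYPE_PARENT[node]
    return False
```

Under these assumptions may_have_type("E24", "T24.1") is accepted, whereas 
may_have_type("E33", "T56") is rejected because 'language' is a floating type.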

When to use other property links to the type hierarchy?

Some entities have additional properties which link into the type hierarchy. 
"E33 Linguistic Object", for example, has a property "has language (is language 
of): T56 Language". Property links of this sort can be seen as logically 
equivalent to links to entities. The fact that language is currently declared 
as a type, and not as an entity, reflects the fact that, as it stands, the CRM 
has very little to say about languages. It follows that the type to which the 
property link refers should not be a subtype of the entity. If it is, then the 
"has type" property should be used instead. In the current example this rule 
holds: it is not the case that a language is a linguistic object (as defined in 
the CRM). 
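
The choice between "has type" and a dedicated property link can be expressed as 
a simple test. This is a sketch under the same simplifying assumptions as the 
notation used in this note: the table is a fragment, and choose_link and 
descends_from are hypothetical names.

```python
# Type id -> parent type id; None marks the head of a type hierarchy.
TYPE_PARENT = {
    "T24": None,     # Man-Made Object
    "T24.1": "T24",  # Coin
    "T33": None,     # Linguistic Object
    "T56": None,     # Language
}


def descends_from(type_id, ancestor_id):
    """True if `ancestor_id` is `type_id` or one of its super-types."""
    node = type_id
    while node is not None:
        if node == ancestor_id:
            return True
        node = TYPE_PARENT[node]
    return False


def choose_link(entity_id, type_id):
    """Pick the kind of link from an entity to a type.

    A type that specialises the entity's own implicit type belongs on
    'has type'; any other type needs a dedicated property.
    """
    mirrored = "T" + entity_id[1:]
    if descends_from(type_id, mirrored):
        return "has type"
    return "dedicated property"
```

For "E33 Linguistic Object" and "T56 Language" this yields "dedicated 
property", matching the existing "has language" link, while a coin type 
attached to "E24 Man-Made Object" yields "has type".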

In the light of the foregoing remarks, I would argue that the 'has gender' 
property of "E21 Person" should be maintained and should not be handled using 
the inherited "has type" property. Gender is not a good specialisation of 
Person, since many male, female and somewhere-in-between objects are not 
persons; inversely, many men and women are not fish. The CRM does not currently 
have a specific class for animals, other than E20 Biological Object (which 
could also be taken to include plants, bacteria and biological material). 
However, something like an 'animalia' subclass will need to be included at some 
stage to meet the needs of natural history collections. This would be a natural 
place for the 'has gender' property. 

Open questions

How should types such as 'language', which could be positioned in the entity 
hierarchy, be declared? Leaving them as 'floating' types suggests that we don't 
know how to classify them correctly. Would it be true to say that, in 
principle, all floating types could be placed in the entity hierarchy?

The entity hierarchy leads down to, but does not include, instances of 
entities. The type hierarchy leads down to, but also appears to include, 
instance level types. This asymmetry seems to be required, for example, for 
'E56 Language' and 'E57 Material'. The type hierarchies fail to do their job if 
they don't include entries like 'French' and 'Wood'. Do we need to make this 
distinction clear at a theoretical level, possibly by differentiating the 
instance level data in the type hierarchy in some way? Failure to make the 
distinction might lead to problems if property links are made to "class level" 
types rather than instances.

Types can have many names. Common names, scientific names, original names, etc. 
Should we give types an 'is identified by' link to appellation? (Incidentally, 
should appellation have a 'has value' property link to string?)

Types are created by human beings, who can often be named. This is notably the 
case with biological taxonomy. What is the relationship between the type 
hierarchy and E28 Conceptual Object? (This could open up a whole can of 
biological objects...)

I hope some of the foregoing makes sense. 

best wishes

Nick Crofts

References

[Doerr & Crofts 1999] Electronic Esperanto: The Role of the Object Oriented 
CIDOC Reference Model, Proc. of the ICHIM'99, Washington, DC, September 22-26, 
1999

URL : http://cidoc.ics.forth.gr/docs/doerr_crofts_ichim99_new.doc


Nicholas Crofts
DAEL / DSI
rue David-Dufour 5
Case postale 22
CH - 1211 Genève 8
tél +41 22 327 5271
fax +41 22 328 4382

