"Marco d'Itri" <[EMAIL PROTECTED]> writes:

> On Jul 19, "Adam P. Harris" <[EMAIL PROTECTED]> wrote:
>> I'd appreciate it if someone in the know about character sets and
>> typical Debian application support for the same could comment on
>> this spec and offer ways to make it better for multiple character
>> sets, if
> Where can I find it?

Doh! <URL:http://www.debian.org/~aph/debian-metadata.html/>. An ASCII copy is included herein.

.....A. P. [EMAIL PROTECTED]<URL:http://www.onShore.com/>


Debian Metadata Project
-----------------------

Adam P. Harris <[EMAIL PROTECTED]>
The Debian-Doc List <[email protected]>
version 0.8.0, Sat, 18 Jul 1998 19:23:00 -0400


0.1 Abstract
------------

This manual contains a guide and a reference to the Debian Metadata Project. The Project's purpose, and the purpose of this document, is to outline a set of metadata elements, to specify an interface for package maintainers to use in order to provide metadata about resources in their packages, to specify a unified subject catalog for categorizing metadata, and to specify an API for developers who wish to make use of a system's metadata.

This manual is intended to serve as sub-policy for the deployment and utilization of metadata in Debian. Currently, it carries no actual force and is for informational purposes only. The manual is intended for package maintainers, Debian document writers, and those implementing document display systems such as dwww and dhelp.


0.2 Contents
------------

     1.   Introduction
          1.1.  Scope of this Document
          1.2.  Organization of this Document
          1.3.  Contributing to the Project
     2.   Local Configuration Options
          2.1.  Automatic Document Conversion
     3.   Debian Metadata Elements
          3.1.  Metadata Entities
          3.2.  Metadata Element Structure
          3.3.  Metadata Elements
     4.   docreg File Format
          4.1.  Design Rationale and Goals
          4.2.  How To Use the docreg File
          4.3.  docreg File Format
     5.   Tools for Maintainers
          5.1.  install-docs -- metadata installation and removal
          5.2.  validate-docreg -- metadata validation for maintainers
          5.3.  html2docreg -- convert HTML files to docreg
          5.4.  docreg2html -- convert docreg files to HTML
     6.   Debian Metadata for Implementors
          6.1.  Tracking Registered docreg Files
          6.2.  Augmented BNF Description for docreg Files
          6.3.  Hooking Into install-docs


0.3 Copyright Notice
--------------------

Copyright © 1998 Adam P. Harris, © 1997 Christian Schwarz.

This documentation is free software; you may redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

However, even though you are empowered to modify this specification, please do not do so; as a standard, it loses power if there are alternate versions of it available. Methods for centralized management and modification of this specification are outlined below.

This manual is distributed in the hope that it will be useful, but *without any warranty*; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

A copy of the GNU General Public License is available as `/usr/doc/copyright/GPL' in the Debian GNU/Linux distribution or on the World Wide Web at http://www.gnu.org/copyleft/gpl.html. You can also obtain it by writing to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

-------------------------------------------------------------------------------


1. Introduction
---------------

What is metadata? Metadata is information about information. The Debian Metadata Project is an attempt to provide a robust, standards-based metadata set, and the facilities to collect and display information about resources (usually, documents on a user's machine). Collected information includes the document's title, author, format, placement in a subject catalog, description, language, etc.

Why should anyone care about metadata? Primarily, metadata is useful in *resource discovery*.
This is the process of finding out where to find information. You do this every time you run `man -k' or `apropos'; AltaVista (http://www.altavista.digital.com) and HotBot (http://www.hotbot.com/) are typical of the current technologies in resource discovery. But *metadata* allows you to find resources in different and better ways. You can search by title, by language, or by author; you can traverse a subject hierarchy, like a book's index. Metadata allows a more intelligent way to organize and present the vast amount of documentation that Debian already provides.

There are other benefits of having consistent metadata available. For instance, at document installation time, metadata can drive format conversions, or enforce fine-grained policies about which formats of documentation may be installed. Machines running Debian would be able to say things like, "if German and English versions of a document are available, remove the English version".

Debian uses, as its metadata entity definition, a specialized application of the Dublin Core (http://purl.oclc.org/metadata/dublin_core/). The Dublin Core is an informal standard formulated by an international group of professionals from library science and from the networking and digital library research communities.


1.1. Scope of this Document
---------------------------

The purpose and scope of this document is to define a common baseline of metadata in Debian. Furthermore, this document is a manual meant to explain how to use metadata, for the benefit of curious users, package maintainers, or metadata integrators. As such, this manual covers the following issues:

     * what the recognized metadata elements are
     * how metadata is delivered
     * what tools are available to help work with Debian's metadata standard
     * how the system works, for the benefit of integrators

A related document is the Debian Documentation Hierarchy manual, which defines the standardized documentation subject tree.
That document, not included here, describes the headings and subheadings under which documents may appear.


1.2. Organization of this Document
----------------------------------

The document is split into three main parts. The first part contains information of interest to any Debian user curious about the features and capabilities of our metadata system. The second part is of interest to package maintainers. The final part is mainly of interest to documentation system providers or metadata display system developers.

System administration controls provided by the Debian Metadata system are documented in chapter 2, `Local Configuration Options'. Chapter 3, `Debian Metadata Elements', defines the metadata elements, which are the data fields that can be populated for a given resource.

The next part of this manual is primarily of interest to Debian package maintainers. It begins with chapter 4, `docreg File Format', which describes the "docreg" file, the file that the package maintainer uses to *register* document metadata into the local document store. Chapter 5, `Tools for Maintainers', then describes the use of install-docs and other tools that assist package maintainers.

The final part, chapter 6, `Debian Metadata for Implementors', is of interest to those who are working with Debian's metadata collection (implementors or integrators). This chapter contains a full BNF specification of docreg files, information on how developers can hook into install-docs for capturing certain metadata events, and information on the data provided for integrators by the doc-base system.


1.3. Contributing to the Project
--------------------------------

Discussions about the Debian Metadata Project generally take place on the Debian-Doc mailing list <[email protected]>. This is an open project; all are invited. To subscribe to this list, see http://www.debian.org/MailingLists/subscribe.
The newest version of the specification can currently be found at http://va.debian.org/~aph/debian-metadata.html/. This will be moving to a more standard location soon.

If you are interested in contributing code or text to the specification, please do! Read-only CVS access to the specification is publicly available at `cvs.debian.org'. CVS access ensures that you have the most up-to-date versions of the documentation and implementation source. If you have a client/server capable `cvs' installed, do the following steps (note: the `>' represents your shell prompt, where you enter commands):

     > cvs -d :pserver:[EMAIL PROTECTED]:/cvs/doc-base login
     (Logging in to [EMAIL PROTECTED])
     CVS password: <hit return, i.e., a blank password>
     > cvs -z9 -d :pserver:[EMAIL PROTECTED]:/cvs/doc-base co doc-base
     cvs server: Updating doc-base
     U doc-base/.cvsignore
     U doc-base/Makefile
     U doc-base/copyright.ent
     [...]

If you are a developer or for some other reason have an account on `cvs.debian.org', you can also use a `CVSROOT' (the part after the `-d') of `:ext:<username>@cvs.debian.org:/cvs/doc-base'. For more information on how to use CVS, see cvs(1).

-------------------------------------------------------------------------------


2. Local Configuration Options
------------------------------

Once the data provided by the Debian Metadata scheme is available, it becomes possible to give system administrators knobs and dials for controlling local documentation. None of this functionality is present yet; however, preliminary ideas of desirable configuration capabilities are discussed here.

Such configuration possibilities can be categorized into a few major topics. The first topic is the ability to decide, based on local policy, whether or not to install the documentation.
Here is a feature list:

     * never install particular formats, e.g., "I don't want any PostScript on my machine, this is a firewall"
     * don't install particular languages, e.g., "I don't want any Spanish documentation installed"
     * conditionally don't install a particular language if another language is available, e.g., "if a Spanish version of a document is available already, we don't need the English version; otherwise, we do"


2.1. Automatic Document Conversion
----------------------------------

Another major topic is the possibility of auto-conversion of documentation, either on demand or at install time. Here is a possible feature list:

     * autoconvert on install based on format, e.g., "I want all SGML files to be converted into PDF, A4 sized paper. Please retain the SGML."
     * autoconvert on demand based on format, e.g., provide a facility so that we could write a CGI to convert documents on demand, say, using content negotiation or user selection
     * "Even though policy says don't gzip HTML files, I've set up my browsers to handle it, so go ahead and gzip them."

Autoconversion is a very complex issue. Packages being installed should be able to register their conversion capabilities with the system. For example, sdc can translate a particular set of DTDs into HTML, ASCII, nroff, or PostScript. gs can translate PostScript to PDF. The `docbook-stylesheets' package can translate documents written in the DocBook DTD to HTML, PostScript, or RTF. When conversions are done, the system should create new metadata for them and register this new metadata, probably with special fields to allow an audit trail of the conversion actions.

Document formatting is also complex. It can depend on many different things in the system, such as fonts, obscure configuration settings, etc. For instance, if I change my paper size in `/etc/papersize', do I need to recreate any documents which depended on that setting?
Additionally, we might need to allow a facility for the document manager to associate processing instructions with files.

Finally, the logistics of package maintenance make autoconversion complex. Do we remove converted documents when the package from which their source came is removed, or only when it is purged?

-------------------------------------------------------------------------------


3. Debian Metadata Elements
---------------------------

This chapter contains a description of Debian metadata, which is used to describe human-readable texts in a consistent and coherent way. The Debian Metadata Project uses the Dublin Core (http://purl.oclc.org/metadata/dublin_core/) set of metadata elements. Below we define the logical structure of entities and elements, define how metadata relates to data, and describe the meaning and use of the elements individually.


3.1. Metadata Entities
----------------------

A metadata *entity* is composed of a set of *elements*, which are the individual bits of metadata. Every metadata entity describes one and only one *resource*, or document. However, a single resource may be described by more than one metadata entity. A *resource* is defined by a URL (generally a file in the documentation area of the package, on the local machine).

One can conceptualize this system using a library card catalog paradigm. Resources are the actual books in the library (or periodicals, or microfiche, etc.). Metadata entities are the cards in the card catalog. Metadata elements are the actual bits of information appearing on these cards. A single book may have more than one card; furthermore, it may appear in different parts of the card catalog.


3.2. Metadata Element Structure
-------------------------------

The Dublin Core element semantics can be found at http://purl.oclc.org/metadata/dublin_core_elements. In some cases, we have restricted the syntax for the benefit of simplicity of implementation. These restrictions are always noted.
Metadata elements consist of two required parts: a *label* and its *content*. The label is the name of the element, selected from the domain of possible labels listed below. The content is the value of the element.

In standard Dublin Core, each element is repeatable. However, we have restricted the repeatability of certain elements for simplicity of implementation; these restrictions may be lifted at a later date. Generally, if an element's contents are not free text (i.e., if it doesn't make sense to talk of the *language* of the contents), we do not allow it to repeat.

Elements may occur in any order; order is never significant. Case is never significant in labels or qualifiers; case is preserved in the content. For the precise syntax of how elements are encoded in docreg files, see chapter 4, `docreg File Format'.

The Debian flavor of Dublin Core also places restrictions on qualifiers. *Qualifiers* are attributes which attach to elements in order to further define, or *qualify*, what the element is or what it refers to. For instance, the LANG qualifier defines the language that the metadata itself is written in (not the language of the resource).

In the Debian Metadata scheme, we have eliminated the necessity (or even possibility) for metadata maintainers to use qualifiers. For instance, for the subject scheme we have no use for Dewey Decimal schemes; instead, we require our own scheme. As such, the Debian scheme uses *required implied qualifiers*. Unknown or unacceptable schemes are ignored as if they never appeared. As a result, we only deal with qualifiers when converting between the docreg format and foreign formats, which have different meanings and purposes.


3.2.1. The LANG Qualifier
-------------------------

The `LANG' qualifier indicates the language of the content of the element itself. For instance, if a `Description' element has a LANG qualifier value of `de', the description itself is in German. Language qualifiers are not settable.
For many elements, content is described in a formal structure such as a date field or a URL. For other elements which use natural language (that is, "Title" and "Description"), there is an implied LANG qualifier which is the same as the setting of the Language element.[1]

     [1] This restriction may be lifted at some point; for more details see the "Language" element description.


3.2.2. The SCHEME Qualifier
---------------------------

The `SCHEME' qualifier indicates what notational scheme the content of a given element is encoded in. Like all qualifiers, this qualifier is not available to the maintainer for manipulation. There is only one reasonable scheme for a given element in the Debian environment. However, knowing the scheme for an element is important, because it tells you how the content of the element should be encoded.

The default scheme is generally `free text'. Other elements have a scheme of `URL' or others, as described in section 3.3, `Metadata Elements'.


3.3. Metadata Elements
----------------------

In Debian Dublin Core, certain elements are required, some are optional, and some are ignored as insignificant. As a rule, the adage "be liberal in what you accept and conservative in what you emit" applies to the system. The following is a summary of the elements, which are described in detail below:

     * Required elements
          * Identifier
          * Title
          * Subject
          * Format
     * Optional elements
          * Description
          * Language
          * Creator
          * Contributor
          * Publisher
          * Date
          * Source
          * Relation.IsFormatOf
          * Relation.IsBasedOn
          * Type
          * Rights
     * Ignored elements
          * Coverage


3.3.1. Required Elements
------------------------

These elements are required. Lacking any of these elements is an error which will cause install-docs to reject the entire entry.

Identifier
     A URL used to uniquely identify the resource. Usually, the resource is a local file on the user's file system (which may or may not be installed).
     In such cases, it would be beneficial for maintainers to be able to refer to the resource using a URL relative to a certain path. However, the actual path to be used is under debate. There are two proposed solutions:

          1. If the URL is a relative URL, it is relative to the location of the package's documentation area. Namely, it is relative to either `file://localhost/usr/share/doc/<package>' or `file://localhost/usr/doc/<package>'.
          2. If the URL is a relative URL, it is relative to the location of the docreg file itself.

     In order to resolve this debate, the following scheme is temporarily adopted. If the URL starts with `./', it is considered to be relative to the location of the docreg file which contains this entity. If the URL is any other relative URL, it is considered to be relative to the package documentation area as described above. This scheme is a temporary compromise in order to accommodate both sides of the debate; perhaps when we have actual implementations in place, one or the other shall win out.

     *Future directions.* We have perceived that it would be a good thing for certain documents to be identifiable by tokens which are less volatile than file names. Given this facility, our internal documentation could have persistent inter-document cross-references.

     The IETF-blessed facility for this purpose is URNs. URNs are unique tokens defined by a central authority (such as the Debian Documentation Project) to which the organization has made a long-term commitment. For instance, the DDP might decide to create a URN `debian-doc:policy' to represent the Debian Policy document.

     To implement this system, we would need to set up a central naming authority to coordinate and maintain the Debian URN list. Associated with this list could be a set of URLs and/or URCs, such as "http://www.debian.org/debian-policy/index.html", mirrored locations, and even "the file index.html in the documentation area of the debian-policy package".
     Central, and centrally distributed (i.e., packaged), CGI scripts could be provided to dynamically interpret and support these URNs (i.e., convert URNs to URLs on the fly). When and if this facility is in place, the Debian Metadata system can be used to implement and support it. However, it has been decided that the project should not wait for that facility at this time.

     SCHEME       URL
     repeatable?  no

Title
     The title for the document, usually only a single line. If the document does not have a title, formulate the title as if it were the short selectable text of an HREF. The language that this field is expressed in must be the same as the language indicated in the "Language" element.

     SCHEME       free text
     repeatable?  no

Subject
     Where this document is situated in the subject catalog. A subject catalog is a way of hierarchically organizing documents based on the subject matter covered by the document. For Debian, this subject catalog is the *Debian Document Hierarchy*, or DDH for short. See the Debian Documentation Hierarchy manual for specifics.

     SCHEME       Debian, indicating the Debian Document Hierarchy
     repeatable?  yes

Format
     The format of the document, indicated as a MIME type, for example, `text/html'.

     SCHEME       RFC 1522 etc. (MIME)
     repeatable?  no


3.3.2. Optional Elements
------------------------

These elements are optional. The content of these elements is captured by the system and should be displayed to the user by some means.

Description
     A description, or abstract, for the resource. This gives the user more information about the resource, so that they are able to decide whether it contains the information they are looking for. The language that this field is expressed in must be the same as the language indicated in the "Language" element.

     *Future directions.* We may wish to define a subset of HTML elements to allow in the content of this element, for instance: `<bf>', `<em>', `<tt>', `<a href=...>', `<code>', `<p>', and `<var>'.

     SCHEME       free text
     repeatable?  no

Language
     The language of the intellectual content of the resource. If this element is not present, it defaults to `en', for English.

     SCHEME       RFC 1766
     repeatable?  no
     example      `de'

Creator
     The person or organization primarily responsible for creating the intellectual content of the resource: for example, authors in the case of written documents, or artists, photographers, or illustrators in the case of visual resources.

     SCHEME       free text, or RFC 822 address specification
     repeatable?  yes
     example      `A. P. Harris <[EMAIL PROTECTED]>'

Contributor
     A contributor to a document. For our purposes, this should only be used to indicate the translator of a document. Multiple authors of a document should simply use multiple Creator elements.

     SCHEME       free text, or email
     repeatable?  yes
     example      `A. P. Harris <[EMAIL PROTECTED]>'

Publisher
     The person or organization responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

     SCHEME       free text, or email
     repeatable?  yes

Date
     A date associated with the resource. For our purposes, this should indicate the last modification date of the resource.

     SCHEME       ISO 8601 profile, found at http://www.w3.org/TR/NOTE-datetime-970915
     repeatable?  no
     example      it is recommended to use day granularity or coarser, such as `1997-11-05' or `1997-11'; finer-grained formats such as `1997-07-16T19:20:30+01:00' are also available.

Source
     The upstream location where a document originated. Generally this is a web site maintained by the document author, or the URL for a canonical upstream archive such as Sunsite.

     SCHEME       URL
     repeatable?  yes
     example      `http://sunsite.unc.edu/mdw/HOWTOs/FOOBAR.html'

Relation.IsFormatOf, Relation.IsBasedOn
     Indicates a relationship to another resource. The content of this field is the URL of the related resource, as in the Identifier element. Relation.IsFormatOf indicates that this resource is another format of the resource named in the content of this element, e.g., an HTML or ASCII version of an SGML file.
     Relation.IsBasedOn is used to indicate translations based on another document. Note that it is *not* an error for the content's URL to not exist on the user's file system.

     *Future directions.* We ought to define a nice standard way to refer to files from other packages, e.g., the file `index.html' from the documentation area of the package `foobar'.

     SCHEME       URL
     repeatable?  no
     example      `FAQ/Linux-FAQ'

Type
     The category of the resource, describing what sort of resource it is. Resource types are orthogonal to both the `Subject' and `Format' elements. So, the `Type' element can be thought of as the generic *class* of the resource, which is not related to its particular medium or subject matter. The value for this element should be selected from the following list if appropriate:

          * howto
          * faq
          * manual *(i.e., a manual for software)*
          * reference
          * specification *(e.g., a Policy document)*
          * tutorial
          * figure
          * homepage
          * glossary
          * collection
          * discussion group *(i.e., a mailing list or Usenet group, or archives of the same)*
          * package *(for future use, representing Debian package metadata)*

     If the type you are looking for is not on the list, but you feel it should be, please email <[EMAIL PROTECTED]>.

     SCHEME       Debian type
     repeatable?  yes
     example      `howto'

Rights
     An identifier that links to a rights management statement, such as acceptable terms of use, the GPL, etc.

     SCHEME       URL
     repeatable?  yes
     example      `copyright/GPL'


3.3.3. Ignored Elements
-----------------------

The following elements are ignored. They are mentioned here because these fields are part of standard Dublin Core; they may some day become supported.

Coverage
     The spatial or temporal characteristics of the intellectual content of the resource.

-------------------------------------------------------------------------------


4. docreg File Format
---------------------

The docreg file is the medium for the transmission of document metadata to the local Document Store.
As such, it is the package maintainer's way of attaching metadata to documents included in their package, and of ensuring that this metadata is available to the user who installed the package. The docreg file, used in combination with install-docs, is the complete interface that a document-providing package needs to worry about. End users need not be aware of docreg files at all; they are not end-user editable.


4.1. Design Rationale and Goals
-------------------------------

The docreg file is meant to be an easy, familiar mechanism for busy package maintainers. It uses a syntax similar to the `control' files already used by package maintainers, namely an RFC-822 compliant syntax.

The docreg file format has the following design goals:

     * Adherence to recognized metadata standards, namely, the Dublin Core (http://purl.oclc.org/metadata/dublin_core/) element set.
     * Ease of use for package maintainers, through a very simple data model.
     * Language-independent syntax, allowing for indication of the language of the document, as well as indication of the language of the metadata.
     * Flexibility and inter-relationships between documents, without imposing any dependency or entity modeling complexity.


4.2. How To Use the docreg File
-------------------------------

The docreg file is the file used by package maintainers to register documents into the Debian Document Registry. The doc-base packaging system (specifically, the install-docs program) is responsible for processing the docreg file and adding the document metadata it contains to the system's local Document Store.

Document metadata is all the information contained in the Debian Document Registry for a file. The composition of this metadata is directly related to the docreg file, since the docreg file is the sole transmitter of document metadata into the registry (via install-docs).
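To make the processing step just described more concrete, here is a minimal Python sketch of what a tool in the role of install-docs might do first: split a docreg file into metadata entities, unfold indented continuation lines, and report entities that lack a required element. It is *not* the doc-base implementation; the names `parse_docreg' and `missing_required' are hypothetical. It assumes only the syntax outlined in section 4.3 (entities separated by empty lines, `Label: content' element lines, indented continuation lines, case-insensitive labels) and the required elements of section 3.3.1, and it deliberately ignores qualifiers, repeatability rules, and field sizes.

```python
import re
from typing import List, Tuple

# Required elements per section 3.3.1, lower-cased because case is never
# significant in labels.
REQUIRED = ("identifier", "title", "subject", "format")

# "Label: content" on a non-indented line. Qualifiers in parentheses are not
# handled in this sketch (maintainers cannot set them anyway).
ELEMENT_LINE = re.compile(r"^([A-Za-z][A-Za-z.]*)\s*:\s*(.*)$")

Element = Tuple[str, str]


def parse_docreg(text: str) -> List[List[Element]]:
    """Split docreg text into entities, each a list of (label, content)."""
    entities: List[List[Element]] = []
    current: List[Element] = []
    for line in text.splitlines():
        if not line.strip():
            # An empty line ends the current entity.
            if current:
                entities.append(current)
                current = []
        elif line[0] in " \t":
            # Indented continuation line: unfold it into the previous element.
            if not current:
                raise ValueError("continuation line before any element")
            label, content = current[-1]
            current[-1] = (label, f"{content} {line.strip()}".strip())
        else:
            match = ELEMENT_LINE.match(line)
            if match is None:
                raise ValueError(f"not an element line: {line!r}")
            # Lower-case the label; preserve the case of the content.
            current.append((match.group(1).lower(), match.group(2).strip()))
    if current:
        entities.append(current)  # the bottom of the file also ends an entity
    return entities


def missing_required(entity: List[Element]) -> List[str]:
    """Return the required elements absent from one entity (empty if none)."""
    present = {label for label, _ in entity}
    return [name for name in REQUIRED if name not in present]


if __name__ == "__main__":
    sample = """\
Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide
 and a reference.

identifier: notes.txt
Title: Notes
"""
    for entity in parse_docreg(sample):
        print(entity[0][1], missing_required(entity))
    # debian-metadata/debian-metadata.sgml []
    # notes.txt ['subject', 'format']
```

Joining continuation lines with a single space is one reasonable reading of "folding"; the augmented BNF in section 6.2 remains the authority on the exact rules.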
While it is easy to confuse document metadata with the docreg file, there is a distinction. A docreg file may contain one or more *metadata entities*, as described in section 3.1, `Metadata Entities'. To extend the paradigm from that section, documents are the books in a library, metadata entities are the cards in the card catalog, and docreg files are simply bundles of one or more card catalog cards which are delivered to the library.

While each metadata entity refers to one and only one resource (local or otherwise), it does not follow that each resource has one and only one metadata entity.[1] It is possible, although unusual, for a resource to have more than one metadata entity referring to it.

     [1] It follows that the Identifier of a metadata entity is not necessarily a unique identifier for that entity.

Documents can relate to one another in various ways. For instance, a document might be a specially formatted version of another source document (the "IsFormatOf" relation). A document might be a translation of another document into a new language ("IsBasedOn"), or, more obscurely, a version of the work, perhaps interesting for historical purposes ("IsVersionOf"). Relationships between documents do not require actual package dependencies, however.


4.2.1. Where To Put the docreg File
-----------------------------------

docreg files are under package maintainer control; they are never altered by the Debian documentation system as a whole. The files should be installed and removed by the package itself using the standard means. The file may be automatically generated at the package maintainer's discretion; however, it may not be altered after install-docs has run. At the convenience of the package maintainer, it is allowable to use more than one docreg file per package.

docreg files may be placed in any location. It is suggested that docreg files be placed in the directory containing the resource they describe.
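Keeping a docreg file next to the resources it describes pairs naturally with the interim Identifier rule of section 3.3.1: an Identifier starting with `./' is relative to the docreg file, and any other relative URL is relative to the package documentation area. The Python below is a minimal sketch of that rule, not part of doc-base. It assumes that the documentation area is `/usr/share/doc/<package>' (section 3.3.1 equally allows `/usr/doc/<package>'), and that an Identifier containing `://' or starting with `/' is not relative and is returned unchanged. The name `resolve_identifier' is hypothetical.

```python
import posixpath

# Assumption: the package documentation area is /usr/share/doc/<package>.
# Section 3.3.1 equally allows /usr/doc/<package>; change DOC_ROOT if needed.
DOC_ROOT = "/usr/share/doc"


def resolve_identifier(identifier: str, package: str, docreg_path: str) -> str:
    """Map a docreg Identifier to a local path using the interim rule."""
    if "://" in identifier or identifier.startswith("/"):
        # Absolute URLs and absolute paths are not relative; leave them alone.
        return identifier
    if identifier.startswith("./"):
        # `./' means: relative to the directory holding the docreg file.
        base = posixpath.dirname(docreg_path)
    else:
        # Any other relative URL is relative to the package documentation area.
        base = posixpath.join(DOC_ROOT, package)
    # normpath removes the "./" (and would also collapse any "..").
    return posixpath.normpath(posixpath.join(base, identifier))


if __name__ == "__main__":
    print(resolve_identifier(
        "./index.html", "debian-policy",
        "/usr/share/doc/debian-policy/html/policy.docreg"))
    # /usr/share/doc/debian-policy/html/index.html
    print(resolve_identifier(
        "FAQ/Linux-FAQ", "foobar", "/usr/share/doc-base/docreg/foobar"))
    # /usr/share/doc/foobar/FAQ/Linux-FAQ
```

Because `normpath' also collapses `..' components, an Identifier such as `../other/file' would resolve outside the package's own area; a real tool might prefer to flag that case.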
Such files should also have `.docreg' as their suffix. However, maintainers may name or place docreg files wherever they wish.

Alternatively, some have suggested using a `/usr/share/doc-base/docreg/' directory, in which case the docreg file should be named the same as the package, or prefixed with the package name, e.g., `/usr/share/doc-base/docreg/debian-policy'.

Whatever the file name, the names must be globally unique across all packages. Prefixing them with the package name helps guard against collisions.


4.3. docreg File Format
-----------------------

The format of the docreg file borrows from the Debian control file format, which borrows from RFC 822.

First, some terminology. docreg files are composed of one or more metadata entities, where each entity describes a single document (a URL, usually a file on disk). Metadata entities are composed of elements, or fields, which include required elements, optional elements, and ignored elements. These elements are treated in depth in chapter 3, `Debian Metadata Elements'.

Elements are lines composed of a label (that is, the name of the element), a colon (`:'), optionally one or more qualifiers in parentheses, and finally the contents of the element. Metadata entities are separated from one another by an empty line, or by the top or bottom of the file.

Any element's contents may continue onto multiple lines, but continuation lines must be indented from the left margin; this is called "folding". In some cases the contents are restricted to a controlled vocabulary, such as a URL, or a single value from a domain of possible values. These controlled vocabularies are specified by the built-in implied `SCHEME', which is described in subsection 3.2.2, `The SCHEME Qualifier'.

An augmented BNF description of the file format, probably only of interest to implementors, can be found in section 6.2, `Augmented BNF Description for docreg Files'.


4.3.1. Example Files
--------------------

The following is an example for the current document.
There are three formats provided: SGML, ASCII, and HTML.

     Identifier: debian-metadata/debian-metadata.sgml
     Format: text/sgml
     Title: Debian Metadata Manual
     Subject: debian/policy
     Description: This manual contains a guide and a reference to the
      Debian Metadata Project. The Project's purpose, and the purpose of
      this document, is to outline a set of metadata elements, to specify
      an interface for package maintainers to use in order to provide
      metadata about resources in their packages, to specify a unified
      subject catalog for categorizing metadata, and to specify an API for
      developers who wish to make use of a system's metadata.
     Language: en
     Creator: A. P. Harris <[EMAIL PROTECTED]>
     Creator: Marcus Brinkmann <[EMAIL PROTECTED]>
     Date: 1998-06-29
     Rights: copyright/GPL
     Type: specification

     Identifier: debian-metadata/debian-metadata.html/index.html
     Format: text/html
     Title: Debian Metadata Manual
     Subject: debian/policy
     Description: This manual contains a guide and a reference to the
      Debian Metadata Project. The Project's purpose, and the purpose of
      this document, is to outline a set of metadata elements, to specify
      an interface for package maintainers to use in order to provide
      metadata about resources in their packages, to specify a unified
      subject catalog for categorizing metadata, and to specify an API for
      developers who wish to make use of a system's metadata.
     Language: en
     Creator: A. P. Harris <[EMAIL PROTECTED]>
     Creator: Marcus Brinkmann <[EMAIL PROTECTED]>
     Date: 1998-06-29
     Rights: copyright/GPL
     Type: specification

     Identifier: debian-metadata/debian-metadata.text
     Format: text/plain
     Title: Debian Metadata Manual
     Subject: debian/policy
     Description: This manual contains a guide and a reference to the
      Debian Metadata Project. The Project's purpose, and the purpose of
      this document, is to outline a set of metadata elements, to specify
      an interface for package maintainers to use in order to provide
      metadata about resources in their packages, to specify a unified
      subject catalog for categorizing metadata, and to specify an API for
      developers who wish to make use of a system's metadata.
     Language: en
     Creator: A. P. Harris <[EMAIL PROTECTED]>
     Creator: Marcus Brinkmann <[EMAIL PROTECTED]>
     Date: 1998-06-29
     Rights: copyright/GPL
     Type: specification

As the reader can see, there is a lot of repetition between the different entities. Therefore, it is suggested that docreg files take advantage of a preprocessor, such as m4. Here is a much shorter version of the docreg file, which m4 expands into the entries above:

     changequote([, ])dnl
     define([common_elements], [Title: Debian Metadata Manual
     Subject: debian/policy
     Description: This manual contains a guide and a reference to the
      Debian Metadata Project. The Project's purpose, and the purpose of
      this document, is to outline a set of metadata elements, to specify
      an interface for package maintainers to use in order to provide
      metadata about resources in their packages, to specify a unified
      subject catalog for categorizing metadata, and to specify an API for
      developers who wish to make use of a system's metadata.
     Language: en
     Creator: A. P. Harris <[EMAIL PROTECTED]>
     Creator: Marcus Brinkmann <[EMAIL PROTECTED]>
     Date: 1998-06-29
     Rights: copyright/GPL
     Type: specification])dnl
     Identifier: debian-metadata/debian-metadata.sgml
     Format: text/sgml
     common_elements

     Identifier: debian-metadata/debian-metadata.html/index.html
     Format: text/html
     common_elements

     Identifier: debian-metadata/debian-metadata.text
     Format: text/plain
     common_elements


4.3.2. Field Sizes
------------------

Field size limits are imposed on fields in order to facilitate a straightforward database-driven storage system (not yet in place).
    Element               Maximum length (characters)
    --------------------  ---------------------------
    Identifier            256
    Title                 80
    Subject               160 (multiple elements combined)
    Format                40
    Description           512
    Language              2
    Creator               200 (multiple elements combined)
    Contributor           200 (multiple elements combined)
    Publisher             200 (multiple elements combined)
    Date                  10
    Source                256 (multiple elements combined)
    Relation.IsFormatOf   256
    Relation.IsBasedOn    256
    Type                  80 (multiple elements combined)
    Rights                256
    Coverage              80

-------------------------------------------------------------------------------

5. Tools for Maintainers
------------------------

Maintainer tools are applications available to maintainers to help them manage the metadata they are responsible for. This chapter contains an overview of the available tools; for full information about these applications, please see the manual pages provided for the programs.

5.1. install-docs -- metadata installation and removal
------------------------------------------------------

install-docs is used from the package maintainer scripts to install or remove a docreg file from the local store of registered metadata.

*Examples of how to invoke from maintainer scripts.*

Metadata integrators can extend install-docs functionality by using the techniques described in section 6.3, `Hooking Into install-docs'.

It is part of the standard doc-base package.

5.2. validate-docreg -- metadata validation for maintainers
-----------------------------------------------------------

More extensive validation can be used by package maintainers to ensure that their metadata is well formed. Examples of the validation done:

  * Relations actually exist (whether this can be checked depends on whether the related package is installed).

  * Translations of a document use the same subject as the document they were translated from.

  * Fields such as date and language contain valid values.

*Examples of how to invoke from debian/rules.*

This utility is part of the doc-base-dev package.

5.3. html2docreg -- convert HTML files to docreg
------------------------------------------------

Converts standard Dublin Core HTML `META' tags into docreg syntax. Supports both HTML v3 and v4 META syntax.

This utility is part of the doc-base-dev package.

5.4. docreg2html -- convert docreg files to HTML
------------------------------------------------

Converts standard docreg files to Dublin Core standard HTML `META' elements. Can switch between HTML v3 and v4 META syntax. Qualifiers are added automatically.

This utility is part of the doc-base-dev package.

-------------------------------------------------------------------------------

6. Debian Metadata for Implementors
-----------------------------------

This chapter is for those who are implementing interfaces or extensions to the Debian Metadata infrastructure.

6.1. Tracking Registered docreg Files
-------------------------------------

Currently, the only means by which an implementor can access metadata is through the docreg files themselves.

The Debian Metadata Project considers direct access to docreg files a temporary state of affairs, since such access will ultimately place too many constraints on the file format and contents. For instance, moving to XML, or offering multiple docreg file formats, would be doubly difficult to implement. Abstraction is needed between the "container" file (which transmits the information) and the "store" of locally known metadata.

However, delaying implementation until we have a local storage system is not acceptable, since we want the system to be used in practice before investing too deeply in infrastructure. Therefore, for now, implementors should use the file `/var/state/doc-base/registered-docreg-files' to discover the list of installed docreg files.

6.2. Augmented BNF Description for docreg Files
-----------------------------------------------

The following description uses augmented BNF as defined in RFC 822.
This standard meta-format lets us define the docreg format without ambiguity. See also RFC 2068 for a description and example of augmented BNF.

6.2.1. Basic Rules
-------------------

The following rules define fundamental building blocks used in the rest of this specification.

    CHAR        = <any ASCII character>           ; (  0-177,  0.-127.)
    ISOCHAR     = <any ISO-8859-1 character>
    CTL         = <any ASCII control              ; (  0- 37,  0.- 31.)
                   character and DEL>             ; (    177,     127.)
    LF          = <ASCII LF, linefeed>            ; (     12,      10.)
    SPACE       = <ASCII SP, space>               ; (     40,      32.)
    HTAB        = <ASCII HT, horizontal-tab>      ; (     11,       9.)
    LWSP-char   = SPACE / HTAB                    ; semantics = SPACE
    linear-white-space = 1*([LF] LWSP-char)       ; semantics = SPACE
                                                  ; LF => folding
    specials    = "(" / ")" / "<" / ">" / "@"
                / "," / ";" / ":" / "\" / <">
                / "." / "[" / "]" / "="
    atom        = 1*<any CHAR except specials, SPACE and CTLs>
                                                  ; control fields
    ctext       = *<any ISOCHAR excluding "(",    ; field contents
                   ")", "\" & CR, & including
                   linear-white-space>
    end-of-rec  = < 2*LF or end of file >

6.2.2. Field Definitions
-------------------------

Field semantics are the same as those defined under "Header Field Definitions" in RFC 822 Section 3.1, except that the standard Unix line separator, LF, is used instead of CRLF. Long header fields are likewise supported, as specified in RFC 822 Section 3.1.1.

The following is the BNF composition of docreg field syntax.

    field               = field-name ":" [*field-qualifier]
                          field-body LF LF
    field-name          = *atom
    field-body          = field-body-contents
                          [LF LWSP-char field-body]    ; folding
    field-body-contents = *ctext
    field-qualifier     = "(" *atom "=" *atom ")"

Field names are not case-sensitive. Both `field-name' and `field-qualifier' are further constrained to the set of allowable values. Furthermore, in some cases, `field-body-contents' are constrained based on their qualifiers. For instance, a qualifier of `SCHEME=URL' would indicate that the contents should be a valid URL.
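To make the field grammar in sections 6.2.1 and 6.2.2 more concrete, here is a minimal Python sketch of a docreg reader. It is a hypothetical illustration, not a doc-base tool: the names `parse_docreg', `parse_field' and `Element' are invented here, qualifiers are assumed to appear between the colon and the contents (as described in section 4.3), and folded whitespace is collapsed to a single space, following the `semantics = SPACE' note on linear-white-space. It is deliberately more permissive than the grammar in accepting dots in field names, so that names such as `Relation.IsFormatOf' parse, and it does not enforce the character classes or the field size limits.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical helper names throughout; this is a sketch, not part of doc-base.

# A field starts at the left margin: a name, then a colon (RFC 822 style).
# Dots are allowed in names so that e.g. Relation.IsFormatOf parses.
FIELD_START = re.compile(r"([^\s:()]+):(.*)")

# A qualifier such as "(SCHEME=URL)"; assumed to sit between the colon and
# the field contents, optionally followed by whitespace.
QUALIFIER = re.compile(r"\(\s*([^()=\s]+)\s*=\s*([^()=\s]+)\s*\)\s*")


@dataclass
class Element:
    name: str                   # lower-cased: field names are case-insensitive
    qualifiers: Dict[str, str]  # e.g. {"SCHEME": "URL"}
    value: str                  # unfolded field contents


def parse_field(logical_line: str) -> Element:
    """Split one unfolded field into name, qualifiers and contents."""
    m = FIELD_START.fullmatch(logical_line)
    if m is None:
        raise ValueError(f"not a docreg field: {logical_line!r}")
    rest = m.group(2).lstrip()
    qualifiers: Dict[str, str] = {}
    # Peel off zero or more leading "(name=value)" qualifiers.
    q = QUALIFIER.match(rest)
    while q is not None:
        qualifiers[q.group(1)] = q.group(2)
        rest = rest[q.end():]
        q = QUALIFIER.match(rest)
    return Element(m.group(1).lower(), qualifiers, rest.strip())


def parse_docreg(text: str) -> List[List[Element]]:
    """Return one list of elements per metadata entity."""
    entities: List[List[Element]] = []
    current: List[Element] = []
    logical: Optional[str] = None  # field being accumulated across folds

    for line in text.splitlines() + [""]:  # trailing "" stands for end of file
        if not line.strip():
            # Empty line (or end of file): finish the field and the entity.
            if logical is not None:
                current.append(parse_field(logical))
                logical = None
            if current:
                entities.append(current)
                current = []
        elif line[0] in " \t":
            # Indented line: folded continuation of the previous field.
            if logical is None:
                raise ValueError(f"continuation outside a field: {line!r}")
            logical += " " + line.strip()
        else:
            # New field at the left margin; finish the previous one first.
            if logical is not None:
                current.append(parse_field(logical))
            logical = line
    return entities


if __name__ == "__main__":
    sample = (
        "Identifier: debian-metadata/debian-metadata.text\n"
        "Format: text/plain\n"
        "Description: This manual contains a guide and a reference\n"
        "  to the Debian Metadata Project.\n"
        "Creator: A. P. Harris\n"
        "Creator: Marcus Brinkmann\n"
    )
    for element in parse_docreg(sample)[0]:
        print(element.name, element.qualifiers, element.value)
```

Because field names are not case-sensitive, the sketch lower-cases them, so `Format' and `FORMAT' yield the same name; repeated elements such as `Creator' are kept in document order rather than merged.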
For clarifications on the way that fields are composed, refer to RFC 822.[1]

[1] Please email me with any corrections or clarifications.

6.2.3. docreg Definition
-------------------------

docreg files contain any number of metadata sets.

    docreg-file  = *metadata-set
    metadata-set = *field end-of-rec

6.3. Hooking Into install-docs
-------------------------------

This section specifies a proposed method of allowing packages to hook into install-docs invocations. It is not yet decided whether this functionality is necessary. If you are a metadata implementor and find that you need this functionality, or that it is not sufficient for your needs, please email <[EMAIL PROTECTED]>.

Metadata implementors can *hook* into state changes in metadata by providing executable scripts in `/usr/share/doc-base/methods/'. These hooks must be platform-independent (otherwise they should perhaps move to `/usr/lib/doc-base/methods'). Most likely they will be wrappers around the actual programs.

The following states have hooks; the hook is selected by the first argument passed to the script.

install <docreg_file>
     call to register or re-register the docreg file located at <docreg_file>

remove <docreg_file>
     call to unregister the docreg file located at <docreg_file>

rebuild
     rebuild local caches of metadata; essentially this implies clearing out all data and reinstalling each of the registered docreg files

-------------------------------------------------------------------------------

Debian Metadata Project

Adam P. Harris <[EMAIL PROTECTED]>, The Debian-Doc List <[email protected]>

version 0.8.0, Sat, 18 Jul 1998 19:23:00 -0400

