On Jan 14, 2006, at 9:54 AM, CPHennessy wrote:

I've made some very limited progress on parsing the biblio/cite info in the example file. Mainly, I now have a very basic grasp of what I think I need to do to at least parse the cite info.

Excellent!

However the two examples I have for the cite info differ a bit:
- is the cite info really contained within a text:bibliography-mark ?

Not from my perspective. I'd call that legacy. cite:citation = the new text:bibliography-mark. We should fix that inconsistency.

- I cannot find a document on the OASIS site with the agreed new elements and attributes, is one available there ?

See here:

        http://lists.oasis-open.org/archives/office/200408/msg00030.html

The PDF David put up is just converted from that:

        http://bibliographic.openoffice.org/XML-bibliography-proposal.pdf

Here's the schema in RELAX NG Compact with comments (probably easier to read):

====

# Citation schema copyright Bruce D'Arcus, based on related work
# for DocBook with Peter Flynn and Markus Hoenicka, and for OpenOffice
# with Daniel Vogelheim. It has been approved by the OASIS OpenOffice
# TC for inclusion in the open file format.

default namespace cite = "http://purl.org/NET/xbiblio/cite/1.0";

start = citation-element

## A citation consists of two elements; one the structural source data
## and the other the presentational display.
citation-element = element cite:citation {
   citation-source-element,
   citation-body-element?
}

## The source element consists of one-or-more biblioref elements, which
## are the references within the citation. For example, the citation
## (Doe, 1999; Smith, 2000) contains two references.
citation-source-element = element cite:citation-source {
   biblioref-element+
}

## In fields across the social sciences and humanities, it is
## common for citations to include further detail, including
## more specific point citations (for example, page numbers)
## as well as captions. We allow more than one detail element
## to allow for coding such as (Doe, 1999: pages 1, 2, 3-5) or
## (Doe, 1999: page 2, paragraph 3).
biblioref-element = element cite:biblioref {
   detail-element*,
   caption-element*,
   biblioref-attlist
}
## Key is the pointer to the bibliographic record ID.
# note: add cite:style attribute to allow local styling
biblioref-attlist &= attribute cite:key { token }, attribute cite:style { token }

## The detail element captures point citation information
## such as cited page numbers.
detail-element = element cite:detail { detail-attlist }
detail-attlist = attribute cite:begin { xsd:string },
   attribute cite:end { xsd:string }?,
   attribute cite:units {
     "chapters"
    | "figures"
    | "formulas"
    | "lines"
    | "pages"
    | "paragraphs"
    | "parts"
    | "sections"
}

## For examples such as (for more on this, see Doe, 1999).
caption-element = element cite:caption {
          caption-attlist,
          cite.text-content
        }
caption-attlist = attribute cite:position { "before" | "after" }

## The element to contain the rendered code is uncontrolled.
citation-body-element = element cite:citation-body { cite.text-content }

cite.text-content =
  element * {
    mixed {
      (cite.text-content
       | attribute * { text })*
    }
  }

====
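
To make the parsing side concrete, here is a minimal Python sketch of reading a cite:citation with the standard library. The sample instance is my own illustration of what the schema above would accept; the keys "doe99" and "smith00", the cite:style value "default", and the helper names are hypothetical. Note that the attributes are namespace-qualified in the schema, so they have to be looked up with the namespace too.

```python
import xml.etree.ElementTree as ET

CITE_NS = "http://purl.org/NET/xbiblio/cite/1.0"

# Hypothetical instance for (Doe, 1999: pages 3-5; Smith, 2000).
SAMPLE = """\
<cite:citation xmlns:cite="http://purl.org/NET/xbiblio/cite/1.0">
  <cite:citation-source>
    <cite:biblioref cite:key="doe99" cite:style="default">
      <cite:detail cite:units="pages" cite:begin="3" cite:end="5"/>
    </cite:biblioref>
    <cite:biblioref cite:key="smith00" cite:style="default"/>
  </cite:citation-source>
</cite:citation>
"""


def q(name):
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{CITE_NS}}}{name}"


def parse_citation(xml_text):
    """Collect the key and point-citation details of each biblioref."""
    root = ET.fromstring(xml_text)
    refs = []
    for bib in root.iter(q("biblioref")):
        details = [
            {
                "units": d.get(q("units")),
                "begin": d.get(q("begin")),
                "end": d.get(q("end")),  # None when the optional end is absent
            }
            for d in bib.findall(q("detail"))
        ]
        refs.append({"key": bib.get(q("key")), "details": details})
    return refs


print(parse_citation(SAMPLE))
```

This ignores cite:caption and cite:citation-body entirely; it only shows how the structural part maps to plain data.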

I may suggest a minor change or two to bring it closer to current metadata directions, but they would be, as I say, minor.

- will there only be one style:Biblio element set ?
- what is the format of biblio-data.xml ?
- is the "Bibliographic Table" generated ? And if so, there does not seem to be any reference in the table to cite elements. Does this then mean that the table itself does not need to be saved in the application memory, and that only the fact that there is a biblio table, with its various style definitions, needs to be saved ? (I presume so, and I think that nothing then needs to be implemented for this, as all of this already exists.)

Let me take the above three together, in reverse order.

Let's agree on terminology. I'm not sure about everyone else, but I have always found the term "bibliography table" very confusing. I'd rather just call it a "reference list".

So we have citations (a short description that points to a fuller "reference"), references, and reference lists.

Yes, the reference list -- as well as the content of the cite:citation-body elements -- will be generated. Indeed, that's the whole point of what we're doing; to make the formatting more flexible and dynamic. A user adds a citation, and it gets formatted automatically according to their chosen style.

Now, let me explain how the formatting process ought to work from a broad perspective.

1. collect citations
2. generate reference list

This second step is critical. In CiteProc (my XSLT code), it creates an in-memory *enhanced* version of the reference metadata, in which the list is properly sorted and grouped; that is then used to generate some parameters for later use.

In some Ruby code I've been playing with, the process is the same.

So here is a simple example of a reference object before enhancement:

        #<Reference:0x22dc4
                @type="book",
                @year=2000,
                @creator=[#<Person: John Doe>],
                @bibparams={},
                @title="Another Title">

... and this after:

        #<Reference:0x22e50
                @type="book",
                @year=2000,
                @creator=[#<Person: John Doe>],
                @bibparams={:suffix=>"b", :first_by_creator=>false},
                @title="Some Title">

Note the "bibparams" attribute. That holds certain parameters that can only be generated by considering each item in relation to all others.

In the author-year class of citations, that step groups and sorts by author-year. So the "first_by_creator" parameter simply says if there is more than one item from a given author, this one is the first (after sorting and grouping). It matters for formatting. The "suffix" parameter allows you to disambiguate duplicate author-year items, so you have "Doe, 1999a" and "Doe, 1999b" instead of having both "Doe, 1999".
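
As a rough Python sketch of that enhancement step for the author-year class (this is not the CiteProc or Ruby code; the dict layout, the function name, and the sample records are assumptions for illustration):

```python
from itertools import groupby
from string import ascii_lowercase


def enhance(references):
    """Return sorted copies of the references with 'bibparams' filled in.

    Assumes each reference is a dict with 'creator', 'year' and 'title'.
    """
    ordered = sorted(
        references, key=lambda r: (r["creator"], r["year"], r["title"])
    )
    enhanced = []
    for _creator, by_creator in groupby(ordered, key=lambda r: r["creator"]):
        first = True  # True only for the first item of each creator
        for _year, same_year in groupby(by_creator, key=lambda r: r["year"]):
            same_year = list(same_year)
            for i, ref in enumerate(same_year):
                params = {"first_by_creator": first}
                if len(same_year) > 1:
                    # Disambiguate duplicate author-year items: a, b, c ...
                    # (this sketch only covers up to 26 duplicates)
                    params["suffix"] = ascii_lowercase[i]
                enhanced.append({**ref, "bibparams": params})
                first = False
    return enhanced


# Hypothetical sample data.
refs = [
    {"creator": "Doe, John", "year": 2000, "title": "Some Title"},
    {"creator": "Doe, John", "year": 2000, "title": "Another Title"},
    {"creator": "Smith, Ann", "year": 1999, "title": "A Third Title"},
]
for ref in enhance(refs):
    print(ref["title"], ref["bibparams"])
```

With this sample, "Some Title" sorts second among the two Doe 2000 items, so it ends up with suffix "b" and first_by_creator false.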

Now, in other classes this processing logic would work a little differently. Consider, for example, a number style, where citations look like [1]. In that style, you have two sorting options. One is author-year, where you sort as above, number the references in that order, and then use that number for the citation.

Another is to sort in the order of citation. There you actually need a second list, of citations (at least that's how it works in my XSLT); you assign the number there, and then look it up in the reference list rendering step (see below).
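
A small Python sketch of the citation-order option, assuming each citation is simply a list of reference keys in document order (the function name and the keys are hypothetical):

```python
def number_by_citation_order(citations):
    """Map each reference key to a number, in order of first citation."""
    numbers = {}
    for citation in citations:
        for key in citation:
            if key not in numbers:
                numbers[key] = len(numbers) + 1
    return numbers


# Three citations in document order; "doe99" is cited twice.
citations = [["smith00", "doe99"], ["doe99"], ["jones01"]]
print(number_by_citation_order(citations))
```

The resulting mapping is what the reference list rendering step would look the numbers up in.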

In the author-year and note classes, BTW, this step also needs to keep track of the relative position of citations, so that you'd have parameters like "first_citation" (to distinguish first/subsequent; the latter often shortened) and "ibid."
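
One way this position tracking might look in Python, simplified to citations that contain a single reference each (the "ibid" key name and the function are my assumptions, not existing code):

```python
def annotate_positions(citation_keys):
    """Flag first and immediately repeated citations, in document order."""
    seen = set()
    previous = None
    annotated = []
    for key in citation_keys:
        annotated.append(
            {
                "key": key,
                "first_citation": key not in seen,  # first vs. subsequent
                "ibid": key == previous,  # same source as the citation before
            }
        )
        seen.add(key)
        previous = key
    return annotated


for item in annotate_positions(["doe99", "doe99", "smith00", "doe99"]):
    print(item)
```

Citations with several references would need a rule for what counts as "the same as the previous citation", which this sketch does not attempt.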

3. format citations and reference list

You need step two to be able to format both. This step would just read the citation style, and then use that to format the objects (or XML nodes; whatever).
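
To illustrate how the formatting step might consume those parameters, here is a Python sketch with a hard-coded author-year layout. A real processor would read the layout from the citation style instead; the "family_name" field and the function name are assumptions.

```python
def format_author_year(refs):
    """Render one citation such as (Doe, 1999a; Smith, 2000)."""
    parts = []
    for ref in refs:
        # The suffix only exists where step two found duplicates.
        suffix = ref.get("bibparams", {}).get("suffix", "")
        parts.append(f"{ref['family_name']}, {ref['year']}{suffix}")
    return "(" + "; ".join(parts) + ")"


citation = [
    {"family_name": "Doe", "year": 1999, "bibparams": {"suffix": "a"}},
    {"family_name": "Smith", "year": 2000, "bibparams": {}},
]
print(format_author_year(citation))  # (Doe, 1999a; Smith, 2000)
```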

One question I have, though, is how to keep track of the position of any given citation.

Moving on ...

What will the new bibliographic metadata look like?

I can provide examples of what I am advocating, but it's currently an open question. At the TC we are discussing enhancing metadata support. I am pressing for ODF to have the same enhanced metadata framework for both document content and content within documents, such as bibliographic stuff. This is a complicated issue, so it's going to take some time to resolve.

Styling? To be honest, this is the least clear for me at the moment. I wrote a citation style language (CSL) that does what we need, and perhaps we can prototype by using that independent of ODF's styling system. I am contemplating writing a proposal to integrate its logic into ODF, but it's kind of hairy (though doable) figuring out how it all should fit together.

All of this is to get back to the bigger picture:

I'm not sure what your perspective is, CPH, but I wonder if we could say we're talking about two -- interacting -- pieces of code here: the citation reader/writer, and the formatting processor.

The first one definitely has to be written in C++ ... now.

For prototyping purposes, though, the second does not. If it's too much of a PITA to worry about now, perhaps Python would be a better bet (given the Python expertise on the list!)? We can't do that without others contributing, though. I don't have the Python skills or the time to do that.

Bruce
