Re: [dev-biblio] an example of a odt file with the new citation xml

Bruce D'Arcus Sat, 14 Jan 2006 11:07:41 -0800


On Jan 14, 2006, at 9:54 AM, CPHennessy wrote:

I've made some very limited progress on parsing the biblio/cite info in the example file. Mainly I have now a very basic grasp of what I think I need to do to at least parse the cite info.


Excellent!

However the two examples I have for the cite info differ a bit:
- is the cite info really contained within a text:bibliography-mark ?

Not from my perspective. I'd call that legacy. cite:citation = the new text:bibliography-mark. We should fix that inconsistency.

- I cannot find a document on the OASIS site with the agreed new elements
        and attributes, is one available there ?


See here:

        http://lists.oasis-open.org/archives/office/200408/msg00030.html

The PDF David put up is just converted from that:

        http://bibliographic.openoffice.org/XML-bibliography-proposal.pdf

Here's the schema in RELAX NG Compact with comments (probably easier to read):


====

# Citation schema copyright Bruce D'Arcus, based on related work
# for DocBook with Peter Flynn and Markus Hoenicka, and for OpenOffice
# with Daniel Vogelheim. It has been approved by the OASIS OpenOffice
# TC for inclusion in the open file format.

default namespace cite = "http://purl.org/NET/xbiblio/cite/1.0"

start = citation-element

## A citation consists of two elements; one the structural source data
## and the other the presentational display.
citation-element = element cite:citation {
   citation-source-element,
   citation-body-element?
}

## The source element consists of one-or-more biblioref elements, which
## are the references within the citation. For example, the citation
## (Doe, 1999; Smith, 2000) contains two references.
citation-source-element = element cite:citation-source {
   biblioref-element+
}

## In fields across the social sciences and humanities, it is
## common for citations to include further detail, including
## more specific point citations (for example, page numbers)
## as well as captions. We allow more than one detail element
## to allow for coding such as (Doe, 1999: pages 1, 2, 3-5) or
## (Doe, 1999: page 2, paragraph 3).
biblioref-element = element cite:biblioref {
   detail-element*,
   caption-element*,
   biblioref-attlist
}
## Key is the pointer to the bibliographic record ID.
# note: add cite:style attribute to allow local styling

biblioref-attlist &= attribute cite:key { token }, attribute cite:style { token }


## The detail element captures point citation information
## such as cited page numbers.
detail-element = element cite:detail { detail-attlist }
detail-attlist = attribute cite:begin { xsd:string },
   attribute cite:end { xsd:string }?,
   attribute cite:units {
     "chapters"
    | "figures"
    | "formulas"
    | "lines"
    | "pages"
    | "paragraphs"
    | "parts"
    | "sections"
}

## For examples such as (for more on this, see Doe, 1999).
caption-element = element cite:caption {
          caption-attlist,
          cite.text-content
        }
caption-attlist = attribute cite:position { "before" | "after" }

## The element to contain the rendered code is uncontrolled.
citation-body-element = element cite:citation-body { cite.text-content }

cite.text-content =
  element * {
    mixed {
      (cite.text-content
       | attribute * { text })*
    }
  }

===
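
To make the schema concrete, here is a minimal Python sketch that builds an instance for the citation (Doe, 1999: pages 1, 2, 3-5; Smith, 2000) with the standard library. The record keys ("doe1999", "smith2000") and the cite:style value ("author-year") are hypothetical placeholders, not values defined by the proposal. The schema as written makes cite:style required, so the sketch always emits it. The optional cite:citation-body is left out.

```python
import xml.etree.ElementTree as ET

CITE_NS = "http://purl.org/NET/xbiblio/cite/1.0"
ET.register_namespace("cite", CITE_NS)


def q(name):
    """Return the namespace-qualified name for a cite: element or attribute."""
    return f"{{{CITE_NS}}}{name}"


def make_citation(refs, style="author-year"):
    """Build a cite:citation from (key, details) pairs.

    Each detail is a (units, begin, end) tuple; end may be None.
    The style value is a hypothetical placeholder.
    """
    citation = ET.Element(q("citation"))
    source = ET.SubElement(citation, q("citation-source"))
    for key, details in refs:
        biblioref = ET.SubElement(
            source, q("biblioref"), {q("key"): key, q("style"): style}
        )
        for units, begin, end in details:
            detail = ET.SubElement(
                biblioref, q("detail"), {q("units"): units, q("begin"): begin}
            )
            if end is not None:
                detail.set(q("end"), end)
    return citation


# (Doe, 1999: pages 1, 2, 3-5; Smith, 2000)
citation = make_citation([
    ("doe1999", [("pages", "1", None), ("pages", "2", None), ("pages", "3", "5")]),
    ("smith2000", []),
])
xml_text = ET.tostring(citation, encoding="unicode")
print(xml_text)
```

Each cited page or page range becomes its own cite:detail element, which is why the schema allows more than one per cite:biblioref.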

I may suggest a minor change or two to bring it closer to current metadata directions, but they would be, as I say, minor.

- will there only be one style:Biblio element set ?
- what is the format of biblio-data.xml ?
- is the "Bibliographic Table" generated ? And if so there does not seem to be any reference in the table to cite elements. Does this then mean that it does not need to be saved in the application memory, but only the fact that there is a biblio table with the various styles definitions needs to be saved ? (I presume so, and I think that nothing then needs to be implemented for this as all of this already exists).


Let me take the above three together, in reverse order.

Let's agree on terminology. I'm not sure about everyone else, but I have always found the term "bibliography table" very confusing. I'd rather just call it a "reference list".

So we have citations (a short description that points to a fuller "reference"), references, and reference lists.

Yes, the reference list -- as well as the content of the cite:citation-body elements -- will be generated. Indeed, that's the whole point of what we're doing; to make the formatting more flexible and dynamic. A user adds a citation, and it gets formatted automatically according to their chosen style.

Now, let me explain how the formatting process ought to work from a broad perspective.


1. collect citations
2. generate reference list

This step is critical. In CiteProc (my XSLT code), this creates an in-memory *enhanced* version of the reference metadata. So that list is properly sorted and grouped, and that is used to generate some parameters to use later.
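
As a rough illustration of steps 1 and 2, here is a minimal Python sketch, not CiteProc itself. It assumes the cited records are available as a plain dictionary keyed by record ID. The sample document, the record fields, and the sort key are all hypothetical.

```python
import xml.etree.ElementTree as ET

CITE_NS = "http://purl.org/NET/xbiblio/cite/1.0"
BIBLIOREF = f"{{{CITE_NS}}}biblioref"
KEY_ATTR = f"{{{CITE_NS}}}key"


def collect_cited_keys(document_root):
    """Step 1: return cited keys in document order, without duplicates."""
    keys = []
    for ref in document_root.iter(BIBLIOREF):
        key = ref.get(KEY_ATTR)
        if key is not None and key not in keys:
            keys.append(key)
    return keys


def generate_reference_list(keys, records):
    """Step 2: look up each cited record and sort by creator, year, title."""
    cited = [records[k] for k in keys if k in records]
    return sorted(cited, key=lambda r: (r["creator"], r["year"], r["title"]))


# Hypothetical document with two citations; doe1999 is cited twice.
DOC = f"""
<doc xmlns:cite="{CITE_NS}">
  <cite:citation><cite:citation-source>
    <cite:biblioref cite:key="smith2000"/>
    <cite:biblioref cite:key="doe1999"/>
  </cite:citation-source></cite:citation>
  <cite:citation><cite:citation-source>
    <cite:biblioref cite:key="doe1999"/>
  </cite:citation-source></cite:citation>
</doc>
"""

# Hypothetical record store; "unused" is never cited, so it is left out.
RECORDS = {
    "doe1999": {"creator": "Doe, John", "year": 1999, "title": "Some Title"},
    "smith2000": {"creator": "Smith, Jane", "year": 2000, "title": "A Title"},
    "unused": {"creator": "Roe, Ann", "year": 2001, "title": "Not Cited"},
}

keys = collect_cited_keys(ET.fromstring(DOC.strip()))
reference_list = generate_reference_list(keys, RECORDS)
print(keys)  # ['smith2000', 'doe1999']
print([r["title"] for r in reference_list])  # ['Some Title', 'A Title']
```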


In some Ruby code I've been playing with, the process is the same.

So here is a simple example of a reference object before enhancement:

        #<Reference:0x22dc4
                @type="book",
                @year=2000,
                @creator=[#<Person: John Doe>],
                @bibparams={},
                @title="Another Title">

... and this after:

        #<Reference:0x22e50
                @type="book",
                @year=2000,
                @creator=[#<Person: John Doe>],
                @bibparams={:suffix=>"b", :first_by_creator=>false},
                @title="Some Title">

Note the "bibparams" attribute. That holds certain parameters that can only be generated by considering each item in relation to all others.

In the author-year class of citations, that step groups and sorts by author-year. So the "first_by_creator" parameter simply says if there is more than one item from a given author, this one is the first (after sorting and grouping). It matters for formatting. The "suffix" parameter allows you to disambiguate duplicate author-year items, so you have "Doe, 1999a" and "Doe, 1999b" instead of having both "Doe, 1999".
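
The enhancement step for the author-year class could be sketched in Python roughly as follows. This is only an illustration under assumptions: references are plain dictionaries, the sort key is (creator, year, title), and "first_by_creator" is set to True for the first item of every creator, including creators with a single item. None of this is taken from the actual CiteProc or Ruby code.

```python
from itertools import groupby
from string import ascii_lowercase


def enhance(references):
    """Return a sorted copy of the references with bibparams filled in."""
    ordered = sorted(references, key=lambda r: (r["creator"], r["year"], r["title"]))
    enhanced = [dict(r, bibparams={}) for r in ordered]

    # first_by_creator: True only for the first item in each creator group.
    for _, group in groupby(enhanced, key=lambda r: r["creator"]):
        for index, ref in enumerate(group):
            ref["bibparams"]["first_by_creator"] = index == 0

    # suffix: assigned only when several items share creator and year.
    # zip stops after 26 items; a real implementation would need more letters.
    for _, group in groupby(enhanced, key=lambda r: (r["creator"], r["year"])):
        items = list(group)
        if len(items) > 1:
            for letter, ref in zip(ascii_lowercase, items):
                ref["bibparams"]["suffix"] = letter
    return enhanced


refs = [
    {"type": "book", "year": 2000, "creator": "Doe, John", "title": "Some Title"},
    {"type": "book", "year": 2000, "creator": "Doe, John", "title": "Another Title"},
    {"type": "book", "year": 1999, "creator": "Smith, Jane", "title": "A Title"},
]
for ref in enhance(refs):
    print(ref["title"], ref["bibparams"])
```

With this input, "Some Title" sorts after "Another Title", so it gets the suffix "b" and first_by_creator False. The single Smith item gets no suffix at all.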

Now, in other classes this processing logic would work a little differently. Consider, for example, a number style, where citations look like [1]. In that style, you have two sorting options. One is author-year, where you do as above, and number them, and then use that for the citation.

Another is to sort in the order of citation. So there you actually need another list, I guess you could say, of citations (at least that's how it works in my XSLT), and then you add the number there to look up in the reference list rendering step (see below).
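
The two numbering options for a number style can be compared with a small Python sketch. The function names and the record layout are hypothetical. The only assumption is that the list of cited keys is available in document order.

```python
def number_by_author_year(references):
    """Number references after sorting them by creator and year."""
    ordered = sorted(references, key=lambda r: (r["creator"], r["year"]))
    return {ref["key"]: number for number, ref in enumerate(ordered, start=1)}


def number_by_citation_order(cited_keys):
    """Number references in the order they are first cited in the document."""
    numbers = {}
    for key in cited_keys:
        if key not in numbers:
            numbers[key] = len(numbers) + 1
    return numbers


references = [
    {"key": "smith2000", "creator": "Smith, Jane", "year": 2000},
    {"key": "doe1999", "creator": "Doe, John", "year": 1999},
]
cited_keys = ["smith2000", "doe1999", "smith2000"]  # document order

print(number_by_author_year(references))     # {'doe1999': 1, 'smith2000': 2}
print(number_by_citation_order(cited_keys))  # {'smith2000': 1, 'doe1999': 2}
```

Either mapping can then be consulted when rendering a citation such as [1] and when ordering the reference list.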

In the author-year and note classes, BTW, this step also needs to keep track of the relative position of citations, so that you'd have parameters like "first_citation" (to distinguish first/subsequent; the latter often shortened) and "ibid."
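
Tracking relative position could look roughly like this in Python. It is a simplified sketch that assumes one reference per citation and treats "ibid" as "same key as the immediately preceding citation". Real styles have more detailed rules, and the parameter names other than "first_citation" are assumptions.

```python
def citation_positions(cited_keys):
    """Return per-citation position parameters, in document order."""
    seen = set()
    previous = None
    params = []
    for key in cited_keys:
        params.append({
            "key": key,
            "first_citation": key not in seen,  # first vs. subsequent
            "ibid": key == previous,            # same source as the one before
        })
        seen.add(key)
        previous = key
    return params


for entry in citation_positions(["doe1999", "doe1999", "smith2000", "doe1999"]):
    print(entry)
```

In this run the second citation is flagged as ibid, and the fourth is a subsequent (not first) citation of doe1999 without being ibid.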


3. format citations and reference list

You need step two to be able to format both. This step would just read the citation style, and then use that to format the objects (or XML nodes; whatever).
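
As a toy illustration of that last step, the Python sketch below formats enhanced references using a style given as a plain dictionary. The style keys ("prefix", "ref_delimiter" and so on) and the "family" field are invented for this example. This is not CSL or any ODF styling mechanism.

```python
# Hypothetical author-year style description.
AUTHOR_YEAR_STYLE = {
    "prefix": "(",
    "suffix": ")",
    "name_year_delimiter": ", ",
    "ref_delimiter": "; ",
}


def format_citation(refs, style):
    """Format one citation from its enhanced references and a style dict."""
    parts = []
    for ref in refs:
        # The disambiguating suffix comes from the enhancement step, if any.
        letter = ref.get("bibparams", {}).get("suffix", "")
        parts.append(
            f"{ref['family']}{style['name_year_delimiter']}{ref['year']}{letter}"
        )
    return style["prefix"] + style["ref_delimiter"].join(parts) + style["suffix"]


refs = [
    {"family": "Doe", "year": 1999, "bibparams": {"suffix": "a"}},
    {"family": "Smith", "year": 2000, "bibparams": {}},
]
print(format_citation(refs, AUTHOR_YEAR_STYLE))  # (Doe, 1999a; Smith, 2000)
```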

One question I have, though, is how to keep track of the position of any given citation.


Moving on ...

What will the new bibliographic metadata look like?

I can provide examples of what I am advocating, but it's currently an open question. At the TC we are discussing enhancing metadata support. I am pressing for ODF to have the same enhanced metadata framework for both document content and content within documents, such as bibliographic stuff. This is a complicated issue, so it's going to take some time to resolve.

Styling? To be honest, this is the least clear for me at the moment. I wrote a citation style language (CSL) that does what we need, and perhaps we can prototype by using that independent of ODF's styling system. I am contemplating writing a proposal to integrate its logic into ODF, but it's kind of hairy (though doable) figuring out how it all should fit together.


All of this is to get back to the bigger picture:

I'm not sure what your perspective is, CPH, but I wonder if we could say we're talking about two -- interacting -- pieces of code here: the citation reader/writer, and the formatting processor.


The first one definitely has to be written in C++ ... now.

For prototyping purposes, though, the second does not. If it's too much of a PITA to worry about now, perhaps Python would be a better bet (given the Python expertise on the list!)? We can't do that without others contributing though. I don't have the Python skills or the time to do that.


Bruce
