On Jan 14, 2006, at 9:54 AM, CPHennessy wrote:
> I've made some very limited progress on parsing the biblio/cite info in the
> example file. Mainly I have now a very basic grasp of what I think I need to
> do to at least parse the cite info.
Excellent!
> However the two examples I have for the cite info differ a bit:
> - is the cite info really contained within a text:bibliography-mark ?
Not from my perspective. I'd call that legacy. cite:citation = the new
text:bibliography-mark. We should fix that inconsistency.
> - I cannot find a document on the OASIS site with the agreed new elements
> and attributes, is one available there ?
See here:
http://lists.oasis-open.org/archives/office/200408/msg00030.html
The PDF David put up is just converted from that:
http://bibliographic.openoffice.org/XML-bibliography-proposal.pdf
Here's the schema in RELAX NG Compact with comments (probably easier to
read):
====
# Citation schema copyright Bruce D'Arcus, based on related work
# for DocBook with Peter Flynn and Markus Hoenicka, and for OpenOffice
# with Daniel Vogelheim. It has been approved by the OASIS OpenOffice
# TC for inclusion in the open file format.
default namespace cite = "http://purl.org/NET/xbiblio/cite/1.0"
start = citation-element
## A citation consists of two elements; one the structural source data
## and the other the presentational display.
citation-element = element cite:citation {
  citation-source-element,
  citation-body-element?
}
## The source element consists of one-or-more biblioref elements, which
## are the references within the citation. For example, the citation
## (Doe, 1999; Smith, 2000) contains two references.
citation-source-element = element cite:citation-source {
  biblioref-element+
}
## In fields across the social sciences and humanities, it is
## common for citations to include further detail, including
## more specific point citations (for example, page numbers)
## as well as captions. We allow more than one detail element
## to allow for coding such as (Doe, 1999: pages 1, 2, 3-5) or
## (Doe, 1999: page 2, paragraph 3).
biblioref-element = element cite:biblioref {
  detail-element*,
  caption-element*,
  biblioref-attlist
}
## Key is the pointer to the bibliographic record ID.
# note: add cite:style attribute to allow local styling
biblioref-attlist &=
  attribute cite:key { token },
  attribute cite:style { token }
## The detail element captures point citation information
## such as cited page numbers.
detail-element = element cite:detail { detail-attlist }
detail-attlist =
  attribute cite:begin { xsd:string },
  attribute cite:end { xsd:string }?,
  attribute cite:units {
    "chapters"
    | "figures"
    | "formulas"
    | "lines"
    | "pages"
    | "paragraphs"
    | "parts"
    | "sections"
  }
## For examples such as (for more on this, see Doe, 1999).
caption-element = element cite:caption {
  caption-attlist,
  cite.text-content
}
caption-attlist = attribute cite:position { "before" | "after" }
## The element to contain the rendered code is uncontrolled.
citation-body-element = element cite:citation-body { cite.text-content }
cite.text-content =
  element * {
    mixed {
      (cite.text-content
       | attribute * { text })*
    }
  }
===
I may suggest a minor change or two to bring it closer to current
metadata directions, but they would be, as I say, minor.
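For the parsing side, here is a minimal Python sketch of reading a cite:citation that follows the schema above. The sample instance, its keys, and the "default" style value are invented for illustration, and the helper names (q, parse_citation) are hypothetical. Note that the schema puts the attributes in the cite namespace too, so they have to be looked up with the namespace-qualified name.

```python
import xml.etree.ElementTree as ET

CITE_NS = "http://purl.org/NET/xbiblio/cite/1.0"

# Hypothetical instance document: the keys and the style value are made up.
SAMPLE = """\
<cite:citation xmlns:cite="http://purl.org/NET/xbiblio/cite/1.0">
  <cite:citation-source>
    <cite:biblioref cite:key="doe1999" cite:style="default">
      <cite:detail cite:units="pages" cite:begin="3" cite:end="5"/>
    </cite:biblioref>
    <cite:biblioref cite:key="smith2000" cite:style="default"/>
  </cite:citation-source>
</cite:citation>
"""


def q(name):
    """Return the namespace-qualified name ElementTree uses for cite:name."""
    return "{%s}%s" % (CITE_NS, name)


def parse_citation(xml_text):
    """Return one dict per cite:biblioref, in document order."""
    root = ET.fromstring(xml_text)
    refs = []
    for bib in root.iter(q("biblioref")):
        details = [
            {
                "units": d.get(q("units")),
                "begin": d.get(q("begin")),
                "end": d.get(q("end")),  # None when cite:end is absent
            }
            for d in bib.findall(q("detail"))
        ]
        refs.append({
            "key": bib.get(q("key")),
            "style": bib.get(q("style")),
            "details": details,
        })
    return refs


parsed = parse_citation(SAMPLE)
```

This only reads cite:citation-source; the cite:citation-body content is generated output, so a reader could ignore it.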
> - will there only be one style:Biblio element set ?
> - what is the format of biblio-data.xml ?
> - is the "Bibliographic Table" generated ? And if so there does not seem to be
> any reference in the table to cite elements. Does this then mean that it does
> not need to be saved in the application memory, but only the fact that there
> is a biblio table with the various styles definitions needs to be saved ? (I
> presume so, and I think that nothing then needs to be implemented for this as
> all of this already exists).
Let me take the above three together, in reverse order.
Let's agree on terminology. I'm not sure about everyone else, but I
have always found the term "bibliography table" very confusing. I'd
rather just call it a "reference list".
So we have citations (a short description that points to a fuller
"reference"), references, and reference lists.
Yes, the reference list -- as well as the content of the
cite:citation-body elements -- will be generated. Indeed, that's the
whole point of what we're doing; to make the formatting more flexible
and dynamic. A user adds a citation, and it gets formatted
automatically according to their chosen style.
Now, let me explain the process by which the formatting process ought
to work from a broad perspective.
1. collect citations
2. generate reference list
This step is critical. In CiteProc (my XSLT code), it creates an
in-memory *enhanced* version of the reference metadata: the list is
properly sorted and grouped, and that ordering is used to generate some
parameters for use later.
In some Ruby code I've been playing with, the process is the same.
So here is a simple example of a reference object before enhancement:
#<Reference:0x22dc4
  @type="book",
  @year=2000,
  @creator=[#<Person: John Doe>],
  @bibparams={},
  @title="Another Title">
... and this after:
#<Reference:0x22e50
  @type="book",
  @year=2000,
  @creator=[#<Person: John Doe>],
  @bibparams={:suffix=>"b", :first_by_creator=>false},
  @title="Some Title">
Note the "bibparams" attribute. That holds certain parameters that can
only be generated by considering each item in relation to all others.
In the author-year class of citations, that step groups and sorts by
author-year. So the "first_by_creator" parameter simply says whether,
when there is more than one item from a given author, this one is the
first (after sorting and grouping). It matters for formatting. The
"suffix" parameter allows you to disambiguate duplicate author-year
items, so you have "Doe, 1999a" and "Doe, 1999b" instead of two items
that both read "Doe, 1999".
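The enhancement step for the author-year class can be sketched in Python roughly as follows. This is an illustration under assumptions, not the CiteProc or Ruby code: the Reference class and the enhance function are hypothetical, ties within an author-year group are broken by title, and first_by_creator is set to true for any first item by a creator, even when that creator has only one item.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Reference:
    key: str
    creator: str
    year: int
    title: str
    bibparams: dict = field(default_factory=dict)


def enhance(references):
    """Sort by creator, year, title and fill in bibparams on each reference."""
    ordered = sorted(references, key=lambda r: (r.creator, r.year, r.title))
    seen_creators = set()
    groups = defaultdict(list)
    for ref in ordered:
        ref.bibparams = {
            "suffix": "",
            "first_by_creator": ref.creator not in seen_creators,
        }
        seen_creators.add(ref.creator)
        groups[(ref.creator, ref.year)].append(ref)
    # Only author-year collisions get a suffix: a, b, c, ...
    # (this simple scheme runs out of letters after 26 items).
    for same in groups.values():
        if len(same) > 1:
            for i, ref in enumerate(same):
                ref.bibparams["suffix"] = chr(ord("a") + i)
    return ordered


enhanced = enhance([
    Reference("doe-some", "Doe", 2000, "Some Title"),
    Reference("doe-another", "Doe", 2000, "Another Title"),
    Reference("smith-book", "Smith", 2000, "A Book"),
])
```

With this input, "Some Title" sorts after "Another Title" and ends up with suffix "b" and first_by_creator false, as in the enhanced object shown above.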
Now, in other classes this processing logic would work a little
differently. Consider, for example, a number style, where citations
look like [1]. In that style, you have two sorting options. One is
author-year, where you sort as above, number the references, and then
use that number for the citation.
Another is to sort in the order of citation. There you actually need
another list of citations, I guess you could say (at least that's how
it works in my XSLT); you assign the number there and then look it up
in the reference list rendering step (see below).
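A minimal Python sketch of the citation-order option, assuming each citation has already been reduced to a list of reference keys. The function names are hypothetical, and sorting the numbers inside one citation is a style choice made here only for illustration.

```python
def number_by_citation_order(citations):
    """Map each reference key to a number, in order of first appearance.

    `citations` is a list of citations in document order; each citation
    is a list of reference keys.
    """
    numbers = {}
    for citation in citations:
        for key in citation:
            if key not in numbers:
                numbers[key] = len(numbers) + 1
    return numbers


def render_numeric(citation, numbers):
    """Render one citation such as [1, 2] from the lookup table."""
    ordered = sorted(numbers[key] for key in citation)
    return "[" + ", ".join(str(n) for n in ordered) + "]"


citations = [["smith2000"], ["doe1999", "smith2000"], ["doe1999"]]
numbers = number_by_citation_order(citations)
second = render_numeric(citations[1], numbers)
```

The same `numbers` table can then be used to order and label the reference list.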
In the author-year and note classes, BTW, this step also needs to keep
track of the relative position of citations, so that you'd have
parameters like "first_citation" (to distinguish first/subsequent; the
latter often shortened) and "ibid."
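Tracking relative position could look something like the Python sketch below. The "ibid" rule used here is a deliberate simplification and an assumption: a citation counts as ibid only when it and the immediately preceding citation each consist of the same single source. Real styles have more detailed rules, and the function name is hypothetical.

```python
def annotate_positions(citations):
    """Add first_citation and ibid flags to every reference in every citation.

    `citations` is a list of citations in document order; each citation
    is a list of reference keys.
    """
    seen = set()
    previous = None  # keys of the immediately preceding citation
    annotated = []
    for keys in citations:
        entries = []
        for key in keys:
            entries.append({
                "key": key,
                "first_citation": key not in seen,
                # Simplified rule: single-source citation repeating the
                # single source of the previous citation.
                "ibid": len(keys) == 1 and previous == [key],
            })
            seen.add(key)
        annotated.append(entries)
        previous = list(keys)
    return annotated


positions = annotate_positions([
    ["doe1999"],
    ["doe1999"],
    ["smith2000", "doe1999"],
])
```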
3. format citations and reference list
You need step two to be able to format both. This step would just read the
citation style, and then use that to format the objects (or XML nodes;
whatever).
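As a rough Python illustration of this step, the sketch below formats an author-year citation and a reference list from already-enhanced references. The STYLE dictionary is a hypothetical stand-in for a real citation style (it is not CSL), and the reference data and function names are invented for the example.

```python
# Hypothetical stand-in for a citation style definition.
STYLE = {
    "open": "(",
    "close": ")",
    "between_refs": "; ",
    "between_name_year": ", ",
}

# References as they might look after the enhancement step.
REFERENCES = {
    "doe-another": {"creator": "Doe", "year": 2000, "title": "Another Title",
                    "bibparams": {"suffix": "a"}},
    "smith-book": {"creator": "Smith", "year": 2000, "title": "A Book",
                   "bibparams": {"suffix": ""}},
}


def year_label(ref):
    """Year plus the disambiguating suffix, e.g. 2000a."""
    return "%d%s" % (ref["year"], ref["bibparams"].get("suffix", ""))


def format_citation(keys, references, style):
    """Format one citation from the keys of its references."""
    parts = [
        references[key]["creator"] + style["between_name_year"]
        + year_label(references[key])
        for key in keys
    ]
    return style["open"] + style["between_refs"].join(parts) + style["close"]


def format_reference_list(references):
    """Format every reference, sorted by creator and year label."""
    ordered = sorted(references.values(),
                     key=lambda r: (r["creator"], year_label(r)))
    return ["%s. %s. %s." % (r["creator"], year_label(r), r["title"])
            for r in ordered]


citation_text = format_citation(["doe-another", "smith-book"], REFERENCES, STYLE)
reference_lines = format_reference_list(REFERENCES)
```

Both functions read the same enhanced data, which is why the enhancement step has to run first.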
One question I have, though, is how to keep track of the position of
any given citation.
Moving on ...
What will the new bibliographic metadata look like?
I can provide examples of what I am advocating, but it's currently an
open question. At the TC we are discussing enhancing metadata support.
I am pressing for ODF to have the same enhanced metadata framework for
both document-level metadata and content within documents, such as
bibliographic data. This is a complicated issue, so it's going to take
some time to resolve.
Styling? To be honest, this is the least clear for me at the moment. I
wrote a citation style language (CSL) that does what we need, and
perhaps we can prototype by using that independent of ODF's styling
system. I am contemplating writing a proposal to integrate its logic
into ODF, but it's kind of hairy (though doable) figuring out how
it all should fit together.
All of this is to get back to the bigger picture:
I'm not sure what your perspective is, CPH, but I wonder if we could
say we're talking about two -- interacting -- pieces of code here: the
citation reader/writer, and the formatting processor.
The first one definitely has to be written in C++ ... now.
For prototyping purposes, though, the second does not. If it's too much
of a PITA to worry about now, perhaps Python would be a better bet
(given the Python expertise on the list!)? We can't do that without
others contributing though. I don't have the Python skills or the time
to do that.
Bruce