Lech Rzedzicki wrote:
We're trying to keep our markup close to DB5 but we also want to
tighten the schema a bit further.
One area we're particularly struggling with is phrase books and
dictionaries. This was originally modelled using TEI and reflects the
actual structure quite well.
The problem we have is that both in the original language portion
(form) and in the target language explanation (sense) we need to
allow many optional elements such as example and pronunciation, often
multiple times (as there can be many forms or senses, or many examples
for each sense or form). Gradually this led us to a very complex and
loose model, which also doesn't maintain the relationship between the
original and translation too well.
I was wondering if any of you have any experience dealing with similar
content and whether you could share your experience and schemas?
We are working a lot with XML-based bilingual dictionaries (not phrase
books, although they may be similar). I think the bottom line is, don't
use DocBook for dictionaries (at least not for the body of the
dictionary, i.e. all the entries). It just isn't the same kind of
structure.
TEI-encoded dictionaries tend to reflect the structure of the print
dictionary from which the electronic form was derived. That has a
couple of advantages:
1) It's easy(-er) to convert from the print form to the electronic form,
and go back later and make sure you did it right
2) It makes producing a new print copy of the dictionary that looks like
the original print dictionary easy(-er).
It also has some disadvantages:
1) Unless you're working with a bunch of similar dictionaries from a
single publisher, you're likely to wind up with a large number of
schemas (or DTDs), one for each dictionary, and that can be hard to manage.
2) The large number of schemas in (1) also means that you probably have
to write a different CSS (or whatever you use) for each one.
3) You're limited to a single presentation form, i.e. it is difficult to
display a root-based dictionary as a stem-based dictionary.
What we (and probably most people who work with multiple electronic
dictionaries) do instead is use a generic lexicon schema. This
flattens the overall structure of a typical print dictionary (e.g.
subentries become entries on their own); the original structure is
instead represented by xrefs (so a sub-entry and a minor entry both have
pointers back to the main entry). One can then postpone until run-time
decisions like root-based vs. stem-based presentation, or whether a
given minor entry is displayed as a sub-entry or as an entry on its own
(and perhaps alphabetized on its own, if that's relevant to the
electronic display). The run-time decisions are then implemented using
one of two (or several) style sheets.
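
To make the idea a bit more concrete, here is a rough sketch in Python
rather than in any real lexicon schema. Everything in it is made up for
illustration: the Entry class, its field names, the sample words, and the
two render functions are assumptions, not part of our actual schema or
style sheets. It only shows the principle: entries are stored flat, a
sub-entry carries a pointer back to its main entry, and the choice between
a nested and a flat presentation is made when rendering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One flat lexicon entry (hypothetical field names)."""
    id: str
    headword: str
    gloss: str
    main: Optional[str] = None  # xref: id of the main entry, if any


# Sample data, invented for this sketch. Sub-entries are stored as
# ordinary entries that point back to their main entry.
LEXICON = [
    Entry("e1", "write", "to mark letters on a surface"),
    Entry("e2", "writer", "a person who writes", main="e1"),
    Entry("e3", "rewrite", "to write again", main="e1"),
    Entry("e4", "book", "a bound set of pages"),
]


def render_flat(entries):
    """Flat view: every entry stands alone, alphabetized by headword."""
    ordered = sorted(entries, key=lambda e: e.headword)
    return [f"{e.headword}: {e.gloss}" for e in ordered]


def render_nested(entries):
    """Nested view: sub-entries are indented under their main entry."""
    children = {}
    for e in entries:
        if e.main is not None:
            children.setdefault(e.main, []).append(e)
    lines = []
    for e in sorted(entries, key=lambda e: e.headword):
        if e.main is not None:
            continue  # shown under its main entry instead
        lines.append(f"{e.headword}: {e.gloss}")
        for sub in sorted(children.get(e.id, []), key=lambda s: s.headword):
            lines.append(f"    {sub.headword}: {sub.gloss}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_flat(LEXICON)))
    print()
    print("\n".join(render_nested(LEXICON)))
```

The data is the same in both cases; only the rendering function changes,
which is the role the style sheets play in the setup described above.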
Going into more detail about this approach (as opposed to doing something
with dictionaries inside DocBook) probably doesn't belong on this list.
Fortunately there are lexicography mailing lists, e.g. the Lexicography
list (see http://linguistlist.org/lists/get-lists.cfm).
--
Mike Maxwell
What good is a universe without somebody around to look at it?
--Robert Dicke, Princeton physicist