Re: [docbook-apps] Capturing phrase books and dictionaries

Mike Maxwell Thu, 18 Mar 2010 09:11:08 -0700

Lech Rzedzicki wrote:

We're trying to keep our markup close to DB5 but we also want to
tighten the schema a bit further.
One area we're particularly struggling with is phrase books and
dictionaries. This was originally modelled using TEI and reflects the
actual structure quite well.
The problem we have is that both in the original language portion
(form) and in the target language explanation (sense) we need to
allow many optional elements such as example, pronunciation, often
multiple times (as there can be many forms or senses or many examples
for each sense or form). Gradually this led us to a very complex and
loose model which also doesn't maintain the relationship between the
original and translation too well.


I was wondering if any of you have any experience dealing with similar
content and whether you could share your experience and schemas?

We are working a lot with XML-based bilingual dictionaries (not phrasebooks, although they may be similar). I think the bottom line is, don't use DocBook for dictionaries (at least not for the body of the dictionary, i.e. all the entries). It just isn't the same kind of structure.

TEI-encoded dictionaries tend to reflect the structure of the print dictionary from which the electronic form was derived. That has a couple of advantages:

1) It's easy(-er) to convert from the print form to the electronic form, and go back later and make sure you did it right.
2) It makes producing a new print copy of the dictionary that looks like the original print dictionary easy(-er).


It also has some disadvantages:

1) Unless you're working with a bunch of similar dictionaries from a single publisher, you're likely to wind up with a large number of schemas (or DTDs), one for each dictionary, and that can be hard to manage.
2) The large number of schemas in (1) also means that you probably have to write a different CSS (or whatever you use) for each one.
3) You're limited to a single presentation form, i.e. it is difficult to display a root-based dictionary as a stem-based dictionary.

What we (and probably most people who work with multiple electronic dictionaries) do instead is to use a generic lexicon schema. This flattens the overall structure of a typical print dictionary (e.g. subentries become entries on their own); the original structure is instead represented by xrefs (so a sub-entry and a minor entry both have pointers back to the main entry). One can then postpone until run-time decisions like root-based vs. stem-based presentation, or whether a given minor entry is displayed as a sub-entry or as an entry on its own (and perhaps alphabetized on its own, if that's relevant to the electronic display). The run-time decisions are then implemented using one of two (or several) style sheets.
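
To make the flattening idea concrete, here is a minimal sketch in Python. It is not our actual schema: in practice the data is XML and the views are style sheets, and the names used here (Entry, main_ref, render_flat, render_nested) and the sample entries are invented for illustration. It only shows that once sub-entries are stored as ordinary entries with a pointer back to the main entry, the same data can be presented either way at run time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One flat lexicon entry. All field names are illustrative assumptions."""
    id: str
    headword: str
    gloss: str
    # xref back to the main entry; None means this entry is itself a main entry
    main_ref: Optional[str] = None


def render_flat(entries):
    """Stem-style view: every entry stands alone, alphabetized by headword."""
    by_id = {e.id: e for e in entries}
    lines = []
    for e in sorted(entries, key=lambda e: e.headword):
        line = f"{e.headword}: {e.gloss}"
        if e.main_ref is not None:
            # Keep the relationship visible as a cross-reference
            line += f" (see {by_id[e.main_ref].headword})"
        lines.append(line)
    return lines


def render_nested(entries):
    """Root-style view: entries with an xref are shown under their main entry."""
    children = {}
    for e in entries:
        if e.main_ref is not None:
            children.setdefault(e.main_ref, []).append(e)
    lines = []
    mains = [e for e in entries if e.main_ref is None]
    for m in sorted(mains, key=lambda e: e.headword):
        lines.append(f"{m.headword}: {m.gloss}")
        for c in sorted(children.get(m.id, []), key=lambda e: e.headword):
            lines.append(f"    {c.headword}: {c.gloss}")
    return lines


# Hypothetical sample data, stored flat
LEXICON = [
    Entry("e1", "write", "to mark letters on a surface"),
    Entry("e2", "writer", "one who writes", main_ref="e1"),
    Entry("e3", "book", "a bound set of pages"),
]

if __name__ == "__main__":
    print("\n".join(render_flat(LEXICON)))
    print()
    print("\n".join(render_nested(LEXICON)))
```

The two render functions play the role of the two style sheets: the stored entries never change, only the view chosen at display time.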

More than that about this approach (as opposed to doing something with dictionaries inside DocBook) probably doesn't belong on this list. Fortunately there are lexicography mailing lists, e.g. the Lexicography list (see http://linguistlist.org/lists/get-lists.cfm).

--
   Mike Maxwell
   What good is a universe without somebody around to look at it?
   --Robert Dicke, Princeton physicist
