PDUTR #25: Unicode Support for Mathematics

Markus Kuhn Thu, 03 Jan 2002 09:10:18 -0800

Dear Unicode Maths team,

I've read with enthusiasm your draft document


  http://www.unicode.org/unicode/reports/tr25/

and have great hopes that this project for "Unicode Plain Text Encoding
of Mathematics" will progress well and be widely implemented once it is
finished!

I thought (from comp.text.sgml discussions in the early 1990s) that it
was widely accepted that SGML is in practice far too inconvenient for
entering mathematical text and that any math DTD will not lead naturally
to intuitive and consistent keyboard entry techniques, which is why I
always considered MathML more an academic exercise than anything that I
would ever really want to use to get work done. MathML has never been
anywhere near being a potential competitor for TeX.

I therefore observe with great interest that Unicode plans to treat
mathematics as just another complex script (like Indic, etc.), in such
a way that authors of SGML/XML document type definitions and style
sheets will finally not have to make much further provision for
mathematics than, for example, defining a single element for marking a
displayed equation. The prospect of being able to search for
mathematical formula fragments with web search engines is also exciting.

A few comments on the current draft:

  - It is not yet clear how white space is to be handled. In TeX,
    math mode has a lot of heuristics for adding white space where
    mathematical typographic tradition calls for it, for example
    around every operator. It has often been observed that scientific
    papers written in Word have far inferior mathematical spacing
    compared with papers written in TeX, because TeX's heuristic
    algorithms do far better than an inexperienced author. However,
    these heuristics fail frequently, and more often than desirable TeX
    users have to manually override the math spacing with \, and the
    like.

    Your current text does not yet make it clear whether the additional
    white space used around mathematical operators will be added by the
    rendering engine and font (as in TeX) or will be encoded in the plain
    text. I suspect encoding the white space in the plain text is
    ultimately preferable, as it will ensure more control in a portable
    way, even though that means that typographic beginners will be more
    likely to produce ugly formulas. Heuristics like TeX's would have to
    become part of the keyboard entry and style checking mechanisms of
    the editor (like the Word spell checker), not of the rendering
    engine. This should hopefully make results more predictable across a
    wide range of rendering engines.
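
    As a rough illustration of what such an editor-side heuristic could
    look like, here is a minimal sketch in Python. The function name, the
    operator set, and the choice of U+205F MEDIUM MATHEMATICAL SPACE are
    assumptions of mine for the sake of the example, not anything the
    draft specifies. The point is only that the spacing ends up as
    ordinary characters in the plain text:

```python
# Hypothetical editor-side helper. The operator set and the use of
# U+205F are illustrative assumptions, not part of the draft report.
MATH_SPACE = "\u205F"  # MEDIUM MATHEMATICAL SPACE
BINARY_OPERATORS = set("=+<>\u2212\u00D7\u2264\u2265\u2260")  # = + < > − × ≤ ≥ ≠
OPENERS = set("([{")


def space_operators(formula: str) -> str:
    """Return the formula with explicit spaces around binary operators."""
    out = []
    for ch in formula:
        if ch.isspace():
            # Simplification: discard existing spacing and rebuild it.
            continue
        prev = out[-1] if out else None
        # An operator at the start, after an opening bracket, or directly
        # after another operator is treated as unary and left unspaced.
        is_binary = (
            ch in BINARY_OPERATORS
            and prev is not None
            and prev not in OPENERS
            and prev != MATH_SPACE
        )
        if is_binary:
            out.extend([MATH_SPACE, ch, MATH_SPACE])
        else:
            out.append(ch)
    return "".join(out)
```

    Since the result is plain text, an author who dislikes what the
    heuristic produced can simply edit the spaces afterwards, and every
    rendering engine will show the same thing.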

  - On section 5.1 "Recognizing Mathematical Expressions":
    With intra-formula white space being encoded in the plain text, and
    variables typically being written in the Plane 1 math characters,
    there should never be a need to explicitly delimit mathematical
    formulas from "normal text", as for the rendering engine they would
    just be normal text. In other words, it would be desirable if your
    proposal did not make section 5.1 necessary.
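
    To illustrate why I think delimiters should not be needed: a
    variable entered as a Plane 1 math character is already
    distinguishable from ordinary text by its code point alone. A
    minimal sketch in Python, assuming the Mathematical Alphanumeric
    Symbols block U+1D400..U+1D7FF (the function names are hypothetical):

```python
# Sketch only: function names are made up for illustration.
# Range of the Mathematical Alphanumeric Symbols block in Plane 1.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF


def is_math_alphanumeric(ch: str) -> bool:
    """True if ch lies in the Mathematical Alphanumeric Symbols block."""
    return MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST


def math_variables(text: str) -> list:
    """Collect the Plane 1 math letters and digits found in running text."""
    # Caveat: a few math letters live outside this block, for example
    # italic h at U+210E, so a real implementation would need more ranges.
    return [ch for ch in text if is_math_alphanumeric(ch)]
```

    For example, math_variables("where \U0001D465 > 0 and x is text")
    picks out only the MATHEMATICAL ITALIC SMALL X and ignores the
    ordinary Latin "x" in the prose.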

  - What is missing at the moment is a mechanism for handling matrices,
    commutative diagrams and similar tabular arrangements of inline
    formulas. Most markup languages and rendering engines already have
    very sophisticated mechanisms for the layout of tables. I think the
    best approach would be to simply use, or slightly extend, the
    already available table mechanism to encode matrices. All that
    Unicode has to add is a combining modifier corresponding to TeX's
    \left and \right commands that instructs a delimiter glyph to grow
    with the height of the text in between, which could include an
    inline table with centered alignment. Don't duplicate what the
    existing table engines already provide. In that light, I would
    reconsider the need for the briefly mentioned align-over operator.

    Using the table mechanism of the higher markup language has numerous
    advantages:

      - the DTD keeps control over where matrices are allowed (e.g., only in
        displayed equations, but not inline and not in headings or
        keyword lists)

      - layout and cut&paste selection in tables are very complex
        processes; you really don't want to have to implement them twice

    It is true that plain-text Unicode matrices would simplify the
    cut&paste of matrices as well, but that is probably not worth the cost
    of blurring the currently quite clear interface between a paragraph
    rendering engine and a page/table layout engine. Dramatically
    simplified versions of MathML on top of plain-text Unicode math can
    still be used to encode matrices in a portable and reusable way.
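
    For reference, this is the TeX behaviour I have in mind, shown here
    in LaTeX notation: the delimiters given to \left and \right grow to
    the height of whatever sits between them, and the matrix itself is
    nothing more than an ordinary centered table (the symbol names are
    just placeholders):

```latex
\[
  A = \left( \begin{array}{cc}
        a_{11} & a_{12} \\
        a_{21} & a_{22}
      \end{array} \right)
\]
```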

  - A stylistic comment: I think it would suit the text better not
    to spend so much time criticizing TeX and MathML.
    Knowledgeable readers will be well familiar with TeX and will
    discover for themselves the advantages of your approach over
    existing practice, and the inadequacies of MathML are obvious to
    anyone who has had even a brief look at the entire idea of encoding
    formulas in XML.

The proposal is certainly still in an early stage, but it is heading in
the right direction and I will follow its progress with great interest!

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
