Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E. Andersen
On 11 Jan 2007, at 1:49PM, Håkon Wium Lie wrote:

 Prince doesn't support exception dictionaries. Is it not
 possible to encode exceptions in the hyphenation dictionary?

Yes, that should be possible, actually. The encoding of certain
words in a default exception dictionary seems to be a design
choice in TeX rather than a requirement. (By the way, the term
`dictionary' used to designate a set of hyphenation patterns that
are not, in general, words, is quite confusing.)

 DSSSL has a 'hyphenation-exceptions' property which takes a
 list of strings. I'm unsure if it has been implemented, though.

Interesting. This would be useful for authors who wanted to
indicate a few exceptions without specifying a complete set of
hyphenation patterns. (TeX includes 4,447 patterns, and two or
more sets cannot easily be merged.)

 [In TeX], hyphenation can [also] be indicated locally.
 This is needed in order to hyphenate words like
 rec-ord/re-cord and is the only level that deals with
 spelling changes.

 This can be done by supplying your own dictionary through the
 'hyphenate-dictionary' property.

You seem to have misinterpreted the intended meaning of
`locally'. The two problems are as follows:

1) Given the following sentence: `Don't wait for record companies,
record records yourselves.' In order to hyphenate
this correctly, explicit hyphenation points (\- in TeX) must
be inserted locally, i.e., as part of the words, as follows:
`Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

2) TeX's hyphenation patterns cannot encode spelling changes;
neither can its exception dictionary.
Therefore, spelling changes like backen -> bak-ken must be
indicated explicitly each time the word occurs.
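
A minimal sketch of the second problem, written in Python purely for
illustration: a break point that changes the spelling has to carry three
pieces of text, in the manner of TeX's
\discretionary{pre-break}{post-break}{no-break}. The class and function
names below are hypothetical and are not part of TeX or Prince.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discretionary:
    pre_break: str   # ends the first line if the break is taken
    post_break: str  # starts the next line if the break is taken
    no_break: str    # used when the word stays on one line


def render(parts, break_at=None):
    """Join the parts, taking the break at discretionary number break_at."""
    out = []
    seen = 0
    for part in parts:
        if isinstance(part, Discretionary):
            if seen == break_at:
                out.append(part.pre_break + "\n" + part.post_break)
            else:
                out.append(part.no_break)
            seen += 1
        else:
            out.append(part)
    return "".join(out)


# backen -> bak-ken: "ck" becomes "k-" / "k" only when the line breaks here.
backen = ["ba", Discretionary("k-", "k", "ck"), "en"]
unbroken = render(backen)
broken = render(backen, break_at=0)
```

Here `unbroken` keeps the ordinary spelling, whereas `broken` ends one
line with "bak-" and starts the next with "ken". A plain pattern or
exception entry can only say where a break may occur, not that the
letters around it change.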

 There are a few additional caveats. For instance, it is not entirely  
 obvious what should be considered to be a `word' or which characters  
 should be allowed in a `word' 
 [... lots of less important points ...]
 How does Prince deal with these issues?

 Prince6 doesn't try to go beyond TeX.

Fair enough. I realise that my question ended up rather too far away from the 
most important issue. I suppose Prince relies on Unicode character classes to 
identify letters (which is better than Plain TeX's default [unaccented English 
letters only], but less flexible) and uses a special rule to treat hyphens. Is 
this a correct assumption? Can I find more information on such details 
somewhere?

-- 
Øistein E. Andersen


Re: [whatwg] contenteditable, em and strong

2007-01-11 Thread Henri Sivonen

On Jan 11, 2007, at 10:42, fantasai wrote:


Are you arguing that <i> should mean emphasis instead of italics?
If so, I disagree...


Almost, except s/emphasis/different from normal paragraph content/ to  
dodge the discussion on what constitutes emphasis.


I am arguing that

The introduction of <em> and <strong> (circa 1993) has failed to
achieve a semantic improvement over <i> and <b>, because prominent
tools such as Dreamweaver, Tidy, IE and Opera as well as simplified
well-intentioned advocacy treat <em> and <strong> merely as more
fashionable alternatives to <i> and <b>. (I mean failure in terms of
what meaning a markup consumer can extract from the real Web without  
a private agreement with the producer of a given Web page. I don't  
mean the ability of authors to write style sheets for their own markup.)


Therefore, in retrospect, it might have been more useful to  
generalize <i> and <b> back in 1993 instead of trying to launch
alternatives. <i> could have been generalized as follows: <i>
denotes content that is different from normal paragraph content. For  
scripts that customarily use italics for this purpose, the default  
presentation on the visual media is italics when the ability to  
render text in italics is available. User agents may use different  
default presentations for making the content different from normal  
paragraph content for scripts that don't customarily use italics, on  
non-visual media or when italics are not available for display. For  
example, for Chinese and Japanese accent-like glyphs above or below  
the content could be used, for aural media a different tone of voice  
could be used and for tty display inverted colors could be used.


But that wasn't done back in 1993 and now we are stuck with two
pairs of elements. I suggest defining the pairs as synonymous (giving  
in to practice made prevalent by tools biased towards bicameral  
scripts) and then generalizing them as outlined above. Nowadays with  
CSS, refining the default presentation is relatively easy when the  
default isn't exactly right. For private styling conventions, hand- 
coding authors would have double the style hooks without having to  
use class. (Specifically, I am not suggesting deprecating or  
obsoleting any of <i>, <b>, <em> and <strong>.)


Insisting on the difference of <i> and <em> is not without harm,
because arguing about which one to use is not without opportunity  
cost. Also, I think the expected payoff (that mpt gave) from careful  
differentiation between the elements is not worth the trouble even if  
it was achievable through an education campaign.



P.S. To see how far we have come since 1993, check out this example  
in the IIIR draft:


This text contains an <em>emphasized</em> word.
<strong>Don't assume</strong> that it will be italic!
It was made using the <CODE>EM</CODE> element. A citation is
typically italic and has no formal necessary structure:
<cite>Moby Dick</cite> is a book title.

http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

--
Henri Sivonen
http://hsivonen.iki.fi/




Re: [whatwg] Hyphenation

2007-01-11 Thread Håkon Wium Lie
Also sprach Øistein E. Andersen:

  (By the way, the term
  `dictionary' used to designate a set of hyphenation patterns that
  are not, in general, words, is quite confusing.)

The term hyphenation dictionary is quite common, but I see your
point. What would be a better name for the property?

  hyphenation-pattern
  hyphenation-list
  hyphenation-resource

or, perhaps:

  hyphenation&shy;pattern

:-)

   [In TeX], hyphenation can [also] be indicated locally.
   This is needed in order to hyphenate words like
   rec-ord/re-cord and is the only level that deals with
   spelling changes.
  
   This can be done by supplying your own dictionary through the
   'hyphenate-dictionary' property.
  
  You seem to have misinterpreted the intended meaning of
  `locally'. The two problems are as follows:
  
  1) Given the following sentence: `Don't wait for record companies,
  record records yourselves.' In order to hyphenate
  this correctly, explicit hyphenation points (\- in TeX) must
  be inserted locally, i.e., as part of the words, as follows:
  `Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

&shy; is probably the best way to encode this. However, it can be done
through CSS as well:

  Don't wait for <span style="hyphenation-dictionary: rec-ord.dic">record</span>
  companies, <span style="hyphenation-dictionary: re-cord.dic">record</span>
  yourself.
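
As a rough sketch of the soft-hyphen route, the following Python turns
TeX-style \- markers into U+00AD SOFT HYPHEN, the character that the HTML
entity &shy; stands for. The function name is made up for illustration;
nothing here is Opera or Prince code.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, the character behind the HTML entity &shy;


def tex_to_soft_hyphens(text: str) -> str:
    """Replace each TeX discretionary hyphen (\\-) with a soft hyphen."""
    return text.replace("\\-", SOFT_HYPHEN)


sentence = r"Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves."
marked = tex_to_soft_hyphens(sentence)
```

The soft hyphens stay invisible unless a line is actually broken at one
of them, so the noun and the verb can each carry their own break point.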

-h&kon
  Håkon Wium Lie  CTO °þe®ª
http://people.opera.com/howcome



Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E. Andersen
On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:

 The term hyphenation dictionary is quite common, but I see your
 point. What would be a better name for the property?

  hyphenation-pattern
  hyphenation-list
  hyphenation-resource

Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept
was first introduced, used the term `hyphenation patterns'. Unsurprisingly,
Liang's supervisor, Knuth, used the same term in the TeXbook, and this
expression seems to have become the generally accepted one amongst TeX users.

`Hyphenation dictionary' is also common, but this tends to mean something
slightly different. To exemplify, the first five lines of what I would call a
hyphenation dictionary look like this:
 a cap·pel·la
 a for·ti·o·ri
 a go·go
 a pos·te·ri·o·ri
 a pri·o·ri

[Interestingly, this particular dictionary contains multi-word expressions, but
most hyphenation engines, as well as spelling checkers, cannot take advantage of
these, as each word (according to some definition) is typically treated in 
isolation.]

In contrast, the first five hyphenation patterns in TeX82 are the following:
 .ach4
 .ad4der
 .af1t
 .al3t
 .am5at

I think it is useful to keep the distinction and would suggest renaming the
property in question to `hyphenation-patterns'. (TeX's exception dictionary
falls within this narrower definition of a hyphenation dictionary.)
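
To make the distinction concrete, here is a minimal sketch in Python of
how the two kinds of data are used. It follows the general scheme of
Liang's patterns (digits between letters, the highest digit wins, odd
means a break is allowed, even means it is forbidden) and uses only the
five patterns quoted above, so it is nowhere near a usable hyphenator.
The exception entry is made up for illustration, and the limits of two
letters before and three after a break are an assumption modelled on
TeX's usual settings for English.

```python
def parse_pattern(pattern):
    """Split a pattern such as '.af1t' into ('.aft', [0, 0, 0, 1, 0])."""
    letters, weights = [], [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)   # weight of the gap before the next letter
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


# Patterns: fragments with weights, not words.
PATTERNS = dict(parse_pattern(p) for p in
                [".ach4", ".ad4der", ".af1t", ".al3t", ".am5at"])

# Dictionary: whole words with their break points (illustrative entry).
EXCEPTIONS = {"table": ["ta", "ble"]}


def hyphenate(word, left_min=2, right_min=3):
    lower = word.lower()
    if lower in EXCEPTIONS:                 # whole-word lookup comes first
        return list(EXCEPTIONS[lower])
    padded = "." + lower + "."              # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)        # scores[i]: gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = PATTERNS.get(padded[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    for pos in range(left_min, len(word) - right_min + 1):
        if scores[pos + 1] % 2 == 1:        # odd weight: break allowed here
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return pieces
```

With this toy data, hyphenate("after") gives ['af', 'ter'] through the
pattern .af1t, hyphenate("adder") gives ['adder'] because the even weight
in .ad4der forbids a break between the two d's, and hyphenate("table")
never reaches the patterns at all.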

http://computing-dictionary.thefreedictionary.com/hyphenation says:
 HYPHENATION: Breaking words that extend beyond the right margin.
 Software hyphenates words by matching them against a hyphenation
 dictionary or by using a built-in set of rules, or both.

http://www.answers.com/topic/hyphenation-dictionary is more specific:
 HYPHENATION DICTIONARY: A word file with predefined hyphen locations.

http://www.computeruser.com/resources/dictionary/definition.html?lookup=2188
gives a more generic definition:
 A file, usually in a word processing or desktop publishing program,
 which defines where hyphens will be placed for common words.

Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and
148,100 for /hyphenation patterns?/, so the latter should also be fairly common.

To me, a `hyphenation list' suggests something rather like a hyphenation
dictionary, whereas `hyphenation resource' probably should be reserved
for a more comprehensive source of hyphenation information — unless
the same property is supposed to be able to refer to different kinds
of hyphenation data.


 [In TeX], hyphenation can [also] be indicated locally.
 This is needed in order to hyphenate words like
 rec-ord/re-cord and is the only level that deals with
 spelling changes.

 &shy; is probably the best way to encode this. However, it can be done
 through CSS as well:

 Don't wait for <span style="hyphenation-dictionary: rec-ord.dic">record</span>
 companies, <span style="hyphenation-dictionary: re-cord.dic">record</span>
 yourself.

Right, I did not get your point at first. This does indeed cover the first
reason to use explicit mark-up in TeX.

Concerning spelling changes, Petr Sojka's `Notes on Compound Word
Hyphenation in TeX' [1], section 3.2, describes how a minimally extended
version of the TeX algorithm can deal with irregular hyphenation without any
extraneous mark-up, i.e., without any unnecessary burden on the author.
Perhaps an idea for Prince7?

Anyway, the preliminary conclusion seems to be that a <hyph> element in HTML
is unnecessary, so this discussion should probably continue somewhere else.

[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf

-- 
Øistein E. Andersen


Re: [whatwg] contenteditable, em and strong

2007-01-11 Thread Matthew Paul Thomas

On Jan 12, 2007, at 5:23 AM, Henri Sivonen wrote:

...
The introduction of <em> and <strong> (circa 1993) has failed to
achieve a semantic improvement over <i> and <b>, because prominent
tools such as Dreamweaver, Tidy, IE and Opera as well as simplified
well-intentioned advocacy treat <em> and <strong> merely as more
fashionable alternatives to <i> and <b>. (I mean failure in terms of
what meaning a markup consumer can extract from the real Web without a 
private agreement with the producer of a given Web page. I don't mean 
the ability of authors to write style sheets for their own markup.)

...


Is the effort to get people to use CSS instead of spacer GIFs a bad 
idea?


Is the effort to get people to use <h1>..<h6> instead of <p><b> or
<p><font> a bad idea?


Is the effort to get people to use CSS instead of <table> for layout a
bad idea?


There were, I'm sure, many more occurrences of those problems than 
there were improper uses of <em> and <strong>. And the efforts to
replace them are much older than the effort to get people who don't
think about semantics to use <b> and <i>, which has hardly even started
yet.


Ten years ago, the typical Web developer probably didn't know what <em>
and <strong> were. Now, the typical Web developer probably thinks that
<b> and <i> are dirty and that XHTML is the future. This does not mean
all is lost, it just means the standards advocates oversteered. Time 
for another adjustment.



...
Insisting on the difference of <i> and <em> is not without harm,
because arguing about which one to use is not without opportunity 
cost.

...


"Not without" makes that statement look more profound than it is.

--
Matthew Paul Thomas
http://mpt.net.nz/