Re: [whatwg] Hyphenation
On 11 Jan 2007, at 1:49PM, Håkon Wium Lie wrote:

Prince doesn't support exception dictionaries. Is it not possible to encode exceptions in the hyphenation dictionary?

Yes, that should be possible, actually. The encoding of certain words in a default exception dictionary seems to be a design choice in TeX rather than a requirement. (By the way, the term `dictionary' used to designate a set of hyphenation patterns that are not, in general, words, is quite confusing.)

DSSSL has a 'hyphenation-exceptions' property which takes a list of strings. I'm unsure if it has been implemented, though.

Interesting. This would be useful for authors who wanted to indicate a few exceptions without specifying a complete set of hyphenation patterns. (TeX includes 4,447 patterns, and two or several sets cannot easily be merged.)

[In TeX], hyphenation can [also] be indicated locally. This is needed in order to hyphenate words like rec-ord/re-cord and is the only level that deals with spelling changes.

This can be done by supplying your own dictionary through the 'hyphenate-dictionary' property.

You seem to have misinterpreted the intended meaning of `locally'. The two problems are as follows:

1) Given the following sentence: `Don't wait for record companies, record records yourselves.' In order to hyphenate this correctly, explicit hyphenation points (\- in TeX) must be inserted locally, i.e., as part of the words, as follows: `Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

2) TeX's hyphenation patterns cannot encode spelling changes; neither can its exception dictionary. Therefore, spelling changes like backen → bak-ken must be indicated explicitly each time the word occurs.

There are a few additional caveats. For instance, it is not entirely obvious what should be considered to be a `word' or which characters should be allowed in a `word' [... lots of less important points ...] How does Prince deal with these issues?

Prince6 doesn't try to go beyond TeX.
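Both problems listed above come down to TeX's \discretionary{pre-break}{post-break}{no-break} primitive: plain TeX defines \- as a discretionary whose pre-break text is the font's hyphen and whose other two parts are empty, while a spelling change such as backen → bak-ken needs all three parts. The Python sketch below is a hypothetical model for illustration only; the Discretionary class and the render_* helpers are invented names, not how TeX or Prince represent breaks internally.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical model of TeX's \discretionary{pre}{post}{nobreak} primitive.
@dataclass(frozen=True)
class Discretionary:
    pre: str      # text that ends the line if the break is taken
    post: str     # text that starts the next line if the break is taken
    nobreak: str  # text used when no break happens here


Piece = Union[str, Discretionary]


def render_unbroken(pieces: List[Piece]) -> str:
    """Render a word with no line break: every discretionary uses its no-break text."""
    return "".join(p if isinstance(p, str) else p.nobreak for p in pieces)


def render_broken_at(pieces: List[Piece], index: int) -> List[str]:
    """Break the word at pieces[index]; all other discretionaries stay unbroken."""
    brk = pieces[index]
    if not isinstance(brk, Discretionary):
        raise ValueError("pieces[index] is not a discretionary break")
    before = render_unbroken(pieces[:index]) + brk.pre
    after = brk.post + render_unbroken(pieces[index + 1:])
    return [before, after]


# rec\-ord: an ordinary explicit hyphenation point, i.e. {-}{}{}.
RECORD = ["rec", Discretionary(pre="-", post="", nobreak=""), "ord"]

# backen -> bak-ken: 'ck' becomes 'k-' + 'k' only if the line breaks here.
BACKEN = ["ba", Discretionary(pre="k-", post="k", nobreak="ck"), "en"]
```

With these definitions, render_unbroken(BACKEN) gives "backen" and render_broken_at(BACKEN, 1) gives ["bak-", "ken"], which is the kind of change that a pattern or exception list keyed on letter positions alone cannot express.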
Fair enough. I realise that my question ended up rather too far away from the most important issue. I suppose Prince relies on Unicode character classes to identify letters (which is better than Plain TeX's default [unaccented English letters only], but less flexible) and uses a special rule to treat hyphens. Is this a correct assumption? Can I find more information on such details somewhere? -- Øistein E. Andersen
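The assumption above, that letters are identified by Unicode character class, can be made concrete. The following Python sketch is an assumption and not a description of Prince: it treats any character whose Unicode general category is a letter (L*) or a combining mark (M*) as part of a word, and lets everything else, including explicit hyphens and apostrophes, end the word. The function name hyphenatable_words is hypothetical.

```python
import unicodedata
from typing import List


def is_word_char(ch: str) -> bool:
    """Letters (Lu, Ll, Lt, Lm, Lo) and combining marks (Mn, Mc, Me) belong to words."""
    return unicodedata.category(ch)[0] in ("L", "M")


def hyphenatable_words(text: str) -> List[str]:
    """Split text into maximal runs of word characters.

    Any other character (space, punctuation, explicit hyphen, apostrophe)
    ends the current run, so each part of a hyphenated compound is
    handed to the hyphenation engine separately.
    """
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

Under this rule "Don't" becomes the two fragments "Don" and "t", which illustrates why deciding what counts as a `word' is not obvious; keeping combining marks inside words means a decomposed "café" (e followed by U+0301) stays whole.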
Re: [whatwg] contenteditable, em and strong
On Jan 11, 2007, at 10:42, fantasai wrote:

Are you arguing that <i> should mean emphasis instead of italics? If so, I disagree...

Almost, except s/emphasis/different from normal paragraph content/ to dodge the discussion on what constitutes emphasis. I am arguing that the introduction of <em> and <strong> (circa 1993) has failed to achieve a semantic improvement over <i> and <b>, because prominent tools such as Dreamweaver, Tidy, IE and Opera, as well as simplified well-intentioned advocacy, treat <em> and <strong> merely as more fashionable alternatives to <i> and <b>. (I mean failure in terms of what meaning a markup consumer can extract from the real Web without a private agreement with the producer of a given Web page. I don't mean the ability of authors to write style sheets for their own markup.)

Therefore, in retrospect, it might have been more useful to generalize <i> and <b> back in 1993 instead of trying to launch alternatives. <i> could have been generalized as follows:

<i> denotes content that is different from normal paragraph content. For scripts that customarily use italics for this purpose, the default presentation on visual media is italics when the ability to render text in italics is available. User agents may use different default presentations for making the content different from normal paragraph content for scripts that don't customarily use italics, on non-visual media, or when italics are not available for display. For example, for Chinese and Japanese, accent-like glyphs above or below the content could be used; for aural media, a different tone of voice could be used; and for tty display, inverted colors could be used.

But that wasn't done back in 1993, and now we are stuck with two pairs of elements. I suggest defining the pairs as synonymous (giving in to practice made prevalent by tools biased towards bicameral scripts) and then generalizing them as outlined above.
Nowadays, with CSS, refining the default presentation is relatively easy when the default isn't exactly right. For private styling conventions, hand-coding authors would have double the style hooks without having to use class. (Specifically, I am not suggesting deprecating or obsoleting any of <i>, <b>, <em> and <strong>.)

Insisting on the difference of <i> and <em> is not without harm, because arguing about which one to use is not without opportunity cost. Also, I think the expected payoff (that mpt gave) from careful differentiation between the elements is not worth the trouble even if it were achievable through an education campaign.

P.S. To see how far we have come since 1993, check out this example in the IIIR draft:

This text contains an <em>emphasized</em> word. <strong>Don't assume</strong> that it will be italic! It was made using the <CODE>EM</CODE> element. A citation is typically italic and has no formal necessary structure: <cite>Moby Dick</cite> is a book title.

http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

-- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] Hyphenation
Also sprach Øistein E. Andersen:

(By the way, the term `dictionary' used to designate a set of hyphenation patterns that are not, in general, words, is quite confusing.)

The term hyphenation dictionary is quite common, but I see your point. What would be a better name for the property?

hyphenation-pattern
hyphenation-list
hyphenation-resource

or, perhaps:

hyphenation&shy;pattern

:-)

[In TeX], hyphenation can [also] be indicated locally. This is needed in order to hyphenate words like rec-ord/re-cord and is the only level that deals with spelling changes.

This can be done by supplying your own dictionary through the 'hyphenate-dictionary' property.

You seem to have misinterpreted the intended meaning of `locally'. The two problems are as follows: 1) Given the following sentence: `Don't wait for record companies, record records yourselves.' In order to hyphenate this correctly, explicit hyphenation points (\- in TeX) must be inserted locally, i.e., as part of the words, as follows: `Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves.'

&shy; is probably the best way to encode this. However, it can be done through CSS as well:

Don't wait for <span style="hyphenate-dictionary: rec-ord.dic">record</span> companies, <span style="hyphenate-dictionary: re-cord.dic">record</span> yourself.

-h&kon
Håkon Wium Lie CTO °þe®ª [EMAIL PROTECTED] http://people.opera.com/howcome
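For reference, &shy; is the HTML entity for U+00AD SOFT HYPHEN, which marks a break opportunity and is displayed as a hyphen only if a line actually breaks there. A minimal Python sketch of converting TeX-style \- markup into either the entity or the raw character; the function names are hypothetical and the sketch only handles \-, not other TeX markup.

```python
import html

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, written &shy; in HTML


def tex_to_html(text: str) -> str:
    """Escape text for HTML, then turn each TeX \\- into &shy;."""
    # Escape first so the '&' of the inserted entity is not escaped again;
    # html.escape leaves backslashes and hyphens untouched.
    escaped = html.escape(text, quote=False)
    return escaped.replace("\\-", "&shy;")


def tex_to_unicode(text: str) -> str:
    """Same conversion, but emit the soft hyphen character itself."""
    return text.replace("\\-", SOFT_HYPHEN)


def visible_text(text: str) -> str:
    """What a reader sees when no line break falls on a soft hyphen."""
    return text.replace(SOFT_HYPHEN, "")


SENTENCE = r"Don't wait for rec\-ord companies, re\-cord rec\-ords yourselves."
```

Applied to SENTENCE, tex_to_html yields "Don't wait for rec&shy;ord companies, re&shy;cord rec&shy;ords yourselves.", and stripping the soft hyphens from the Unicode version gives back the unmarked sentence, so the markup is invisible unless a break is taken.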
Re: [whatwg] Hyphenation
On 11 Jan 2007, at 5:33PM, Håkon Wium Lie wrote:

The term hyphenation dictionary is quite common, but I see your point. What would be a better name for the property? hyphenation-pattern hyphenation-list hyphenation-resource

Liang's paper `Word Hy-phen-a-tion by Com-put-er', in which the concept was first introduced, used the term `hyphenation patterns'. Unsurprisingly, Liang's supervisor, Knuth, used the same term in the TeXbook, and this expression seems to have become the generally accepted one amongst TeX users.

`Hyphenation dictionary' is also common, but this tends to mean something slightly different. To exemplify, the first five lines of what I would call a hyphenation dictionary look like this:

a cap·pel·la
a for·ti·o·ri
a go·go
a pos·te·ri·o·ri
a pri·o·ri

[Interestingly, this particular dictionary contains multi-word expressions, but most hyphenation engines, as well as spelling checkers, cannot take advantage of these, as each word (according to some definition) is typically treated in isolation.]

In contrast, the first five hyphenation patterns in TeX82 are the following:

.ach4
.ad4der
.af1t
.al3t
.am5at

I think it is useful to keep the distinction and would suggest renaming the property in question `hyphenation-patterns'. (TeX's exception dictionary falls within this narrower definition of a hyphenation dictionary.)

http://computing-dictionary.thefreedictionary.com/hyphenation says: HYPHENATION: Breaking words that extend beyond the right margin. Software hyphenates words by matching them against a hyphenation dictionary or by using a built-in set of rules, or both.

http://www.answers.com/topic/hyphenation-dictionary is more specific: HYPHENATION DICTIONARY: A word file with predefined hyphen locations.

http://www.computeruser.com/resources/dictionary/definition.html?lookup=2188 gives a more generic definition: A file, usually in a word processing or desktop publishing program, which defines where hyphens will be placed for common words.
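The distinction between patterns and a dictionary is easiest to see in code. Below is a minimal Python sketch of Liang's method as used by TeX: each digit in a pattern scores the gap between two letters, the highest score found for a gap wins, and an odd winner permits a break; an exception entry, when present, simply replaces the computed result. The pattern set is a tiny illustrative one (a few patterns that reproduce the usual hy-phen-ation example, plus the five TeX82 patterns quoted above), not TeX's full set, so results for other words will be poor. The Hyphenator class and helper names are hypothetical; left_min=2 and right_min=3 mirror plain TeX's \lefthyphenmin and \righthyphenmin defaults.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pattern(pattern: str) -> Tuple[str, List[int]]:
    """'hen5at' -> ('henat', [0, 0, 0, 5, 0, 0]): one score per gap,
    including the gaps before the first and after the last letter."""
    letters: List[str] = []
    scores = [0]
    for ch in pattern:
        if ch.isdigit():
            scores[-1] = int(ch)
        else:
            letters.append(ch)
            scores.append(0)
    return "".join(letters), scores


def parse_exception(entry: str) -> Tuple[str, List[int]]:
    """'ta-ble' or 'pri·o·ri' -> (word, offsets where a break is allowed)."""
    letters: List[str] = []
    points: List[int] = []
    for ch in entry:
        if ch in "-\u00b7":  # hyphen-minus or middle dot marks a break
            points.append(len(letters))
        else:
            letters.append(ch)
    return "".join(letters).lower(), points


class Hyphenator:
    def __init__(self, patterns: Iterable[str], exceptions: Iterable[str] = (),
                 left_min: int = 2, right_min: int = 3) -> None:
        self.patterns: Dict[str, List[int]] = dict(parse_pattern(p) for p in patterns)
        self.exceptions: Dict[str, List[int]] = dict(parse_exception(e) for e in exceptions)
        self.left_min = left_min
        self.right_min = right_min

    def points(self, word: str) -> List[int]:
        """Offsets p such that a break is allowed before word[p]."""
        exception = self.exceptions.get(word.lower())
        if exception is not None:  # the exception list overrides the patterns
            return exception
        w = "." + word.lower() + "."  # '.' marks word boundaries in patterns
        gaps = [0] * (len(w) + 1)     # gaps[i] lies just before w[i]
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                scores = self.patterns.get(w[i:j])
                if scores is not None:
                    for k, s in enumerate(scores):
                        gaps[i + k] = max(gaps[i + k], s)
        # A break before word[p] is the gap before w[p + 1]; odd scores allow it.
        return [p for p in range(self.left_min, len(word) - self.right_min + 1)
                if gaps[p + 1] % 2 == 1]

    def split(self, word: str) -> List[str]:
        pieces, start = [], 0
        for p in self.points(word):
            pieces.append(word[start:p])
            start = p
        pieces.append(word[start:])
        return pieces


PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io",
            ".ach4", ".ad4der", ".af1t", ".al3t", ".am5at"]
```

With these patterns, Hyphenator(PATTERNS).split("hyphenation") gives ["hy", "phen", "ation"] and "after" gives ["af", "ter"]. Passing exceptions=["after", "ta-ble"] suppresses the break in "after" and adds one to "table", which is the role TeX's exception dictionary plays alongside the patterns.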
Google returns about 21,200 results for /hyphenation dictionar(y|ies)/ and 148,100 for /hyphenation patterns?/, so the latter should also be fairly common. To me, a `hyphenation list' suggests something rather like a hyphenation dictionary, whereas `hyphenation resource' probably should be reserved for a more comprehensive source of hyphenation information, unless the same property is supposed to be able to refer to different kinds of hyphenation data.

[In TeX], hyphenation can [also] be indicated locally. This is needed in order to hyphenate words like rec-ord/re-cord and is the only level that deals with spelling changes.

&shy; is probably the best way to encode this. However, it can be done through CSS as well:

Don't wait for <span style="hyphenate-dictionary: rec-ord.dic">record</span> companies, <span style="hyphenate-dictionary: re-cord.dic">record</span> yourself.

Right, I did not get your point at first. This does indeed cover the first reason to use explicit mark-up in TeX.

Concerning spelling changes, Petr Sojka's `Notes on Compound Word Hyphenation in TeX' [1], section 3.2, describes how a minimally extended version of the TeX algorithm can deal with irregular hyphenation without any extraneous mark-up, i.e., without any unnecessary burden on the author. Perhaps an idea for Prince7?

Anyway, the preliminary conclusion seems to be that a <hyph> element in HTML is unnecessary, so this discussion should probably continue somewhere else.

[1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf

-- Øistein E. Andersen
Re: [whatwg] contenteditable, em and strong
On Jan 12, 2007, at 5:23 AM, Henri Sivonen wrote:

... The introduction of <em> and <strong> (circa 1993) has failed to achieve a semantic improvement over <i> and <b>, because prominent tools such as Dreamweaver, Tidy, IE and Opera as well as simplified well-intentioned advocacy treat <em> and <strong> merely as more fashionable alternatives to <i> and <b>. (I mean failure in terms of what meaning a markup consumer can extract from the real Web without a private agreement with the producer of a given Web page. I don't mean the ability of authors to write style sheets for their own markup.) ...

Is the effort to get people to use CSS instead of spacer GIFs a bad idea? Is the effort to get people to use <h1>..<h6> instead of <p><b> or <p><font> a bad idea? Is the effort to get people to use CSS instead of <table> for layout a bad idea?

There were, I'm sure, many more occurrences of those problems than there were improper uses of <em> and <strong>. And the efforts to replace them are much older than the effort to get people who don't think about semantics to use <b> and <i>, which has hardly even started yet.

Ten years ago, the typical Web developer probably didn't know what <em> and <strong> were. Now, the typical Web developer probably thinks that <b> and <i> are dirty and that XHTML is the future. This does not mean all is lost; it just means the standards advocates oversteered. Time for another adjustment.

... Insisting on the difference of <i> and <em> is not without harm, because arguing about which one to use is not without opportunity cost. ...

"Not without" makes that statement look more profound than it is.

-- Matthew Paul Thomas http://mpt.net.nz/