Michael Sobolev <[EMAIL PROTECTED]> writes:
> I've got a small question: where do all these entities come from? :)

W3O, mostly.  See the copyright file in the sgml-data package.

> These make me think that it does not matter whether //HTML suffix is there the
> entities are the same.
[...]
> Aha, at least, this makes me think that these two files are different!  They
> are defining different sets of entities.  BUT, according to
> /usr/lib/sgml/catalog file, the first set of entities can be also referred to
> as to "...//EN".

> So here is my question, how I should treat all this information?

With caution.  It is possible that I have screwed up and marked as
non-HTML specific what really *is* HTML specific.

Note that the docbook-xml package contains XML versions of this stuff
(XML encodes entities a little differently; I think it's implicitly
CDATA).

> My main concern (well, it's where this investigation started from) is entity
> named copy.  If I look into first file I see
>
>   <!ENTITY copy CDATA "&#169;">

This is a Unicode character definition.

> I see no definition for copy in the second file, while iso-.../ISOnum file
> defines:
>
>   <!ENTITY copy SDATA "[copy ]"--=copyright sign-->

From <URL:http://www.oasis-open.org/cover/isoEntsExplained.html>,

| They are "SDATA" entity sets, which means that it is the job of the
| recipient to map them to something locally useful.

> These are different definitions and while in the second case I could process
> this SDATA [copy ] for producing &copy; in HTML output and \copyright in TeX
> output, I lack this possibility in first case.

Why do you say that?  As far as I am aware there are TeX packages that
can handle Unicode.

> Please comment.

Well, basically, the SDATA mappings are entirely arbitrary: nothing in
the entity declaration says what the recipient should turn "[copy ]"
into.  Therefore, for the standard entity sets which I have shipped
with the sgml-data package, I use the Unicode entity mappings, which
are handled fine by advanced browsers and the SGML tool-chain (nsgmls,
jade, etc).

I definitely am willing to ship alternate SDATA-style entity sets for
SGML (XML requires the Unicode ones).

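
To make the "job of the recipient" point concrete, here is a rough Python
sketch of what a back-end has to do with an SDATA string.  Everything in
it is an assumption for illustration only: the table, the function names
and the "(c)" plain-text fallback are made up, and none of it is shipped
in sgml-data or in any of the tools mentioned above.

```python
# Hypothetical per-output-format tables.  The keys are SDATA strings with
# whitespace removed; the values are what one particular recipient chose
# to emit.  Nothing here is standardized, which is the whole point.
SDATA_MAPPINGS = {
    "html": {"[copy]": "&copy;"},
    "tex": {"[copy]": r"\copyright"},
    "text": {"[copy]": "(c)"},
}


def normalize_sdata(sdata):
    """Drop all whitespace so "[copy ]" and "[copy  ]" compare equal."""
    return "".join(sdata.split())


def resolve_sdata(sdata, output_format):
    """Return the local replacement for an SDATA string.

    Unknown formats or unmapped strings fall back to the SDATA text
    unchanged, so nothing is silently lost.
    """
    table = SDATA_MAPPINGS.get(output_format, {})
    return table.get(normalize_sdata(sdata), sdata)


if __name__ == "__main__":
    for fmt in ("html", "tex", "text"):
        print(fmt, resolve_sdata("[copy ]", fmt))
```

Each consumer needs a table like this of its own, whereas a Unicode
definition carries the character itself and needs no such lookup.
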
I suppose either I could use a different FPI for the SDATA sets, or
else I could even use SGML "marked sections" and a conditional
parameter (i.e., use 'nsgmls -iuse-sdata-entities ...') to switch
between whatever representation of entities you might want.  In either
case, the default, IMHO, should be the Unicode representation.

I *guess* I prefer the former option (use alternate FPIs) because it
seems like we could do it a bit at a time....

For more info read
<URL:http://www.oasis-open.org/cover/topics.html#entities>.

-- 
.....Adam Di [EMAIL PROTECTED]<URL:http://www.onShore.com/>

