Re: [XeTeX] nbsp; in XeTeX

2011-11-15 Thread Petr Tomasek
On Mon, Nov 14, 2011 at 02:27:03AM -0800, Chris Travers wrote:
 On Mon, Nov 14, 2011 at 2:24 AM, Petr Tomasek toma...@etf.cuni.cz wrote:
 
  Using different color.
 
 Do we really want to tie XeTeX users to a small number of editors?
 
 Chris Travers

Do we really want to make XeTeX incompatible with the rest of the (Unicode)
world?

P.T.

-- 
Petr Tomasek http://www.etf.cuni.cz/~tomasek
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA





--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] Whitespace in input

2011-11-15 Thread Keith J. Schultz
Hi Tobias,

Am 14.11.2011 um 18:42 schrieb Tobias Schoel:

 
 
 Am 14.11.2011 18:30, schrieb msk...@ansuz.sooke.bc.ca:
[snip, snip]
 Now we come to the trouble of Unicode specifying a line-breaking algorithm ( 
 http://www.unicode.org/reports/tr14/tr14-26.html ), which probably isn't 
 exactly TeX's. I'm not into these algorithms, so I can't compare. But I would 
 ask some Master of this Art to speak up about this conflict.
I went and briefly looked at the annex. At the beginning it states that
the annexes are not necessarily a requirement unless mentioned in the standard!
I did not check the standard, but as you read on, it becomes clear that the
description of the LBA is not mandatory at all.
Furthermore, it more or less describes which characters are directly
involved in line breaking (top of Table 1).
The rest is just a suggestion of how one might go about achieving line
breaking. This is not a standard at all.

Since TeX has its own line-breaking algorithm, we need not worry
about the content of this annex as far as Unicode is concerned.
What you should be aware of is that the LBA is intended as an aid for
a preprocessor to a more elaborate line-breaking algorithm.
It has been approved for printing, but nowhere does it state that it
must be followed or that it is complete.
In other words, it is merely a suggestion.

There is no conflict per se. Just another way of dealing with line 
breaking. There is no real standard for line breaking.
It is more or less a matter of taste, style and aesthetics. (Yes, there 
are many conventions that should be observed,
and many are grammatical in nature).

regards
Keith.







Re: [XeTeX] Whitespace in input

2011-11-15 Thread Chris Travers
On Tue, Nov 15, 2011 at 2:27 AM, Keith J. Schultz keithjschu...@web.de wrote:
 Hi all,

 I agree that XeTeX should support all printable characters.

Given your definition I would say all visible printed characters.
Invisible characters are a problem in a programming language.

 A non-breaking space is to me a printable character, in so far that
 it is important and must be used to distinguish between word space, et al.

As long as this is an option which defaults to off, again I have no
problem with this.  I mean, by this definition, carriage returns and
line feeds are also printable characters, and these are supported by
options which are turned on explicitly rather than on by default.

 To go back in history, one of my pet peeves in LaTeX was that I had to
 enter the German characters öäüß as \"o, \"a, etc. and later the
 shortcut forms "s, "u, etc. Later, with inputenc, I could finally just enter
 öäüß. But I had trouble with (actually, I just needed to convert) my files
 between Apple and Windows (so that editing was possible on Windows).

 Yet I still had trouble with quoting, so I was forced to use \quote, et al.
 to have a simple method of quoting properly in English, German and French
 in one document! I even modified them to suit some requirements I had, and
 I had one command.

 Unicode has thankfully changed all this. I can forget about using all those TeX
 commands for the characters I need. I just type away.

 The only problem now is the keyboard equivalents and how the editor of
 choice displays them.

But here you have a problem.  An editor can display a non-breaking
space according to its semantic value (i.e. with a special glyph), but this is
not without problems.  For example, we could also display line feeds
as the paragraph symbol, but that is also U+00B6, so now you have
an ambiguity: is it a Unicode character or is it a line feed?
Or you can color code, but this is problematic for a large number of
other reasons.

So I am not sure these are simple problems that admit of simple solutions.

My recommendation is:

1)  Default to handling all white space as it exists now.
2)  Provide some sort of switch, whether to the execution of XeTeX or
to the document itself, to turn on handling of special unicode
characters.
3)  If that switch is enabled, then treat the whitespaces according to
unicode meanings.  If not, treat them as standard whitespace.

The advantage of this approach is that people who don't want to worry
about what sort of whitespace is in text files they are inputting
don't have to worry about it, and that those who do have an easy way
of determining if a layout issue is caused by non-breaking spaces.
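
At the macro level, without any change to XeTeX itself, such a switch
could be approximated roughly as follows. This is only a sketch: the names
\unicodespaceson and \unicodespacesoff are hypothetical, only U+00A0 is
handled, and the meaning given to the active character follows the
\nobreak\space suggestion made in the related "nbsp; in XeTeX" thread.

```tex
% Plain XeTeX sketch. \unicodespaceson and \unicodespacesoff are
% hypothetical names, not part of XeTeX or of any existing package.

% Give the active form of U+00A0 the same meaning as "~".
% The \lowercase trick avoids typing an invisible character here.
\begingroup
  \lccode`\~="00A0
\lowercase{\endgroup\def~}{\nobreak\space}

% Switch on: U+00A0 characters read from now on are active (non-breaking).
\def\unicodespaceson{\catcode"00A0=\active}

% Switch off: U+00A0 characters read from now on are ordinary spaces.
\def\unicodespacesoff{\catcode"00A0=10\relax}

\unicodespacesoff   % default: all white space is handled alike
```

Either command only affects characters read after it; text that has
already been tokenized (for example inside a macro argument) keeps the
category codes it was read with.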

Best Wishes,
Chris Travers





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip and Le Khanh



Keith J. Schultz wrote:


A non-breaking space is to me a printable character, in so far that
it is important and must be used to distinguish between word space, et al.


If, for you, [a] non-breaking space is a printable character, then
presumably that character must be taken from some font.  If you take
a character from a font, it will have a size, and although it can be
combined with kerning rules to adjust its position w.r.t. adjacent
characters,  the logic for this is fairly restricted.  In particular,
it cannot take into account the amount by which TeX is seeking to
expand or contract spaces on the current line in order to achieve
optimal paragraphs.  So in your model of the ideal universe,
non-breaking Unicode spaces would not behave as do conventional
TeX non-breaking spaces (which /do/ expand and contract to assist
in TeX's line-breaking), nor would they conform to their Unicode
definition where their decomposition is defined as :

<noBreak> SPACE (U+0020)

I wonder if you would like to discuss these points ?

Philip Taylor




Re: [XeTeX] nbsp; in XeTeX

2011-11-15 Thread Zdenek Wagner
2011/11/14 Mike Maxwell maxw...@umiacs.umd.edu:
 On 11/14/2011 4:56 PM, Zdenek Wagner wrote:

 2011/11/14 Mike Maxwell maxw...@umiacs.umd.edu:

 We are not (at least I am not) suggesting that everyone must use
 the Unicode non-breaking space character, or etc.  What we *are*
 suggesting is that in Xe(La)Tex, we be *allowed* to use those
 characters, and that they have their

 You are allowed to use them, nothing prevents you.

 At least one participant in this thread (or actually the related thread
 Whitespace in input--the person in question is msk...@ansuz.sooke.bc.ca)
 has said:
 U+00A0 is an invalid character for TeX input

 That sounds pretty much like prevention (although maybe you don't agree with
 him).

I strongly disagree. From the TeX point of view a character is invalid
if its \catcode is equal to 15 which is not the case of U+00a0. If an
invalid character is found on input, an error message appears in the
log. It does not happen with U+00a0 because its \catcode is 12 which
means other character. When talking about \catcode I have in mind a
value defined in the format. Even if a character is declared as
invalid in the format, a user can assign another \catcode if the
character can be rendered.
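
The category code actually in force can be checked directly. A small
plain XeTeX sketch follows; the value reported depends on the format and
its version, so no particular output is assumed here.

```tex
% Write the current category code of U+00A0 to the terminal and the log.
\message{catcode of U+00A0: \the\catcode"00A0}

% Declaring the character invalid (category 15) makes TeX report an
% error whenever the character is subsequently read from the input.
\catcode"00A0=15

% A later assignment can give it another category again.
\catcode"00A0=12
\bye
```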

 But in fact, the last time I tried this, the NBSP character was interpreted
 in the same way as an ASCII space, which is not what I want.  What I want
 (repeating myself again) is for such characters to--

NBSP's \catcode is 12, so it is just a glyph in the font; it is not
treated specially by XeTeX. A line can be broken at glue if it does not
follow another discardable element, at a penalty, or at a \discretionary, but
not at a glyph; that is why this space is nonbreakable in XeTeX's
eyes. Since it is a glyph, its width is fixed. You can do a few things
with it:

Change its \catcode to 10, then it will be a normal
stretchable/shrinkable space but will not be nonbreakable

Change its \catcode to 13 and define it as \nobreak\space. In such a
case it will have the same meaning as ~
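
In plain XeTeX the two alternatives might look as follows. This is a
sketch rather than tested package code; ^^a0 is TeX's notation for the
character with code "A0, used here so that the invisible character need
not be typed.

```tex
% Alternative 1 (commented out): U+00A0 becomes an ordinary space.
% It stretches and shrinks, but a line break is allowed at it.
%\catcode"00A0=10

% Alternative 2: U+00A0 becomes an active character with the meaning
% suggested above: \nobreak followed by a normal stretchable space.
\catcode"00A0=\active
\def^^a0{\nobreak\space}
```

LaTeX's own definition of ~ additionally starts with \leavevmode, which
matters when the character is the first thing in a paragraph.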

 have their Unicode-defined semantics, to the extent that
 makes sense in XeTeX.
 --just the same as I would expect XeTeX (or xdvipdfmx) to correctly handle
 the visual re-ordering behavior of U+09C7 through U+09CC, or U+093F
 (Devanagari vowel sign I).

OpenOffice has some intelligence and recognizes the Devanagari script
automatically. This is not the case with XeTeX. When loading a
Devanagari font you have to switch the script to Devanagari too. Then
XeTeX properly handles U+093F and U+094D (other characters are handled
properly even without setting the script). Similarly you have to set
the Arabic script in order to connect the characters properly, without
setting the script only isolated forms will be typeset. Everything is
done in XeTeX, xdvipdfmx just renders properly reordered and composed
glyphs into PDF. The Velthuis Devanagari package contains even samples
for XeLaTeX, some support files have recently been moved to the
xetex-devanagari package.
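
With XeTeX's font-loading syntax the script is selected by a script=
option after the font name. A sketch in plain XeTeX; the font name is
only an assumption, and any installed OpenType font that covers the
script should do.

```tex
% "Lohit Devanagari" is an assumed font name; substitute an installed font.
\font\devfont="Lohit Devanagari:script=deva" at 12pt

% The word below contains both U+093F and U+094D.
{\devfont हिन्दी}
\bye
```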

 However, I would not like to think, why I have
 overful/underful boxes and opening hex editor to see what kind of
 space is written between words.

 A number of alternatives to a hex editor have been pointed out:
 1) color coding
 2) using a font that has a representation of these code points
 3) using any text editor that allows you to see the Unicode code point of a
 character (I use jEdit this way, I'm sure many other editors offer this
 support)

 Again, this is not about _forcing_ anyone to use NBSP etc., it is about
 _allowing_ their use *with the expected Unicode behavior.*
 --
        Mike Maxwell
        maxw...@umiacs.umd.edu
        My definition of an interesting universe is
        one that has the capacity to study itself.
        --Stephen Eastmond




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Mike Maxwell

On 11/15/2011 5:39 AM, Chris Travers wrote:

My recommendation is:

1)  Default to handling all white space as it exists now.
2)  Provide some sort of switch, whether to the execution of XeTeX or
to the document itself, to turn on handling of special unicode
characters.
3)  If that switch is enabled, then treat the whitespaces according to
unicode meanings.  If not, treat them as standard whitespace.


I think you asked me earlier whether that would satisfy me, and I failed 
to answer. Yes, it would.

--
Mike Maxwell
maxw...@umiacs.umd.edu
My definition of an interesting universe is
one that has the capacity to study itself.
--Stephen Eastmond




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Mike Maxwell maxw...@umiacs.umd.edu:
 On 11/15/2011 5:39 AM, Chris Travers wrote:

 My recommendation is:

 1)  Default to handling all white space as it exists now.
 2)  Provide some sort of switch, whether to the execution of XeTeX or
 to the document itself, to turn on handling of special unicode
 characters.
 3)  If that switch is enabled, then treat the whitespaces according to
 unicode meanings.  If not, treat them as standard whitespace.

 I think you asked me earlier whether that would satisfy me, and I failed to
 answer. Yes, it would.

But such a solution is not clean: you cannot plug such logic into the
TeX mouth when the input is being read, nor into the output stage when
TECkit maps are in effect. I wrote the reasons earlier. The only
reasonable solution seems to be the one suggested by Phil Taylor: to
extend \catcode up to 255 and assign special categories to other types
of characters. Thus we could say that normal space is 10, nonbreakable
space is 16, thin space is 17, etc. XeTeX will then be able to treat
them properly.

 --
        Mike Maxwell
        maxw...@umiacs.umd.edu
        My definition of an interesting universe is
        one that has the capacity to study itself.
        --Stephen Eastmond






-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Chris Travers
2011/11/15 Zdenek Wagner zdenek.wag...@gmail.com:
 2011/11/15 Mike Maxwell maxw...@umiacs.umd.edu:
 On 11/15/2011 5:39 AM, Chris Travers wrote:

 My recommendation is:

 1)  Default to handling all white space as it exists now.
 2)  Provide some sort of switch, whether to the execution of XeTeX or
 to the document itself, to turn on handling of special unicode
 characters.
 3)  If that switch is enabled, then treat the whitespaces according to
 unicode meanings.  If not, treat them as standard whitespace.

 I think you asked me earlier whether that would satisfy me, and I failed to
 answer. Yes, it would.

 But such a solution is not clean, you cannot plug in such logic to the
 TeX mouth when the input is being read nor to the output stage when
 TECkit maps are in effect. I wrote the reasons earlier. The only
 reasonable solution seems to be the one suggested by Phil Taylor, to
 extend \catcode up to 255 and assign special categories to other types
 of characters. Thus we could say that normal space is 10, nonbreakable
 space is 16, thin space is 17 etc. XeTeX will then be able to treat
 them properly.

But we are talking two different things here.  The first is user
interface, and the second is mechanism.

What I am saying is special handling of this sort should be required
to be enabled somehow by the user.  I don't really care how.  It could
be by a commandline switch to xelatex.  It could be by a call in the
document if that's possible.  It should be optional, and disabled by
default, given that the characters involved are not intended to be
displayed with glyphs.

Best Wishes,
Chris Travers





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Zdenek Wagner wrote:


The only  reasonable solution seems to be the one suggested by Phil Taylor, to
extend \catcode up to 255 and assign special categories to other types
of characters. Thus we could say that normal space is 10, nonbreakable
space is 16, thin space is 17 etc. XeTeX will then be able to treat
them properly.


which may, unfortunately, then require new types of node
in TeX's internal list structures ...

(may, not will).

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Chris Travers wrote:


But we are talking two different things here.  The first is user
interface, and the second is mechanism.

What I am saying is special handling of this sort should be required
to be enabled somehow by the user.  I don't really care how.  It could
be by a commandline switch to xelatex.  It could be by a call in the
document if that's possible.  It should be optional, and disabled by
default, given that the characters involved are not intended to be
displayed with glyphs.


But /if/ it requires a change to the number of category codes
(and/or the creation of one or more classes of internal node),
then this is not something that should be capable of being
turned on or off within a document.  I don't have any problem
with the idea of turning the functionality on or off either
within a format file or from a command-line qualifier.

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Chris Travers chris.trav...@gmail.com:
 2011/11/15 Zdenek Wagner zdenek.wag...@gmail.com:
 2011/11/15 Mike Maxwell maxw...@umiacs.umd.edu:
 On 11/15/2011 5:39 AM, Chris Travers wrote:

 My recommendation is:

 1)  Default to handling all white space as it exists now.
 2)  Provide some sort of switch, whether to the execution of XeTeX or
 to the document itself, to turn on handling of special unicode
 characters.
 3)  If that switch is enabled, then treat the whitespaces according to
 unicode meanings.  If not, treat them as standard whitespace.

 I think you asked me earlier whether that would satisfy me, and I failed to
 answer. Yes, it would.

 But such a solution is not clean, you cannot plug in such logic to the
 TeX mouth when the input is being read nor to the output stage when
 TECkit maps are in effect. I wrote the reasons earlier. The only
 reasonable solution seems to be the one suggested by Phil Taylor, to
 extend \catcode up to 255 and assign special categories to other types
 of characters. Thus we could say that normal space is 10, nonbreakable
 space is 16, thin space is 17 etc. XeTeX will then be able to treat
 them properly.

 But we are talking two different things here.  The first is user
 interface, and the second is mechanism.

 What I am saying is special handling of this sort should be required
 to be enabled somehow by the user.  I don't really care how.  It could
 be by a commandline switch to xelatex.  It could be by a call in the
 document if that's possible.  It should be optional, and disabled by
 default, given that the characters involved are not intended to be
 displayed with glyphs.

The mechanism is simple, set this \catcode to 13 and define it as
\nobreak\space. If you wish to make it clever in all XeLaTeX corners,
find one of my previous posts to see what has to be taken into
account. It may be present in a package called nbsp.sty or so. No
change in XeTeX is needed if you do it this way.

 Best Wishes,
 Chris Travers







-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Philip TAYLOR p.tay...@rhul.ac.uk:


 Zdenek Wagner wrote:

 The only reasonable solution seems to be the one suggested by Phil Taylor, to
 extend \catcode up to 255 and assign special categories to other types
 of characters. Thus we could say that normal space is 10, nonbreakable
 space is 16, thin space is 17 etc. XeTeX will then be able to treat
 them properly.

 which may, unfortunately, then require new types of node
 in TeX's internal list structures ...

 (may, not will).

Sure, the change will not be trivial. I do not know how the category
codes are stored internally but extending them from 16 possible values
to 256 may require dramatic change in the internal structures.

 ** Phil.




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Philip TAYLOR p.tay...@rhul.ac.uk:


 Chris Travers wrote:

 But we are talking two different things here.  The first is user
 interface, and the second is mechanism.

 What I am saying is special handling of this sort should be required
 to be enabled somehow by the user.  I don't really care how.  It could
 be by a commandline switch to xelatex.  It could be by a call in the
 document if that's possible.  It should be optional, and disabled by
 default, given that the characters involved are not intended to be
 displayed with glyphs.

 But /if/ it requires a change to the number of category codes
 (and/or the creation of one or more classes of internal node),
 then this is not something that should be capable of being
 turned on or off within a document.  I don't have any problem
 with the idea of turning the functionality on or off either
 within a format file or from a command-line qualifier.

If you know what such characters are (and it will certainly be
documented), you just set their categories back to 12 in order to get
the old behaviour.

 ** Phil.






-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Zdenek Wagner wrote:


If you know what such characters are (and it will certainly be
documented), you just set their categories back to 12 in order to get
the old behaviour.


No ! A catcode is for life, not just for Christmas !  Once a
character has been read, and bound into a character/catcode pair,
that catcode remains immutable.  That means that code that is /not/
expecting to have to deal with non-standard catcodes could none the
less be passed token lists containing such entities if it is
possible, within a document, to turn such a feature on and
off again.

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Arthur Reutenauer
On Tue, Nov 15, 2011 at 02:20:17PM +, Philip TAYLOR wrote:
 No ! A catcode is for life, not just for Christmas !  Once a
 character has been read, and bound into a character/catcode pair,
 that catcode remains immutable.

  Do you mean that as a general good practice in TeX programming, or as
a description of how TeX works?  The latter is obviously wrong.

Arthur




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Philip TAYLOR p.tay...@rhul.ac.uk:


 Zdenek Wagner wrote:

 If you know what such characters are (and it will certainly be
 documented), you just set their categories back to 12 in order to get
 the old behaviour.

 No ! A catcode is for life, not just for Christmas !  Once a
 character has been read, and bound into a character/catcode pair,
 that catcode remains immutable.  That means that code that is /not/
 expecting to have to deal with non-standard catcodes could none the
 less be passed token lists containing such entities if it is
 possible, within a document, to turn such a feature on and
 off again.

Of course, I know it. What I meant was that you could set \catcode of
all these extended characters to 12 at the beginning of your
document. Thus you get the same behaviour as now.

 ** Phil.






-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Arthur Reutenauer wrote:

On Tue, Nov 15, 2011 at 02:20:17PM +, Philip TAYLOR wrote:

No ! A catcode is for life, not just for Christmas !  Once a
character has been read, and bound into a character/catcode pair,
that catcode remains immutable.


   Do you mean that as a general good practice in TeX programming, or as
a description of how TeX works?  The latter is obviously wrong.


The latter is what the TeXbook says (p. 39): Once a category code
has been attached to a character token, the attachment is permanent.

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Herbert Schulz

On Nov 15, 2011, at 8:52 AM, Philip TAYLOR wrote:

 
 
 Arthur Reutenauer wrote:
 On Tue, Nov 15, 2011 at 02:20:17PM +, Philip TAYLOR wrote:
 No ! A catcode is for life, not just for Christmas !  Once a
 character has been read, and bound into a character/catcode pair,
 that catcode remains immutable.
 
   Do you mean that as a general good practice in TeX programming, or as
 a description of how TeX works?  The latter is obviously wrong.
 
 The latter is what the TeXbook says (p. 39): Once a category code
 has been attached to a character token, the attachment is permanent.
 
 ** Phil.


Howdy,

What happens in a verbatim environment?

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)







Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Zdenek Wagner wrote:


Of course, I know it. What I meant was that you could set \catcode of
all these extended characters to 12 at the beginning of your
document. Thus you get the same behaviour as now.


Ah yes : with that, I have no problem.
** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Herbert Schulz he...@wideopenwest.com:

 On Nov 15, 2011, at 8:52 AM, Philip TAYLOR wrote:



 Arthur Reutenauer wrote:
 On Tue, Nov 15, 2011 at 02:20:17PM +, Philip TAYLOR wrote:
 No ! A catcode is for life, not just for Christmas !  Once a
 character has been read, and bound into a character/catcode pair,
 that catcode remains immutable.

   Do you mean that as a general good practice in TeX programming, or as
 a description of how TeX works?  The latter is obviously wrong.

 The latter is what the TeXbook says (p. 39): Once a category code
 has been attached to a character token, the attachment is permanent.

 ** Phil.


 Howdy,

 What happens in a verbatim environment?

It will have to be redefined, there will just be additional special
characters that will have to be handled. \XeTeXrevision will give you
information whether extended \catcode is implemented.
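
The extended \catcode range is only a proposal in this thread, so the
revision to test against is unknown. The check itself might look like
the following sketch, in which the threshold 0.9999 is a purely
hypothetical placeholder.

```tex
\ifdefined\XeTeXrevision
  % \XeTeXversion is an integer and \XeTeXrevision a string such as
  % ".9997", so together they read as a decimal number.
  \ifdim\the\XeTeXversion\XeTeXrevision pt<0.9999pt % hypothetical threshold
    \message{extended catcodes not available}
  \else
    \message{extended catcodes may be available}
  \fi
\else
  \message{not running under XeTeX}
\fi
\bye
```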

 Good Luck,

 Herb Schulz
 (herbs at wideopenwest dot com)









-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Arthur Reutenauer
 The latter is what the TeXbook says (p. 39): Once a category code
 has been attached to a character token, the attachment is permanent.

  Yes, because you meant individual tokens (which I understood in
retrospect).  But in the context of the discussion, you really seemed to
be saying that you could not change the \catcode's of characters to be
read, which was the point (not that there is much point left to the
whole threads any more...)

Arthur




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Arthur Reutenauer wrote:

The latter is what the TeXbook says (p. 39): Once a category code
has been attached to a character token, the attachment is permanent.


   Yes, because you meant individual tokens (which I understood in
retrospect).  But in the context of the discussion, you really seemed to
be saying that you could not change the \catcode's of characters to be
read, which was the point (not that there is much point left to the
whole threads any more...)


No no : changing catcodes on the fly is standard TeX programming;
what we should not contemplate is changing the /number/ of catcodes
on the fly ...

** Phil.




Re: [XeTeX] Printing OTF glyph features

2011-11-15 Thread Adam Twardoch (List)
Stephan,

you can print font glyphs in XeTeX using \XeTeXglyph followed by the
glyph's decimal index.
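
For instance, the following plain XeTeX sketch prints every glyph of a
font by index, whether or not the glyph is reachable through a feature.
The font name is an assumption (it must match an installed font), and
\XeTeXcountglyphs is used to obtain the number of glyphs in the font.

```tex
% "Minion Pro" is an assumed font name; substitute an installed font.
\font\allglyphs="Minion Pro" at 12pt
\allglyphs

\newcount\glyphno
\glyphno=0

\noindent
\loop
  \XeTeXglyph\glyphno\ %  typeset the glyph with this index, then a space
  \advance\glyphno by 1
\ifnum\glyphno<\XeTeXcountglyphs\allglyphs
\repeat
\bye
```

Glyph indices bypass the font's character mapping and layout features,
so this shows what the font contains, not which glyphs a given feature
substitutes.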

You'd need to use a different tool to do the parsing of the OpenType
Layout tables, though. The Python package FontTools/TTX or FontForge
compiled as a Python module can be used to extract this information.
You'd need to do some coding though, going through the GSUB lookups and
compile a list of glyphs that are being output.

A.


On 11-11-15 06:46, Stephan wrote:
 Good day,

 I have been trying to print out the glyphs of a font (in my case Minion) that 
 are used in a stylistic variant. But I have not been able to do that...

 Is there a way of printing, let's say, all the glyphs that would be used if a 
 feature in a font is turned on ?

 For example, the k in this stylistic variant is different from the regular
 k in the Minion font, however, I would like to know what other glyphs may be
 affected.

 Thanks,

 -Stephan




-- 

May success attend your efforts,
-- Adam Twardoch
(Remove list. from e-mail address to contact me directly.)





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Herbert Schulz wrote:


The latter is what the TeXbook says (p. 39): Once a category code
has been attached to a character token, the attachment is permanent.

** Phil.



What happens in a verbatim environment?


The verbatim environment sets up an environment within
which characters that have not yet been seen by TeX's
mouth receive category codes that potentially differ
from the category code that would normally be associated
with that character.  Once the category code has been
bound to a particular instance of that character, that
instance never changes its catcode.
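
The same binding can be seen in a few lines of plain TeX. This is a
minimal sketch; the macro name \stored and the choice of "!" are only
for illustration.

```tex
% "!" normally has category 12 (other).
\def\stored{!}        % this "!" is tokenized now, as an "other" character

\catcode`\!=\active   % every "!" read from here on is an active character
\def!{(active)}

% \stored still expands to the original category-12 token, whereas the
% "!" typed directly on the next line is read as an active character.
\stored\ versus !
\bye
```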

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Herbert Schulz

On Nov 15, 2011, at 11:19 AM, Philip TAYLOR wrote:

 
 
 Herbert Schulz wrote:
 
 The latter is what the TeXbook says (p. 39): Once a category code
 has been attached to a character token, the attachment is permanent.
 
 ** Phil.
 
 What happens in a verbatim environment?
 
 The verbatim environment sets up an environment within
 which characters that have not yet been seen by TeX's
 mouth receive category codes that potentially differ
 from the category code that would normally be associated
 with that character.  Once the category code has been
 bound to a particular instance of that character, that
 instance never changes its catcode.
 
 ** Phil.


Howdy,

So what you are saying is not that you can't control the catcode of a 
particular character but that you can't change it after it is set and in TeX's 
``stomach.'' That I can agree with.

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)








[XeTeX] XePersian in Persian version of Wikipedia

2011-11-15 Thread Vafa Khalighi
http://fa.wikipedia.org/wiki/%D8%B2%DB%8C%E2%80%8C%D9%BE%D8%B1%D8%B4%DB%8C%D9%86




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Herbert Schulz

On Nov 15, 2011, at 11:11 AM, Herbert Schulz wrote:

 
 On Nov 15, 2011, at 11:19 AM, Philip TAYLOR wrote:
 
 
 
 Herbert Schulz wrote:
 
 The latter is what the TeXbook says (p. 39): Once a category code
 has been attached to a character token, the attachment is permanent.
 
 ** Phil.
 
 What happens in a verbatim environment?
 
 The verbatim environment sets up an environment within
 which characters that have not yet been seen by TeX's
 mouth receive category codes that potentially differ
 from the category code that would normally be associated
 with that character.  Once the category code has been
 bound to a particular instance of that character, that
 instance never changes its catcode.
 
 ** Phil.
 
 
 Howdy,
 
 So what you are saying is not that you can't control the catcode of a 
 particular character but that you can't change it after it is set and in 
 TeX's ``stomach.'' That I can agree with.
 
 Good Luck,
 
 Herb Schulz
 (herbs at wideopenwest dot com)


Howdy,

What I meant to say was...

So what you are saying is not that you can control the catcode of a particular 
character but that you can't change it after it is set and in TeX's 
``stomach.'' That I can agree with.

(note the change: "can't control" --> "can control")

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)


Re: [XeTeX] Printing OTF glyph features

2011-11-15 Thread Mojca Miklavec
On Tue, Nov 15, 2011 at 06:46, Stephan wrote:
 Good day,

 I have been trying to print out the glyphs of a font (in my case Minion) that
 are used in a stylistic variant. But I have not been able to do that...

 Is there a way of printing, let's say, all the glyphs that would be used if a
 feature in a font is turned on ?

 For example, the k in this stylistic variant is different from the regular
 k in the Minion font, however, I would like to know what other glyphs may be
 affected.

I think that Hans Hagen sent me an example of that, written in
ConTeXt MkIV (based on LuaTeX). (I'm not sure if this was
included or not; there was definitely a document for showing
alternatives for OpenType Math, and there was definitely some document
showing different numbers with different features turned on.)

I need to remember where I have those documents, or you can try to ask
the same question on the ConTeXt mailing list (maybe Hans will find
that faster than me).

I assume that you want to inspect the font and that the exact engine
being used to get the job done doesn't matter so much to you?

Mojca




Re: [XeTeX] XePersian in Persian version of Wikipedia

2011-11-15 Thread Zdenek Wagner
2011/11/15 Vafa Khalighi vafa...@gmail.com:
 http://fa.wikipedia.org/wiki/%D8%B2%DB%8C%E2%80%8C%D9%BE%D8%B1%D8%B4%DB%8C%D9%86

خوب (Persian: "good")


-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR

I think it made more sense with "can't", Herb,
but that could be a trans-Atlantic difference
of usage -- you would, I think, say "I could care
less" where I would say "I couldn't care less".

** Phil.

Herbert Schulz wrote:


What I meant to say was...

So what you are saying is not that you can control the catcode of a particular 
character but that you can't change it after it is set and in TeX's 
``stomach.'' That I can agree with.

(notice the can't control ---  can control)





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Herbert Schulz

On Nov 15, 2011, at 2:43 PM, Ross Moore wrote:

 
 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
 
 Given that TeX (and XeTeX too) deal with a non-breakable space already (where
 we usually use the ~ to represent that space) it seems to me that XeTeX 
 should treat that the same way.
 
 No, I disagree completely.
 
 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.
 
 In TeX ~ *simulates* a non-breaking space visually, but there is
 no actual character inserted.
 If you want the character you have to ensure that it gets there,
 and what more natural way is there than to put it in explicitly.
 
 This is how XeTeX treats it currently, according to my experiments,
 using just  fontspec  and  Charis SIL font.
 Anyone who has a different experience should check what other
 packages and fonts are being loaded, and whether there is something
 that specifically changes how that character is handled.
 

Howdy,

But isn't that also true about a regular space character? Doesn't (Xe)TeX 
insert some glue rather than a Space Character?
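
One way to check this is to look at the contents of a box. The following is a small sketch in plain TeX with its default fonts, offered as an illustration rather than as anything posted in the thread; under XeTeX with native fonts the listing may look different.

```latex
% Plain TeX sketch: inspect what an interword space becomes inside a box.
\showboxbreadth=100 \showboxdepth=100 % show the full box contents
\tracingonline=1                      % echo the listing to the terminal
\setbox0=\hbox{a b~c}
\showbox0  % the listing has character nodes for a, b and c only; the
           % ordinary space appears as a \glue node, and ~ appears as
           % \penalty 10000 followed by \glue
\bye
```

TeX stops at \showbox as if an error had occurred; pressing return continues the run.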

 The big puzzle will happen when someone, not using an editor capable of 
 displaying invisibles, can't understand why they can't get XeTeX to break 
 between the two words.
 
 That is an editor problem, not one that XeTeX itself should be
 concerned with.
 

Agreed. But I'll bet you end up with lots of questions on ctt/texhax/etc. about
line breaking; assuming that the non-breaking space actually does its ``job.''

 
 Now having Ux00A0 between two words may change the way 
 hyphenation works for those words.
 
 But surely if you are wanting to inhibit a line-break
 between words, you probably also don't want either word to
 be hyphenated. So this could really be the correct thing.
 

or not. :-)

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)


Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Ross Moore ross.mo...@mq.edu.au:

 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:

 Given that TeX (and XeTeX too) deal with a non-breakable space already (where
 we usually use the ~ to represent that space) it seems to me that XeTeX 
 should treat that the same way.

 No, I disagree completely.

 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.

From the typographical point of view it is the worst of all possible
methods. If you really wish it, then do not use TeX but M$ Word or
OpenOffice. M$ Word automatically inserts nonbreakable spaces at some
points in the text written in Czech. As far as grammar is concerned,
it is correct. However, U+00a0 is fixed width. If you look at the
output, the nonbreakable spaces are too wide on some lines and too
thin on other lines. I cannot imagine anything uglier.


-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Ross Moore
Hi Zdenek,

On 16/11/2011, at 8:58 AM, Zdenek Wagner wrote:

 2011/11/15 Ross Moore ross.mo...@mq.edu.au:
 
 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
 
 Given that TeX (and XeTeX too) deal with a non-breakable space already (where
 we usually use the ~ to represent that space) it seems to me that XeTeX 
 should treat that the same way.
 
 No, I disagree completely.
 
 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.
 
 From the typographical point of view it is the worst of all possible
 methods. If you really wish it,

The *really wish it* is the choice of the author, not the
software.

 then do not use TeX but M$ Word or
 OpenOffice. M$ Word automatically inserts nonbreakable spaces at some
 points in the text written in Czech. As far as grammar is concerned,
 it is correct. However, U+00a0 is fixed width. If you look at the
 output, the nonbreakable spaces are too wide on some lines and too
 thin on other lines. I cannot imagine anything uglier.

I do not disagree with you that this could be ugly.
But that is not the point.

If you want superior aesthetic typesetting, with nice choices
for hyphenation, then don't use Ux00A0. Of course!


Whatever the reason for wanting to use this character, there
should be a straightforward way to do it.
Using the character itself:
 a.  is the most understandable
 b.  currently works
 c.  requires no special explanation.


 
 

Cheers,

Ross


Ross Moore   ross.mo...@mq.edu.au 
Mathematics Department   office: E7A-419  
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia  2109  fax: +61 (0)2 9850 8114


Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Ross Moore ross.mo...@mq.edu.au:
 Hi Zdenek,

 On 16/11/2011, at 8:58 AM, Zdenek Wagner wrote:

 2011/11/15 Ross Moore ross.mo...@mq.edu.au:

 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:

 Given that TeX (and XeTeX too) deal with a non-breakable space already
 (where we usually use the ~ to represent that space) it seems to me that 
 XeTeX should treat that the same way.

 No, I disagree completely.

 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.

 From the typographical point of view it is the worst of all possible
 methods. If you really wish it,

 The *really wish it* is the choice of the author, not the
 software.

 then do not use TeX but M$ Word or
 OpenOffice. M$ Word automatically inserts nonbreakable spaces at some
 points in the text written in Czech. As far as grammar is concerned,
 it is correct. However, U+00a0 is fixed width. If you look at the
 output, the nonbreakable spaces are too wide on some lines and too
 thin on other lines. I cannot imagine anything uglier.

 I do not disagree with you that this could be ugly.
 But that is not the point.

 If you want superior aesthetic typesetting, with nice choices
 for hyphenation, then don't use Ux00A0. Of course!


 Whatever the reason for wanting to use this character, there
 should be a straight-forward way to do it.
 Using the character itself is:
  a.  the most understandable
  b.  currently works
  c.  requires no special explanation.

These are reasons why people might wish it in the source files, not in PDF.

If you wish to take a [part of] PDF and include it in another PDF as
is, you can take the PDF directly without the need of grabbing the
text. If you are interested in the text that will be retypeset, you
have to verify a lot of other things. If the text contained hyphenated
words, you have to join the parts manually. You will have a lot of
other work and the time saved by U+00a0 will be negligible. There are
tools that may help you to insert nonbreakable spaces. I even have my
own special tools written in Perl to handle one class of input files
that are really plain text, and the result is (almost) correctly
marked-up LaTeX source.



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Ross Moore
Hi Phil,

On 16/11/2011, at 8:45 AM, Philip TAYLOR wrote:

 Ross Moore wrote:
 
 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
 
 Given that TeX (and XeTeX too) deal with a non-breakable space already (where
 we usually use the ~ to represent that space) it seems to me that XeTeX 
 should treat that the same way.
 
 No, I disagree completely.
 
 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.
 
 I'm not sure I entirely go along with this argument, Ross.
 What if you really want the \ character to be in the PDF,
 or the ^ character, or the $ character, or any character
 that TeX currently treats specially ?  

TeX already provides \$ \_ \# etc. for (most of) the other special
characters it uses, but does not for ^^A0 --- but it does not
need to if you can generate it yourself on the keyboard.


 Whilst I can agree
 that there is considerable merit in extending XeTeX such
 that it treats all of these new, special characters
 specially (by creating new catcodes, new node types and so
 on), in the short term I can see no fundamental problem with
 treating U+00A0 in such a way that it behaves indistinguishably
 from the normal expansion of ~.

How do you explain to somebody the need to do something really,
really special to get a character that they can type, or copy/paste?

There is no special role for this character in other vital aspects 
of how TeX works, such as there is for $ _ # etc.


 
 In TeX ~ *simulates* a non-breaking space visually, but there is
 no actual character inserted.
 
 And I don't agree that a space is a character, non-breaking or not !

In this view you are against most of the rest of the world.

If the output is intended to be PDF, as it really has to be with 
XeTeX, then the specifications for the modern variants of PDF 
need to be consulted.

With PDF/A and PDF/UA and anything based on ISO-32000 (PDF 1.7)
there is a requirement that the included content should explicitly
provide word boundaries. Having a space character inserted is by
far the most natural way to meet this specification.
(This does not mean that having such a character in the output
need affect TeX's view of typesetting.)

Before replying to anything in the above paragraph, please
watch the video of my recent talk at TUG-2011.

  http://river-valley.tv/further-advances-toward-tagged-pdf-for-mathematics/

or similar from earlier years where I also talk a bit about such things.

 
 ** Phil.


Hope this helps,

Ross


Ross Moore   ross.mo...@mq.edu.au 
Mathematics Department   office: E7A-419  
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia  2109  fax: +61 (0)2 9850 8114


Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/15 Ross Moore ross.mo...@mq.edu.au:
 Hi Phil,

 On 16/11/2011, at 8:45 AM, Philip TAYLOR wrote:

 Ross Moore wrote:

 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:

 Given that TeX (and XeTeX too) deal with a non-breakable space already
 (where we usually use the ~ to represent that space) it seems to me that 
 XeTeX should treat that the same way.

 No, I disagree completely.

 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.

 I'm not sure I entirely go along with this argument, Ross.
 What if you really want the \ character to be in the PDF,
 or the ^ character, or the $ character, or any character
 that TeX currently treats specially ?

 TeX already provides \$ \_ \# etc. for (most of) the other special
 characters it uses, but does not for ^^A0 --- but it does not
 need to if you can generate it yourself on the keyboard.

00a0

 Whilst I can agree
 that there is considerable merit in extending XeTeX such
 that it treats all of these new, special characters
 specially (by creating new catcodes, new node types and so
 on), in the short term I can see no fundamental problem with
 treating U+00A0 in such a way that it behaves indistinguishably
 from the normal expansion of ~.

 How do you explain to somebody the need to do something really,
 really special to get a character that they can type, or copy/paste?

 There is no special role for this character in other vital aspects
 of how TeX works, such as there is for $ _ # etc.



 In TeX ~ *simulates* a non-breaking space visually, but there is
 no actual character inserted.

 And I don't agree that a space is a character, non-breaking or not !

 In this view you are against most of the rest of the world.

TeX NEVER outputs a space as a glyph. Text extraction tools usually
interpret horizontal spaces of sufficient size as U+0020.

(The exception to the above mentioned never is the verbatim mode.)

 If the output is intended to be PDF, as it really has to be with
 XeTeX, then the specifications for the modern variants of PDF
 need to be consulted.

 With PDF/A and PDF/UA and anything based on ISO-32000 (PDF 1.7)
 there is a requirement that the included content should explicitly
 provide word boundaries. Having a space character inserted is by
 far the most natural way to meet this specification.

A space character is a fixed-width glyph. If you insist on it, you
will never be able to typeset justified paragraphs; you will move back
to the era of mechanical typewriters.

 (This does not mean that having such a character in the output
 need affect TeX's view of typesetting.)

 Before replying to anything in the above paragraph, please
 watch the video of my recent talk at TUG-2011.

  http://river-valley.tv/further-advances-toward-tagged-pdf-for-mathematics/

 or similar from earlier years where I also talk a bit about such things.


 ** Phil.


-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Karljurgen Feuerherm
I was going to make the following point earlier--maybe in light of
Phil's conclusion I should do it now.

There seems to be a tendency not to distinguish between a(n original)
character in the sense of character of a writing system, and a computer
character.

The former are visible symbols on a background medium. The latter are
an entirely different set of symbols which to some extent parallel the
former, and to some extent do not. Space, control codes, etc. don't exist
in the former, but exist in the latter because it was a convenient way
to encode certain functions one wished to apply to the encoded other
characters--the ones that correspond more or less to original writing
system characters.

These encoding sets have developed over time, and have consequently
inherited all sorts of legacy issues, not all of which need supporting.
Unicode provides tools. No one says one has to use them all.

Specifically, the purpose of XeTeX and other such engines is to allow for
the nice typographical formatting of visual representations of script
characters against some other defined background. From that point of
view, so long as it does it, once it does it, it has achieved its goal.

Transparency of all sorts of other things, providing input via PDF to
other software isn't and shouldn't be a *primary* goal.

That being said, no doubt it might be helpful to some to have this or
that control character passed along. But that's not the essence of the
exercise, and should only be done if it can be done cheaply, i.e.
without a lot of risk to the primary objective.

I guess the real question is that latter part.

K

 On Tue, Nov 15, 2011 at  4:45 PM, in message
4ec2dd63.3040...@rhul.ac.uk,
Philip TAYLOR p.tay...@rhul.ac.uk wrote:


 Ross Moore wrote:

 On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:

 Given that TeX (and XeTeX too) deal with a non-breakable space already
 (where we usually use the ~ to represent that space) it seems to me that
 XeTeX should treat that the same way.

 No, I disagree completely.

 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.

 I'm not sure I entirely go along with this argument, Ross.
 What if you really want the \ character to be in the PDF,
 or the ^ character, or the $ character, or any character
 that TeX currently treats specially ?  Whilst I can agree
 that there is considerable merit in extending XeTeX such
 that it treats all of these new, special characters
 specially (by creating new catcodes, new node types and so
 on), in the short term I can see no fundamental problem with
 treating U+00A0 in such a way that it behaves indistinguishably
 from the normal expansion of ~.

 In TeX ~ *simulates* a non-breaking space visually, but there is
 no actual character inserted.

 And I don't agree that a space is a character, non-breaking or not !

 ** Phil.




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Ross Moore
Hi Phil,

On 16/11/2011, at 10:08 AM, Zdenek Wagner wrote:

 How do you explain to somebody the need to do something really,
 really special to get a character that they can type, or copy/paste?
 
 There is no special role for this character in other vital aspects
 of how TeX works, such as there is for $ _ # etc.
 
 
 
 In TeX ~ *simulates* a non-breaking space visually, but there is
 no actual character inserted.
 
 And I don't agree that a space is a character, non-breaking or not !
 
 In this view you are against most of the rest of the world.
 
 TeX NEVER outputs a space as a glyph. Text extraction tools usually
 interpret horizontal spaces of sufficient size as U+0020.

I never said that it did, nor that it was necessary to do so.

Those text extraction tools do a pretty reasonable job, but don't
always get it right. Besides, there is reliance on a heuristic,
which can be fallible, especially if there is content typeset in 
a very small font size.
And what about at line-ends? They can get that wrong too.

Such a reliance is rather against the TeX way of doing things,
don't you think?

Better is for TeX itself to apply the heuristic, since it knows
the current font size and the separation between bits of words.

 (The exception to the above mentioned never is the verbatim mode.)

That isn't good enough for TeX to produce PDF/A.
Go and watch the videos that I pointed you to.


Lower down I give a run-down of how a variant of TeX handles
this problem, to very good effect.

 
 If the output is intended to be PDF, as it really has to be with
 XeTeX, then the specifications for the modern variants of PDF
 need to be consulted.
 
 With PDF/A and PDF/UA and anything based on ISO-32000 (PDF 1.7)
 there is a requirement that the included content should explicitly
 provide word boundaries. Having a space character inserted is by
 far the most natural way to meet this specification.
 
 A space character is a fixed-width glyph. If you insist on it, you
 will never be able to typeset justified paragraphs, you will move back
 to the era of mechanical typewriters.

Absolutely wrong!

I'm not insisting on it being included as the natural way to 
separate words within the PDF, though it certainly is a possible
way that is used by other software.

 (This does not mean that having such a character in the output
 need affect TeX's view of typesetting.)

Clearly you never even read this parenthetical statement ...

 
 Before replying to anything in the above paragraph, please
 watch the video of my recent talk at TUG-2011.

 ... and certainly you don't seem to have followed up on this
piece of advice, to get a better perspective of what I'm talking
about.

 
  http://river-valley.tv/further-advances-toward-tagged-pdf-for-mathematics/
 
 or similar from earlier years where I also talk a bit about such things.



Here is how you get *both* TeX-quality typesetting and explicit
spaces as word-boundaries inside the PDF, with no loss of quality.

What the experimental tagged-pdfTeX does is to use a font (called
dummy-space) that contains just a single character at code Ux0020,
at a size that is almost zero -- it cannot be exactly zero, else 
PDF browsers may not select it for copy/paste, or other text-extraction.

These extra spaces are inserted into the PDF content stream, *after*
TeX has determined the correct positioning for high-quality typesetting.
That is, it is *not* done by macros or widgets or suchlike, but is
done internally by the pdfTeX engine at shipout time.

The almost-zero size has no perceptible effect on the visual output.
But the existence of these extra space characters means that all
text-extraction methods work much more reliably.

There *are* extra primitives that can be used to turn this off and on
in places where such extra spaces are not wanted; e.g. in math.
And there is a primitive to insert such a space, in case it is required
manually, for whatever reason. All of these primitives are used
extensively when generating tagged PDF of mathematical expressions,
and are thus available for other usage too.
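
For readers who want to try something along these lines: released pdfTeX (not XeTeX) provides the primitives \pdfinterwordspaceon, \pdfinterwordspaceoff and \pdffakespace. The sketch below assumes pdfTeX 1.40.15 or later and an installation in which the dummy-space font can be found; whether these primitives behave exactly like the experimental ones described above is an assumption on my part.

```latex
% Plain pdfTeX sketch (not XeTeX). Assumes pdfTeX 1.40.15 or later and
% that the dummy-space font shipped with the distribution can be found.
\pdfinterwordspaceon   % add a fake space character at interword glue
Some words to copy and paste from the PDF.

\pdfinterwordspaceoff  % stop adding fake spaces automatically
No fake spaces here, except one added by hand: x\pdffakespace y.
\bye
```

The visual output should be unchanged; the difference, if any, shows up only when text is selected or extracted from the PDF.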


 
 
 ** Phil.

Hope this helps,

Ross


Ross Moore   ross.mo...@mq.edu.au 
Mathematics Department   office: E7A-419  
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia  2109  fax: +61 (0)2 9850 8114


Re: [XeTeX] Whitespace in input

2011-11-15 Thread Zdenek Wagner
2011/11/16 Ross Moore ross.mo...@mq.edu.au:

 On 16/11/2011, at 9:45 AM, Zdenek Wagner wrote:

 2011/11/15 Ross Moore ross.mo...@mq.edu.au:

 What if you really want the Ux00A0 character to be in the PDF?
 That is, when you copy/paste from the PDF, you want that character
 to come along for the ride.

 From the typographical point of view it is the worst of all possible
 methods. If you really wish it,

 Maybe you misunderstood what I meant here.

 I'm not saying that you might want Ux00A0 for *every* place
 where there is a word-breaking space.
 Just that there may be individual instance(s) where you have
 a reason to want it.

 Just like any other Unicode character, if you want it then
 you should be able to put it in there.

You ARE able to do it. Choose a font with that glyph, set \catcode to
11 or 12 and that's it. What else do you wish to do?
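
As a minimal XeLaTeX sketch of that recipe (an illustration only; the font follows the Charis SIL experiment mentioned earlier in the thread and is assumed to be installed, and the \catcode line merely makes explicit a setting that may already be the default):

```latex
% XeLaTeX sketch: typeset U+00A0 as an ordinary character.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL} % assumption: any font with a U+00A0 glyph should do
\catcode"00A0=12         % "other": U+00A0 is typeset like any other character
\begin{document}
10^^^^00a0km % ^^^^00a0 is XeTeX's input notation for U+00A0
\end{document}
```

Typing the character directly in a UTF-8 source file is equivalent to the ^^^^00a0 notation, which is used here only so that the character is visible.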

 That's what XeTeX currently does (with the TeX-wise familiar
 ASCII exceptions) for any code-point supported by the
 chosen font.


 The *really wish it* is the choice of the author, not the
 software.

 then do not use TeX but M$ Word or
 OpenOffice. M$ Word automatically inserts nonbreakable spaces at some
 points in the text written in Czech. As far as grammar is concerned,
 it is correct. However, U+00a0 is fixed width. If you look at the
 output, the nonbreakable spaces are too wide on some lines and too
 thin on other lines. I cannot imagine anything uglier.

 I do not disagree with you that this could be ugly.
 But that is not the point.

 If you want superior aesthetic typesetting, with nice choices
 for hyphenation, then don't use Ux00A0. Of course!


 Whatever the reason for wanting to use this character, there
 should be a straight-forward way to do it.
 Using the character itself is:
  a.  the most understandable
  b.  currently works
  c.  requires no special explanation.

 These are reasons why people might wish it in the source files, not in PDF.

 Yes. In the source, to have the occasional such character included
 within the PDF, for whatever reason appropriate to the material
 being typeset -- whether verbatim, or not.


 If you wish to take a [part of] PDF and include it in another PDF as
 is, you can take the PDF directly without the need of grabbing the
 text. If you are interested in the text that will be retypeset, you
 have to verify a lot of other things.

 How is any of this relevant to the current discussion?

It was you who came up with the argument that you wish to have
nonbreakable spaces when copying the text from PDF.

 If the text contained hyphenated
 words, you have to join the parts manually. You will have a lot of
 other work and the time saved by U+00a0 will be negligible. There are
 tools that may help you to insert nonbreakable spaces. I have even my
 own special tools written in perl to handle one class of input files
 that are really plain texts and the result is (almost) correctly
 marked LaTeX source.

 All well and good.
 But how is that relevant to anything I said?

See above.



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-15 Thread Philip TAYLOR



Ross Moore wrote:

Hi Phil,

On 16/11/2011, at 10:08 AM, Zdenek Wagner wrote:


Not I, Sir : Zdeněk  !
** Phil.




[XeTeX] aligning characters at their centers

2011-11-15 Thread Daniel Greenhoe
Is there a way to align characters at their centers instead of at
their baselines?

Take for example
   {\scshape Ee}.
This will produce one big uppercase E and one little uppercase E;
and their lower horizontal bars will be aligned. But is there any way
I can make them aligned at their centers (center horizontal bars
aligned) without using \raisebox?

This has application to book publishing when placing rotated text on
the spine of a book.

Many thanks in advance,
Dan




Re: [XeTeX] aligning characters at their centers

2011-11-15 Thread Heiko Oberdiek
On Wed, Nov 16, 2011 at 11:28:33AM +0800, Daniel Greenhoe wrote:

 Is there a way to align characters at their centers instead of at
 their baselines?
 
 Take for example
{\scshape Ee}.
 This will produce one big uppercase E and one little uppercase E;
 and their lower horizontal bars will be aligned. But is there any way
 I can make them aligned at their centers (center horizontal bars
 aligned) without using \raisebox?

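\documentclass{article}
\begin{document}
\scshape
% Variant 1: \vcenter centres each box vertically on the math axis.
$\vcenter{\hbox{E}}\vcenter{\hbox{e}}$
or
% Variant 2: \valign puts each entry between \vfill glue, so the
% entries are centred relative to one another.
\valign{\vfill\hbox{#}\vfill\cr E\cr e\cr}
\end{document}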

Yours sincerely
  Heiko Oberdiek




Re: [XeTeX] Whitespace in input

2011-11-15 Thread Ross Moore
Hi Zdenek,

On 16/11/2011, at 11:19 AM, Zdenek Wagner wrote:

 Just like any other Unicode character, if you want it then
 you should be able to put it in there.
 
 You ARE able to do it. Choose a font with that glyph, set \catcode to
 11 or 12 and that's it. What else do you wish to do?

The *default* behaviour should stay as it is.
Any other behaviour requires changing the catcode
and perhaps making a definition.
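
Such an opt-in could look roughly like the following XeLaTeX sketch. It is an illustration under my own assumptions, not a tested recommendation: it makes U+00A0 active and defines it to expand to LaTeX's \nobreakspace, which is what ~ does.

```latex
% XeLaTeX sketch: opt in to treating U+00A0 like the tie ~.
\documentclass{article}
\usepackage{fontspec}
\catcode"00A0=\active                   % make U+00A0 an active character
\protected\def^^^^00a0{\nobreakspace{}} % and let it act like LaTeX's tie
\begin{document}
Dr.^^^^00a0Knuth and Dr.~Knuth should now be typeset the same way.
\end{document}
```

With this definition the character no longer reaches the PDF as U+00A0, which is exactly the trade-off being argued about here.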

 These are reasons why people might wish it in the source files, not in PDF.
 
 Yes. In the source, to have the occasional such character included
 within the PDF, for whatever reason appropriate to the material
 being typeset -- whether verbatim, or not.


 If you wish to take a [part of] PDF and include it in another PDF as
 is, you can take the PDF directly without the need of grabbing the
 text. If you are interested in the text that will be retypeset, you
 have to verify a lot of other things.
 
 How is any of this relevant to the current discussion?
 
 It was you who came with the argument that you wish to have
 nonbreakable spaces when copying the text from PDF.

No. I said that if you put one in, then you should be
expecting to get one out.
This should be the default behaviour, as it is now.

I certainly suggested nothing like getting out non-breaking
spaces as a replacement for anything else.





Hope this helps,

Ross


Ross Moore   ross.mo...@mq.edu.au 
Mathematics Department   office: E7A-419  
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia  2109  fax: +61 (0)2 9850 8114