Re: [PATCH] Output of en- and em-dashes

2017-03-19 Thread Guenter Milde
On 2017-03-07, Enrico Forestieri wrote:

> The attached patch fixes the regression introduced in 2.2 about the
> output of en- and em-dashes. 
...
> With this patch, documents produced with older versions work again
> as intended 

Not always:

The proposed patch restores the previous behaviour for the subset of
pre-2.2 documents that used ligature dashes (see Details below).

OTOH, the patch will lead to changed output for older documents using
literal EM DASH and EN DASH characters (that were not affected by the
changes in 2.2) as well as 2.2 documents.

If we go this way, I propose making the "ligature dash" output an
opt-in. Otherwise, we replace one evil with another: a LyX update again
causing unwanted changes to existing documents.


More problems with the proposed patch:

a) the setting is lost when converting to 2.2

b) There are older documents using literal dash + ZWSP (U+200b)
   for dash with optional line break point
   (https://marc.info/?l=lyx-users&m=140982011101908&w=2)
   With the attached patch, the ZWSP will be removed, leading to an
   unwelcome surprise.



Details
=======

In versions < 2.2, there were two methods to input and store em- and
en-dashes:

a) ligature input (--- and --)
b) literal EM DASH and EN DASH characters (0x2014 and 0x2013).

While both methods produce the same characters in the output, they behave
differently regarding possible hyphenation of the preceding word and line
break after the dash. Depending on the use case, both methods have
advantages and problems (see
http://www.lyx.org/trac/raw-attachment/ticket/10543/dash-problems.lyx).

Conversion from 2.1 to 2.2 eliminates the difference between ligature dashes
and literal dashes. In 2.2, dashes are always stored as EM DASH and EN DASH
characters.

So, we have 2 problems:

a) Changed LaTeX export for documents using ligature dashes leading to
   different output in some cases.
   
b) Loss of information during the conversion process.
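
To illustrate problem b), here is a minimal sketch of why the conversion
cannot be undone. It is only a model under stated assumptions: the function
name is hypothetical, and it treats ligature dashes as runs of ASCII hyphens
in the text, which is a simplification and not the actual lyx2lyx code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model (not LyX or lyx2lyx code): collapse ligature input
// ("---" and "--") into the literal EM DASH / EN DASH characters.
std::u32string collapseLigatureDashes(const std::u32string & in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 3, U"---") == 0) {
            out += char32_t(0x2014); // EM DASH
            i += 3;
        } else if (in.compare(i, 2, U"--") == 0) {
            out += char32_t(0x2013); // EN DASH
            i += 2;
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

In this model, a text written with "---" and a text written with a literal
EM DASH give the same result, so the original input method can no longer be
told from the converted document.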


Alternatives
============

Further information loss (problem b) can be avoided with a change to
Text.cpp which ensures the distinction of ligature vs. literal dash is kept
during the conversion. Of course, this cannot restore lost information if
converted documents are already modified and saved with 2.2.

--- a/src/Text.cpp
+++ b/src/Text.cpp
@@ -506,9 +506,11 @@ void Text::readParToken(Paragraph & par, Lexer & lex,
 				par.insert(par.size(), from_ascii("---"), font, change);
 		} else {
 			if (token == "\\twohyphens")
-				par.insertChar(par.size(), 0x2013, font, change);
-			else
-				par.insertChar(par.size(), 0x2014, font, change);
+				par.insertChar(par.size(), 0x2013, font, change); // EN DASH
+			else {
+				par.insertChar(par.size(), 0x2014, font, change); // EM DASH
+				par.insertChar(par.size(), 0x200b, font, change); // ZWSP
+			}
 		}
 	} else if (token == "\\backslash") {
 		par.appendChar('\\', font, change);



As an alternative to a buffer setting, we could also take up the suggestion
to define the "ligature dashes" as "special characters":

+1 similar to current support for typographical quotes (special char
   parallel to literal Unicode)
   
+1 enables use of ligature dashes and literal dashes in one document

+1 lyx2lyx conversion of 2.1 and 2.2 documents without behaviour change:
   * replace \twohyphens and \threehyphens with
     dash-special-chars in Text.cpp.
   * keep literal dashes.
   
-1 two competing ways to represent dashes   
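
A rough sketch of what the "special character" variant could look like as a
data model. All type and function names here are hypothetical and do not
exist in LyX. The LaTeX strings for literal dashes are the \textendash and
\textemdash macros that 2.2 is reported to emit; what a final implementation
would emit is an open question.

```cpp
#include <string>

// Hypothetical types, for illustration only.
enum class DashWidth { En, Em };
enum class DashKind { Ligature, Literal };

struct Dash {
    DashWidth width;
    DashKind kind; // kept in the document, so no information is lost
};

// Choose the LaTeX output from the stored kind instead of a buffer setting.
std::string latexFor(const Dash & d)
{
    if (d.kind == DashKind::Ligature)
        return d.width == DashWidth::En ? "--" : "---";
    return d.width == DashWidth::En ? "\\textendash" : "\\textemdash";
}
```

With the kind stored per dash, both flavours can coexist in one document,
which is the second "+1" above.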


Günter



Re: [PATCH] Output of en- and em-dashes

2017-03-19 Thread Enrico Forestieri
On Sat, Mar 18, 2017 at 05:32:01PM +0100, Enrico Forestieri wrote:
> On Sat, Mar 18, 2017 at 03:41:20PM +, Guenter Milde wrote:
> > On 2017-03-18, Enrico Forestieri wrote:
> > > On Sat, Mar 18, 2017 at 03:06:09PM +0100, Guillaume Munch wrote:
> > 
> > ...
> > 
> > >> > I think we have to make do with the ugly zero-space inset.
> > 
> > >> Or special/invisible unicode characters could be made visible by changing
> > >> the character before painting.
> > 
> > > However, this is going to be an issue only when exporting a document
> > > to previous versions and editing it with that version. So, future
> > > possible workarounds are not going to help.
> > 
> > > It would not be an issue if that character could be searched for, but
> > > this is not possible either. 
> > 
> > How did you test this?
> > I could find a text with ZWSP, if I inserted the ZWSP into the search box
> > from Insert>Special Char>Symbols.
> 
> I mean when searching for a single ZWSP. The "simple" search does not
> work because the find next button is not activated,

This is strange. I found out that copying a ZWSP from the LyX window
to the "simple" search box does not work, but copying it from the
Special Character Symbols dialog (or from an external source) does work.
So it can actually be searched for, and I am therefore going to commit
the corresponding patch.

-- 
Enrico


Re: [PATCH] Output of en- and em-dashes

2017-03-18 Thread Enrico Forestieri
On Sat, Mar 18, 2017 at 03:41:20PM +, Guenter Milde wrote:
> On 2017-03-18, Enrico Forestieri wrote:
> > On Sat, Mar 18, 2017 at 03:06:09PM +0100, Guillaume Munch wrote:
> 
> ...
> 
> >> > I think we have to make do with the ugly zero-space inset.
> 
> >> Or special/invisible unicode characters could be made visible by changing
> >> the character before painting.
> 
> > However, this is going to be an issue only when exporting a document
> > to previous versions and editing it with that version. So, future
> > possible workarounds are not going to help.
> 
> > It would not be an issue if that character could be searched for, but
> > this is not possible either. 
> 
> How did you test this?
> I could find a text with ZWSP, if I inserted the ZWSP into the search box
> from Insert>Special Char>Symbols.

I mean when searching for a single ZWSP. The "simple" search does not
work because the find next button is not activated, while the "advanced"
search works only when deselecting "ignore format" and placing an empty
ERT just after the ZWSP. Even if in this case the ZWSP is actually found,
nothing is selected, so that you don't have any visual clue.
The way you insert a ZWSP is irrelevant (I used unicode-insert 0x200b).

> OTOH, copying text with ZWSP from a text editor into LyX failed as well as
> copying text with ZWSP from LyX's buffer window into the search box.

But copy/paste of a single ZWSP works in LyX. Even if you don't see
anything highlighted, the scissors and the copy icons get activated.
So, you could copy the word just after an em-dash without realizing
that you are also copying a ZWSP.

> > And even fixing this is not going to help
> > with previous lyx versions.
> 
> True. Therefore, the "Special character allowbreak" patch will not help
> here either. It would, however, help with documents transformed from 2.1 to 2.3.

In fact, the issue here is exporting to previous formats and editing
with the corresponding lyx version.

If this is deemed a minor glitch, I will be happy to use the ZWSP.

-- 
Enrico


Re: [PATCH] Output of en- and em-dashes

2017-03-18 Thread Guenter Milde
On 2017-03-18, Enrico Forestieri wrote:
> On Sat, Mar 18, 2017 at 03:06:09PM +0100, Guillaume Munch wrote:

...

>> > I think we have to make do with the ugly zero-space inset.

>> Or special/invisible unicode characters could be made visible by changing
>> the character before painting.

> However, this is going to be an issue only when exporting a document
> to previous versions and editing it with that version. So, future
> possible workarounds are not going to help.

> It would not be an issue if that character could be searched for, but
> this is not possible either. 

How did you test this?
I could find a text with ZWSP, if I inserted the ZWSP into the search box
from Insert>Special Char>Symbols.
OTOH, copying text with ZWSP from a text editor into LyX failed as well as
copying text with ZWSP from LyX's buffer window into the search box.

> And even fixing this is not going to help
> with previous lyx versions.

True. Therefore, the "Special character allowbreak" patch will not help
here either. It would, however, help with documents transformed from 2.1 to 2.3.

Günter




Re: [PATCH] Output of en- and em-dashes

2017-03-18 Thread Enrico Forestieri
On Sat, Mar 18, 2017 at 03:06:09PM +0100, Guillaume Munch wrote:

> On 18/03/2017 at 14:59, Enrico Forestieri wrote:
> > On Wed, Mar 15, 2017 at 06:12:51PM +0100, Enrico Forestieri wrote:
> > > 
> > > Apparently, nobody has a preference, so I am going to commit the
> > > second patch, i.e., the one using a zero-width space character.
> > 
> > On second thoughts, I am not sure this is the best choice. I just
> > verified that this character is not searchable and only previewing
> > the latex source code can reveal its presence. So, it may be
> > inadvertently spread into a document by copy/paste.
> > 
> > I think we have to make do with the ugly zero-space inset.
> > 
> 
> Or special/invisible unicode characters could be made visible by changing
> the character before painting.

However, this is going to be an issue only when exporting a document
to previous versions and editing it with that version. So, future
possible workarounds are not going to help.

It would not be an issue if that character could be searched for, but
this is not possible either. And even fixing this is not going to help
with previous lyx versions.

-- 
Enrico


Re: [PATCH] Output of en- and em-dashes

2017-03-18 Thread Guillaume Munch

On 18/03/2017 at 14:59, Enrico Forestieri wrote:
> On Wed, Mar 15, 2017 at 06:12:51PM +0100, Enrico Forestieri wrote:
> >
> > Apparently, nobody has a preference, so I am going to commit the
> > second patch, i.e., the one using a zero-width space character.
>
> On second thoughts, I am not sure this is the best choice. I just
> verified that this character is not searchable and only previewing
> the latex source code can reveal its presence. So, it may be
> inadvertently spread into a document by copy/paste.
>
> I think we have to make do with the ugly zero-space inset.

Or special/invisible unicode characters could be made visible by
changing the character before painting.
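
A minimal sketch of that idea, assuming the painter works on a copy of the
text: the ZWSP (U+200B) is swapped for a visible placeholder for display
only, and the stored text is left alone. The function name and the choice of
U+2423 (OPEN BOX) as placeholder are assumptions, not existing LyX code.

```cpp
#include <string>

// Hypothetical helper: return a display copy of the text in which each
// zero-width space is replaced by a visible placeholder glyph.
std::u32string visibleForPainting(const std::u32string & text,
                                  char32_t placeholder = 0x2423)
{
    const char32_t ZWSP = 0x200b;
    std::u32string shown = text; // the stored text is not modified
    for (char32_t & c : shown) {
        if (c == ZWSP)
            c = placeholder;
    }
    return shown;
}
```

Since only the painted copy changes, export and copy/paste would still see
the real U+200B.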




Re: [PATCH] Output of en- and em-dashes

2017-03-18 Thread Enrico Forestieri
On Wed, Mar 15, 2017 at 06:12:51PM +0100, Enrico Forestieri wrote:
> 
> Apparently, nobody has a preference, so I am going to commit the
> second patch, i.e., the one using a zero-width space character.

On second thoughts, I am not sure this is the best choice. I just
verified that this character is not searchable and only previewing
the latex source code can reveal its presence. So, it may be
inadvertently spread into a document by copy/paste.

I think we have to make do with the ugly zero-space inset.

-- 
Enrico


Re: [PATCH] Output of en- and em-dashes

2017-03-17 Thread Guenter Milde
On 2017-03-07, Enrico Forestieri wrote:

> The attached patch fixes the regression introduced in 2.2 about the
> output of en- and em-dashes. In 2.2 en- and em-dashes are output as
> the \textendash and \textemdash macros, causing changed output in
> old documents and also bugs (for example, #10490).

> With this patch, documents produced with older versions work again
> as intended, while documents produced with 2.2 can be made to produce
> the exact same output by simply checking "Don't use ligatures for en-
> and em-dashes" in Document->Settings->Fonts.

> Actually, I am attaching two patches. They differ only in the way
> documents are exported to earlier versions. If one wants to use
> ligatures for en/em-dashes, in order to not cause changed output,
> a zero-width space inset is inserted after each en/em-dash when
> using the first patch, while the second patch inserts a zero-width
> space character (U+200B). Both are removed when reloading  documents
> with 2.3, so that they don't accumulate.

> The second patch produces more visually pleasant documents, as the
> zero-width space character is invisible on screen, but they work
> OOTB only when exporting to 2.1 at most. This is because 2.0 and
> earlier versions don't define U+200B in the unicodesymbols file.
> However it could be manually added there.
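
The export/reload round trip of the second patch, as described above, can be
sketched roughly as follows. This is a model under assumptions, not the
patch itself: the function names are hypothetical and the text is treated as
a plain sequence of code points.

```cpp
#include <cstddef>
#include <string>

namespace {
const char32_t EN_DASH = 0x2013;
const char32_t EM_DASH = 0x2014;
const char32_t ZWSP = 0x200b;

bool isDash(char32_t c) { return c == EN_DASH || c == EM_DASH; }
}

// Export to an earlier format: put a ZWSP after each en/em-dash,
// unless one is already there.
std::u32string addBreakAfterDashes(const std::u32string & in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out += in[i];
        bool const followed = i + 1 < in.size() && in[i + 1] == ZWSP;
        if (isDash(in[i]) && !followed)
            out += ZWSP;
    }
    return out;
}

// Reload with 2.3: drop a ZWSP that directly follows a dash,
// so that they do not accumulate.
std::u32string stripBreakAfterDashes(const std::u32string & in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == ZWSP && i > 0 && isDash(in[i - 1]))
            continue;
        out += in[i];
    }
    return out;
}
```

Note that in this model the reload step cannot tell an exported ZWSP from
one the user typed after a literal dash, so the latter would be removed too.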




Re: [PATCH] Output of en- and em-dashes

2017-03-17 Thread Enrico Forestieri
On Tue, Mar 07, 2017 at 11:48:41AM +0100, Enrico Forestieri wrote:
> The attached patch fixes the regression introduced in 2.2 about the
> output of en- and em-dashes. In 2.2 en- and em-dashes are output as
> the \textendash and \textemdash macros, causing changed output in
> old documents and also bugs (for example, #10490).
> 
> With this patch, documents produced with older versions work again
> as intended, while documents produced with 2.2 can be made to produce
> the exact same output by simply checking "Don't use ligatures for en-
> and em-dashes" in Document->Settings->Fonts.
> 
> Actually, I am attaching two patches. They differ only in the way
> documents are exported to earlier versions. If one wants to use
> ligatures for en/em-dashes, in order to not cause changed output,
> a zero-width space inset is inserted after each en/em-dash when
> using the first patch, while the second patch inserts a zero-width
> space character (U+200B). Both are removed when reloading  documents
> with 2.3, so that they don't accumulate.
> 
> The second patch produces more visually pleasant documents, as the
> zero-width space character is invisible on screen, but they work
> OOTB only when exporting to 2.1 at most. This is because 2.0 and
> earlier versions don't define U+200B in the unicodesymbols file.
> However it could be manually added there.

Apparently, nobody has a preference, so I am going to commit the
second patch, i.e., the one using a zero-width space character.

-- 
Enrico