On 07/12/2012 06:46, patspiper wrote:
On 07/12/12 00:53, Martin wrote:
A while ago, I started adding support for mixed LTR/RTL text in
SynEdit.
The actual display of RTL text now works (that is, if you have some
Arabic chars in the text, they display RTL, and the caret moves
accordingly; a caret between RTL and LTR always means caret at LTR).
UTF-8 LTR/RTL markers are not supported. This is the absolute basics only.
This is ok in cases like the IDE editor where the document is mainly
English. I suppose it will be somehow odd for documents with mainly
RTL languages. Formatting (like indentation, bullets), where
implemented, will suffer.
Unfortunately, with RTL came other Unicode features that so far no one
had missed. Those are, at the very least:
- combining codepoints
- ligatures
- maybe reordering of codepoints.
- other?
They are tasks of different extent, and I need to find out what is
mandatory and what is optional, so I can then decide what fits
into my schedule.
The current state is:
- combining: Only Arabic has been done (but that should be complete).
So non-Arabic RTL will not work.
- ligatures: see below
- reordering: not researched, hopefully optional.
I am not aware of any need for reordering.
"Work" means that the text is stable (except ligatures, which are
stable only with the workaround) and does not expand/shrink when
selecting text or moving the caret. Also that the caret will be at the
correct pos, and a newly inserted char will be where the caret was.
This can be tested by hitting the "End" key and checking whether the
caret is at the end of the visual text. If SynEdit thinks the text is
shorter/longer than the actual painted display, then there is an issue.
ligatures:
The editor does not handle ligatures yet. So it calculates 2 screen
cells when only one is needed. However, a stable "workaround" exists
(it currently depends on config).
On Windows, and Windows only (others will be done if that turns out
to be any good): in Options / Editor / Display, set "Extra Char
Spacing" to 1.
This will slightly widen the script; ignore that, it's temporary.
Requires a proper monospaced font (DejaVu Sans Mono).
What it will do: it will tell Windows that the ligature is expected
to cover 2 display cells.
Display: Arabic is a cursive script; glyphs are connected by a
continuous line. The ligature will be in one cell, the next cell will
be empty, except for the connecting line.
Editing: The caret can be at either cell. Each cell stands for one of
the 2 chars in the ligature. So the 2nd char can be edited if the
caret is at the empty cell.
------------------
I need feedback from people who actually speak (or at least read and
write) Arabic. I need to know if the above situation is "usable".
If so, then:
- it can be fixed to work without the extra char spacing
- on gtk, carbon, qt (well at least I hope)
- combining can be added for other languages.
If not, well I don't know yet.
I have tested on Linux/gtk2 (Ubuntu 11.04), and with Courier New only:
- The attached snapshot (lines 29 and 30) shows an extra space before
the 456.
Did you use "Extra Char Spacing" = 1? This is what happens if not!
(This and a few other real oddities.)
And also, it can only be tested on Windows, because on GTK, Qt and Carbon
"Extra Char Spacing" is faulty in another way: it splits the combining
chars into individuals, but since SynEdit does not know.....
The problem is that, by current design, SynEdit has to calculate the
pixel pos of each char on its own. If it does not calculate the same as
the OS did when painting (SynEdit gives the OS tokens, fragments of the
line, or the whole line), then obviously things will be odd afterwards.
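To illustrate the mismatch described above, here is a minimal sketch in Python (not SynEdit code). It assumes a toy model: the editor reserves one cell per base code point and none for combining marks, while a hypothetical renderer may draw a lam + alef pair as a single ligature cell. All function names, and the single ligature pair, are assumptions made for the illustration.

```python
import unicodedata

# Assumption for this sketch: only lam + alef is treated as a ligature pair.
LIGATURES = {("\u0644", "\u0627")}


def editor_cells(text):
    """Cells the editor reserves: one per base code point, none for combining marks."""
    return sum(0 if unicodedata.combining(ch) else 1 for ch in text)


def painted_cells(text, ligature_spans_two_cells):
    """Cells a hypothetical renderer paints for the same text."""
    bases = [ch for ch in text if not unicodedata.combining(ch)]
    cells = 0
    i = 0
    while i < len(bases):
        if i + 1 < len(bases) and (bases[i], bases[i + 1]) in LIGATURES:
            # Without the workaround the ligature collapses into one cell.
            cells += 2 if ligature_spans_two_cells else 1
            i += 2
        else:
            cells += 1
            i += 1
    return cells


def end_key_is_consistent(text, ligature_spans_two_cells):
    """True if the editor's idea of the line width matches the painted width."""
    return editor_cells(text) == painted_cells(text, ligature_spans_two_cells)
```

In this model, lam + alef followed by beh with a shaddah is consistent only when the ligature is told to span two cells, which is what the "Extra Char Spacing" workaround is meant to achieve on Windows.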
- Long connecting lines are not what I would like, but this is a
monospaced font after all.
Ok, but can you test them on Windows, with "Extra Char Spacing" = 1?
See that the caret pos is treated correctly, that backspace, delete and
insert work (on the correct char) on them, and that copying a selection
will copy the highlighted part (except column mode selection, which is
not done yet).
About editing (backspace, delete, insert):
- combining chars: see below.
- ligatures: caret- and selection-wise, the ligature and the
long connecting line are both treated as one char. One is the 1st, the
other the 2nd char of the ligature (in the order they occur in the text).
The behaviour for editing should reflect this. Does it?
- The 456 should have come to the left of the Arabic words.
Ok, that could be fixed. It depends on treating digits as weak or strong
LTR. (Actually, in this case, it depends on treating the line end as such.)
If the 456 were embedded in the middle of Arabic, it would have worked.
But they border the EOL, and SynEdit treats the EOL as strong LTR (and
the bordering weak 456 follows). This gives a better result for Pascal,
where Arabic occurs in strings: "a:='arab';" The '; at the end will and
should be LTR due to bordering the EOL.
This will be fixed eventually, when weak handling is made
highlighter-dependent.
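The weak/strong rule described above can be sketched roughly as follows. This is a simplified toy model in Python, not SynEdit's code and not the full Unicode bidirectional algorithm: it assumes a weak or neutral char becomes RTL only when the nearest strong char on both sides is RTL, and that line start and line end count as strong LTR. The function names are made up for the illustration.

```python
import unicodedata


def strong_direction(ch):
    """Return 'L' or 'R' for strong chars, None for weak or neutral ones."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return None


def resolve_directions(line):
    """Return one 'L' or 'R' per char of the line.

    Assumption: line start and line end act as strong LTR, so a weak run
    touching either edge follows LTR.
    """
    strong = [strong_direction(ch) for ch in line]
    resolved = []
    for i, direction in enumerate(strong):
        if direction is None:
            before = next((s for s in reversed(strong[:i]) if s), "L")
            after = next((s for s in strong[i + 1:] if s), "L")
            direction = "R" if before == after == "R" else "L"
        resolved.append(direction)
    return "".join(resolved)
```

With this rule, digits at the end of a line after Arabic words resolve to LTR (the reported case), digits embedded between Arabic words resolve to RTL, and the closing '; of a Pascal string stays LTR.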
- If you put a shaddah or damma on a character, it gets displayed on
top of the character (correct behaviour). Pressing backspace at this
stage should only delete that addition, and not the character.
Ok, also simple to fix. It is not a painting issue.
Those are combining codepoints. So backspace must act on codepoints.
The editor understands the difference between "Char" and "codepoint". It
is a question of assigning the right choice to each action (and that is a
question of writing testcases too).
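As a rough illustration of the "Char" versus "codepoint" distinction (a Python sketch with made-up function names, not the editor's implementation): backspace removes a single code point, while a char-wise delete removes the base letter together with its combining marks. Python strings are indexed by code point, which keeps the sketch short.

```python
import unicodedata


def backspace_codepoint(text, caret):
    """Remove exactly one code point before the caret; return (text, caret)."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1


def delete_char_left(text, caret):
    """Remove the whole char (base plus combining marks) before the caret."""
    start = caret
    # Skip back over any combining marks directly before the caret.
    while start > 0 and unicodedata.combining(text[start - 1]):
        start -= 1
    # Then include the base code point itself.
    start = max(start - 1, 0)
    return text[:start] + text[caret:], start
```

For beh followed by a shaddah, with the caret after both, the first function leaves the beh in place and the second removes both code points.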
----------------------
About "Long connecting lines are not what I would like."...
I understand. And it would not be the final solution. But if all else
works (as described above), then this is a solution that, I believe, I
can reach without too much extra work from where I am now (it will still
be next year...).
And then we would have something that is at least usable.
The rest will be on my todo list, and has to await its time, between
other features and the debugger.