On 29 Mar 2008, at 10:56, matthiasm wrote:
>
> On 29.03.2008, at 10:42, imm wrote:
>> On 29 Mar 2008, at 8:04, matthiasm wrote:
>>> I am in the process of
>>> comparing the current 1.1.8 UTF-8 patch, Oksid's 1.1.6 patch, and
>>> whatever we did in 2.0 to find the best possible API and then merge
>>> it all together.
>>
>> Does this review also encompass my patched 1.1.8 version?
>
> Yes, that was the first one I was referring to. Am I missing yet
> another UTF-8 patch?

Oh right... Yes, there is another version, Mariwan did one, based on  
r5764 or something. Anyway, I think it came out around the same time  
my first drop did, but I don't know if he's been updating it or not.  
I never really looked at what he'd done though, 'cos by then I was  
heavily into getting my stuff to work...

Also, there's the WinCE port that Mikko_L posted back in June last  
year (was it that long ago?) - again, I haven't looked at it, but I  
think that "we" (in the wider sense, i.e. somebody, maybe even Mikko  
if he's around?) probably should look at it since it *must* have  
addressed wide-char issues to get it working at all on WinCE, and  
WinCE integration would be a Good Thing in the wider sense anyway.

> Your latest version is fltk118-utf8-2008-02-24, right?

Just checking... yes, that seems right. Which means I've missed most  
of the recent fixes, e.g. the work Yuri did... Do you want me to  
merge that and post a new version, or are you OK for now? Time might  
be a bit tricky for me these next two weeks - not that I'm trying to  
sway your decision there...! ;-)



> Yes, makes complete sense. From what I see so far, I can copy and
> paste a lot of things (if not all) from your version and some FLTK2.
> The worst part will be to find all places that use "char" and "char*"
> and decide whether each one handles a text string or binary data, and then
> update the code accordingly. UTF-8 being back-compatible to ASCII is
> great, but it is also a trap when forgetting to port some routines
> because they look right. Eventually they will fail.

Well, of course a utf-8 string is just a char*, but the problems  
arise where you have code that assumes that one byte in the string ==  
1 glyph on the display... A lot of ASCII-derived code makes that  
assumption implicitly, but I *think* we have caught those cases in  
the fltk code base and replaced them with code that knows how to count  
glyphs in a well-formed utf-8 string. (I don't promise we've got that  
right, mind you!)

But...

If all we are doing is providing the means to *render* utf-8 encoded  
strings (which is what I think we should do, and is what the current  
code does), and we do not intend to provide the various layers above  
that to actually implement the various language rules, then we are  
probably OK... So long as the user composites the utf-8 string  
correctly for rendering, and sets the text direction, we can probably  
do the right thing.

But if we are going to implement the Unicode spec for language  
handling... well that way lies madness. Then we'd have to detect the  
text direction based on language, and handle diphthongs and elided  
characters and so on.
And there are quite a lot of languages where the canonical order in  
which the glyphs are stored in the string differs from the order in  
which they are displayed on the screen, depending on which characters  
are on each side, or where in the word the glyph appears, etc...  
All that sort of stuff has to be handled somewhere, but NOT by fltk  
(maybe ICU, Pango, others...)
Even simple stuff, like if the user types two 's' characters in a  
German locale, should both the 's' be rendered, or should we  
substitute a single ß instead... It gets tricky really fast! And the  
Unicode specs cover all of that.
I don't think we can.

Sorry. I'm probably ranting - none of this is relevant to what you  
are doing, I'm sure...

-- 
Ian



