Re: [fltk.bugs] [HIGH] STR #2348: test/editor fails to display misc/cp1252.txt and can hang

Duncan Gibson Mon, 06 Dec 2010 00:28:06 -0800

DO NOT REPLY TO THIS MESSAGE.  INSTEAD, POST ANY RESPONSES TO THE LINK BELOW.

[STR New]

Link: http://www.fltk.org/str.php?L2348
Version: 1.3-current

Manolo wrote:
> It seems difficult to support all 8-bit encodings because there were
> so many, (this fact is the most compelling justification for UTF).
> However, the case for supporting CP1252 as input has been considered
> valid in recent discussions here because a large number of legacy
> files use this encoding.
> ...
> proposal + read_options enum

Is there no way that we can provide character set readers via a plugin
that also encapsulates an identifier rather than hard-code these using
an enum? The default/base plugin provided by the FLTK core would be
pure UTF-8. We could also possibly provide "ISO-8859-1", "CP-1252" and
"MacRoman". User contributed plugins for other ISO-8859-* and Mac*
character sets could be provided via an additional library.

As I'm only thinking about 8-bit character sets at the moment, the
plugin would only need to provide a 256-entry lookup table from byte
to UTF-8 values. [Reverse mapping could be handled via Albrecht's
private range UTF-8 suggestion]. My initial idea was something like
the following (typed off the top of my head, so likely to have errors):

class CharacterSet
{
  public:
    // virtual destructor so a plugin can be deleted via a base pointer
    virtual ~CharacterSet() {}

    // return identifier string, e.g. "CP-1252"
    virtual const char* identifier() const = 0;

    // returns ucs value for byte
    virtual unsigned int byteToUcs(unsigned char byte) const = 0;

    // returns true if ucs can be mapped to single byte
    virtual bool ucsMapsToByte(unsigned int ucs) const = 0;

    // returns byte value corresponding to ucs
    virtual unsigned char ucsToByte(unsigned int ucs) const = 0;

    // returns true if input text required conversion
    virtual bool convert(const char* input, char* output, int *len) const = 0;
};
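
As a rough illustration of the lookup-table idea for one concrete character
set, here is a minimal sketch for CP-1252. The names cp1252ByteToUcs,
ucsToUtf8 and kCp1252High are hypothetical and not part of FLTK. Because
CP-1252 matches ISO-8859-1 outside 0x80..0x9F, a 32-entry table stands in
for the full 256-entry one. The five bytes that CP-1252 leaves undefined
(0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the C1 control code of the
same value, which is an assumption of this sketch.

```cpp
// Hypothetical helpers, not FLTK API: byte -> UCS -> UTF-8 for CP-1252.

// UCS values for CP-1252 bytes 0x80..0x9F. Undefined bytes (0x81, 0x8D,
// 0x8F, 0x90, 0x9D) are passed through unchanged (assumption).
static const unsigned int kCp1252High[32] = {
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Returns the UCS value for one CP-1252 byte.
unsigned int cp1252ByteToUcs(unsigned char byte)
{
  if (byte >= 0x80 && byte <= 0x9F)
    return kCp1252High[byte - 0x80];
  return byte;  // identical to ISO-8859-1 everywhere else
}

// Encodes one code point (assumed <= 0x10FFFF) as UTF-8 into out, which
// must have room for 4 bytes. Returns the number of bytes written.
int ucsToUtf8(unsigned int ucs, char* out)
{
  if (ucs < 0x80) {
    out[0] = (char)ucs;
    return 1;
  }
  if (ucs < 0x800) {
    out[0] = (char)(0xC0 | (ucs >> 6));
    out[1] = (char)(0x80 | (ucs & 0x3F));
    return 2;
  }
  if (ucs < 0x10000) {
    out[0] = (char)(0xE0 | (ucs >> 12));
    out[1] = (char)(0x80 | ((ucs >> 6) & 0x3F));
    out[2] = (char)(0x80 | (ucs & 0x3F));
    return 3;
  }
  out[0] = (char)(0xF0 | (ucs >> 18));
  out[1] = (char)(0x80 | ((ucs >> 12) & 0x3F));
  out[2] = (char)(0x80 | ((ucs >> 6) & 0x3F));
  out[3] = (char)(0x80 | (ucs & 0x3F));
  return 4;
}
```

A plugin's byteToUcs() could simply forward to a function like
cp1252ByteToUcs(), and a reader could call ucsToUtf8() once per input byte.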

The plugin would add its identifier to a CharacterSet registry so that
it could be searched, and offered via preferences, combo box, etc.
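
One possible shape for such a registry is sketched below. Everything here
is hypothetical: CharacterSetRegistry, NamedCharacterSet and their methods
are made-up names, the CharacterSet class is cut down to the identifier
only, and std::vector is used purely for brevity. The registry does not
take ownership of the plugins in this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Cut-down stand-in for the interface above; only the identifier is
// needed for lookup (assumption for this sketch).
class CharacterSet
{
  public:
    virtual ~CharacterSet() {}
    virtual const char* identifier() const = 0;
};

// Stub plugin that only carries a name.
class NamedCharacterSet : public CharacterSet
{
  public:
    explicit NamedCharacterSet(const char* id) : id_(id) {}
    const char* identifier() const { return id_; }
  private:
    const char* id_;
};

// Hypothetical registry: plugins add themselves, GUI code enumerates them.
class CharacterSetRegistry
{
  public:
    // Registers a plugin; the caller keeps ownership.
    static void add(CharacterSet* cs) { sets().push_back(cs); }

    // Returns the plugin with the given identifier, or 0 if none matches.
    static CharacterSet* find(const char* id)
    {
      std::vector<CharacterSet*>& s = sets();
      for (std::size_t i = 0; i < s.size(); ++i)
        if (std::strcmp(s[i]->identifier(), id) == 0)
          return s[i];
      return 0;
    }

    // Enumeration support, e.g. for filling a combo box.
    static std::size_t count() { return sets().size(); }
    static CharacterSet* at(std::size_t i) { return sets()[i]; }

  private:
    // Function-local static avoids static initialization order problems
    // when plugins register from their own static constructors.
    static std::vector<CharacterSet*>& sets()
    {
      static std::vector<CharacterSet*> list;
      return list;
    }
};
```

A preferences dialog could then loop from 0 to count() and add each
at(i)->identifier() to its menu.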

Not sure about the convert() method, as it could involve an extra
layer of string copying (not very Fast and Light) and also raises
questions about who allocates/frees the output buffer.
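
One way to sidestep the ownership question is to let the caller own the
buffer and have the function report how many bytes it needs. The sketch
below does this for ISO-8859-1 only, where each byte equals its UCS value.
The function name latin1ToUtf8 and its signature are hypothetical and
differ from the convert() declaration above.

```cpp
#include <cstddef>

// Hypothetical caller-allocates conversion from ISO-8859-1 to UTF-8.
// Writes at most cap bytes to out and returns the number of bytes the
// complete conversion needs. If the result is larger than cap, the
// contents of out are incomplete and should be discarded. Calling with
// out == 0 and cap == 0 only measures. A result equal to len means the
// input was pure ASCII and needed no conversion.
std::size_t latin1ToUtf8(const char* in, std::size_t len,
                         char* out, std::size_t cap)
{
  std::size_t needed = 0;
  for (std::size_t i = 0; i < len; ++i) {
    unsigned char b = (unsigned char)in[i];
    if (b < 0x80) {
      // ASCII is copied unchanged
      if (needed + 1 <= cap)
        out[needed] = (char)b;
      needed += 1;
    } else {
      // 0x80..0xFF become a two-byte UTF-8 sequence
      if (needed + 2 <= cap) {
        out[needed]     = (char)(0xC0 | (b >> 6));
        out[needed + 1] = (char)(0x80 | (b & 0x3F));
      }
      needed += 2;
    }
  }
  return needed;
}
```

The caller can measure first, allocate once, then convert, at the cost of
a second pass over the input. Whether that second pass is acceptable for
large files is a design trade-off this sketch does not settle.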

Or maybe there could be a "fileRead" method that is passed a FILE*
and does the conversion on the fly? Again the interface is not clear.

Handling 16-bit encodings would obviously need a different API.
Maybe we could pass char* and let the plugin decode as appropriate.

Whatever scheme we introduce to solve this STR for FLTK-1.3.0, I
think we will probably need to change the API once we have gained
experience in its use.


