So I've spent some time writing a small and, I think, portable routine to
detect whether a buffer is valid UTF-16 (either little- or big-endian). It
rejects a buffer if it contains an odd number of bytes, contains any of the
66 noncharacter code points, or uses surrogates incorrectly (an unpaired
high or low surrogate). While this seems to work well on the handful of
binary files I've tested it against, I'm curious whether we should
additionally apply some or all of the current binary file detection
criteria. My thinking is that a file could be perfectly valid UTF-16 yet
still have an extremely long line or (worse, in my opinion) embedded
non-text characters, particularly U+0000 or non-whitespace control codes.
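A minimal sketch of the checks described above, in C since that is what
Fossil is written in. The names utf16_is_valid and utf16_looks_like_text
and the maxLine parameter are hypothetical, not existing Fossil functions.
The second function is only one possible reading of "binary file detection
criteria" (NUL, C0 control codes, long lines), not Fossil's actual rules.

```c
#include <stddef.h>

/* Read the 16-bit code unit at byte offset i in the given byte order. */
static unsigned int utf16_unit(const unsigned char *p, size_t i, int bigEndian){
  return bigEndian ? (((unsigned)p[i] << 8) | p[i+1])
                   : (((unsigned)p[i+1] << 8) | p[i]);
}

/* The 66 Unicode noncharacters: U+FDD0..U+FDEF (32 of them) plus
** U+xxFFFE and U+xxFFFF in each of the 17 planes (34 of them). */
static int is_noncharacter(unsigned long c){
  return (c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE;
}

/* Hypothetical: return 1 if buf[0..n) is well-formed UTF-16 in the given
** byte order and contains no noncharacters, otherwise 0. */
int utf16_is_valid(const unsigned char *buf, size_t n, int bigEndian){
  size_t i;
  if( n % 2 != 0 ) return 0;                     /* odd byte count */
  for(i = 0; i < n; i += 2){
    unsigned long c = utf16_unit(buf, i, bigEndian);
    if( c >= 0xDC00 && c <= 0xDFFF ) return 0;   /* unpaired low surrogate */
    if( c >= 0xD800 && c <= 0xDBFF ){            /* high surrogate */
      unsigned long lo;
      if( i + 2 >= n ) return 0;                 /* pair cut off at end */
      lo = utf16_unit(buf, i + 2, bigEndian);
      if( lo < 0xDC00 || lo > 0xDFFF ) return 0; /* no low surrogate next */
      c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
      i += 2;                                    /* consumed two units */
    }
    if( is_noncharacter(c) ) return 0;
  }
  return 1;
}

/* Hypothetical extra heuristic, meant for buffers that already passed
** utf16_is_valid: reject U+0000, C0 control codes other than TAB and CR,
** and any line (text between LFs) longer than maxLine 16-bit code units. */
int utf16_looks_like_text(const unsigned char *buf, size_t n, int bigEndian,
                          size_t maxLine){
  size_t i, lineLen = 0;
  for(i = 0; i + 1 < n; i += 2){
    unsigned int c = utf16_unit(buf, i, bigEndian);
    if( c == '\n' ){ lineLen = 0; continue; }    /* LF starts a new line */
    if( c == 0 ) return 0;                       /* embedded NUL */
    if( c < 0x20 && c != '\t' && c != '\r' ) return 0;
    if( ++lineLen > maxLine ) return 0;          /* line too long */
  }
  return 1;
}
```

A side effect of the noncharacter check: if a buffer begins with a byte
order mark and is read in the wrong byte order, the BOM decodes as U+FFFE
and the buffer is rejected, so trying both orders shows which one fits.
Without a BOM, plain ASCII-range text can pass in either order.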

SDR

On Thu, Sep 13, 2012 at 6:07 PM, Richard Hipp <[email protected]> wrote:

> You assume correctly.
>
> The use of iconv won't do, though, since everything also needs to work on
> Unix.  There are small, portable conversion routines in SQLite that you can
> copy.
>
> D. Richard Hipp - [email protected]
> Sent from phone - pardon brevity
>
> On Sep 13, 2012 7:44 PM, "Scott Robison" <[email protected]> wrote:
>
> I assumed (dangerous though it may be) that "leaves anything that isn't
> UTF-16 unchanged" meant "don't convert any buffer to UTF-8 if the
> origination buffer is not UTF-16".
>
> SDR
>
> On Thu, Sep 13, 2012 at 5:04 PM, David Given <[email protected]> wrote:
>
>> >
>> > On 13/09/12 21:08, Richard Hipp wrote:
>> > [...]
>> > > Basically, we need a routine that converts an...
>>
>> > _______________________________________________
>> > fossil-users mailing list
>> > [email protected]...
>>
>
>
> _______________________________________________
> fossil-users mailing list
> [email protected]
> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>
>