On Fri, Sep 14, 2012 at 1:29 AM, Scott Robison <[email protected]> wrote:

> So I've spent some time writing a small and, I think, portable routine to
> detect whether a buffer is valid UTF-16 (either little or big endian). It
> rejects buffers if they contain an odd number of bytes, contain any of
> the 66 non-character code points, or have invalid surrogate usage. While
> this seems to work well on the handful of binary files I've tested it
> against, I'm curious whether it would be desirable to also apply
> some or all of the current binary file detection criteria. My thought is
> that a file could be perfectly valid UTF-16 but have an extremely long line
> or (worse, in my opinion) embedded non-text characters (particularly U+0000
> or non-white-space control codes).
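
For readers following the thread, the three rejection rules described above
(odd byte count, the 66 non-characters, bad surrogate usage) could be sketched
roughly as below. This is not Scott's routine, which is not shown in this
message; the names utf16_is_valid and read_u16 and the big_endian flag are
assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: read one 16-bit code unit in the given byte order. */
static unsigned read_u16(const unsigned char *p, int big_endian){
  return big_endian ? ((unsigned)p[0] << 8 | p[1])
                    : ((unsigned)p[1] << 8 | p[0]);
}

/*
** Illustrative sketch only.  Return 1 if buf[0..n) passes the three
** checks from the message above, 0 otherwise.
*/
int utf16_is_valid(const unsigned char *buf, size_t n, int big_endian){
  size_t i;
  if( n % 2 != 0 ) return 0;                  /* odd number of bytes */
  for(i = 0; i < n; i += 2){
    unsigned long cp = read_u16(buf + i, big_endian);
    if( cp >= 0xD800 && cp <= 0xDBFF ){
      /* High surrogate: a low surrogate must follow immediately. */
      unsigned lo;
      if( i + 4 > n ) return 0;               /* truncated pair */
      lo = read_u16(buf + i + 2, big_endian);
      if( lo < 0xDC00 || lo > 0xDFFF ) return 0;
      cp = 0x10000UL + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      i += 2;                                 /* consume the low half */
    }else if( cp >= 0xDC00 && cp <= 0xDFFF ){
      return 0;                               /* low surrogate with no high */
    }
    /* The 66 non-characters: U+FDD0..U+FDEF (32 of them) plus the last
    ** two code points of each of the 17 planes (34 of them). */
    if( cp >= 0xFDD0 && cp <= 0xFDEF ) return 0;
    if( (cp & 0xFFFE) == 0xFFFE ) return 0;
  }
  return 1;
}
```

One side effect of the non-character check: a buffer that starts with a BOM
and is read in the wrong byte order begins with U+FFFE, which is a
non-character, so it is rejected on the first code unit.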


Detection of embedded non-printing characters, especially U+0000, would be
nice.

Should we insist on a BOM at the beginning of the file?
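
To make the two ideas above concrete, here is a rough sketch of what a BOM
check and a scan for embedded non-printing characters might look like. It is
only an illustration under assumptions: the names utf16_bom and
utf16_has_binary_chars are invented here, and treating tab, LF, VT, FF and CR
as the acceptable white-space controls is a guess at the intended policy, not
something settled in this thread.

```c
#include <stddef.h>

enum { BOM_NONE = 0, BOM_LE = 1, BOM_BE = 2 };

/* Hypothetical helper: report which UTF-16 BOM, if any, starts the buffer. */
int utf16_bom(const unsigned char *buf, size_t n){
  if( n >= 2 ){
    if( buf[0] == 0xFF && buf[1] == 0xFE ) return BOM_LE;
    if( buf[0] == 0xFE && buf[1] == 0xFF ) return BOM_BE;
  }
  return BOM_NONE;
}

/*
** Illustrative sketch only.  Return 1 if the buffer contains U+0000 or
** another control code that is not white space (assumed here to mean
** anything below U+0020 except tab, LF, VT, FF and CR, plus U+007F).
** A trailing odd byte is ignored by this scan.
*/
int utf16_has_binary_chars(const unsigned char *buf, size_t n, int big_endian){
  size_t i;
  for(i = 0; i + 1 < n; i += 2){
    unsigned c = big_endian ? ((unsigned)buf[i] << 8 | buf[i+1])
                            : ((unsigned)buf[i+1] << 8 | buf[i]);
    if( c < 0x20 && c != '\t' && c != '\n' && c != '\v'
                 && c != '\f' && c != '\r' ){
      return 1;                       /* includes U+0000 */
    }
    if( c == 0x7F ) return 1;         /* DEL */
  }
  return 0;
}
```

If a BOM were required, a caller could refuse to treat the buffer as UTF-16
whenever utf16_bom() returns BOM_NONE, and otherwise use the result to pick
the byte order for the scan.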


>
> SDR
>
>
> On Thu, Sep 13, 2012 at 6:07 PM, Richard Hipp <[email protected]> wrote:
>
>> You assume correctly.
>>
>> The use of iconv won't do, though, since everything also needs to work on
>> Unix.  There are small, portable conversion routines in SQLite that you can
>> copy.
>>
>> D. Richard Hipp - [email protected]
>> Sent from phone - pardon brevity
>>
>> On Sep 13, 2012 7:44 PM, "Scott Robison" <[email protected]> wrote:
>>
>> I assumed (dangerous though it may be) that "leaves anything that isn't
>> UTF-16 unchanged" meant "don't convert any buffer to UTF-8 if the
>> origination buffer is not UTF-16".
>>
>> SDR
>>
>> On Thu, Sep 13, 2012 at 5:04 PM, David Given <[email protected]> wrote:
>>
>>> >
>>> > On 13/09/12 21:08, Richard Hipp wrote:
>>> > [...]
>>> > > Basically, we need a routine that converts an...
>>>
>>> > _______________________________________________
>>> > fossil-users mailing list
>>> > [email protected]...
>>>
>>
>>
>> _______________________________________________
>> fossil-users mailing list
>> [email protected]
>> http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
>>
>>
>


-- 
D. Richard Hipp
[email protected]