On Fri, Sep 14, 2012 at 1:29 AM, Scott Robison <[email protected]> wrote:
> So I've spent some time writing a small and I think portable routine to
> detect if a buffer is a valid UTF-16 (either little or big endian). It
> rejects buffers if they contain an odd number of bytes or contain any of
> the 66 non-character code-points or have invalid surrogate usage. While
> this seems to work well on the handful of binary files I've tested it
> against, I'm curious as to whether it would be desired to additionally use
> some or all of the current binary file detection criteria? My thought being
> that a file could be perfectly valid UTF-16 but have an extremely long line
> or (worse in my opinion) embedded non-text characters (particularly U+0000
> or non-white-space control codes).

Detection of embedded non-printing characters, especially U+0000, would be
nice. Should we insist on a BOM at the beginning of the file?

>
> SDR
>
>
> On Thu, Sep 13, 2012 at 6:07 PM, Richard Hipp <[email protected]> wrote:
>
>> You assume correctly.
>>
>> The use of iconv won't do, though, since everything also needs to work on
>> Unix. There are small, portable conversion routines in SQLite that you can
>> copy.
>>
>> D. Richard Hipp - [email protected]
>> Sent from phone - pardon brevity
>>
>> On Sep 13, 2012 7:44 PM, "Scott Robison" <[email protected]> wrote:
>>
>> I assumed (dangerous though it may be) that "leaves anything that isn't
>> UTF-16 unchanged" meant "don't convert any buffer to UTF-8 if the
>> origination buffer is not UTF-16".
>>
>> SDR
>>
>> On Thu, Sep 13, 2012 at 5:04 PM, David Given <[email protected]> wrote:
>>
>>> >
>>> > On 13/09/12 21:08, Richard Hipp wrote:
>>> > [...]
>>> > > Basically, we need a routine that converts an...
>>>
>>> > _______________________________________________
>>> > fossil-users mailing list
>>> > [email protected]...

--
D. Richard Hipp
[email protected]
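
The checks discussed above can be sketched in C roughly as follows. These are odd length, the 66 noncharacters, surrogate pairing, U+0000 and other control codes, and an optional BOM requirement. This is a hypothetical illustration. It is not Scott's routine or code from Fossil or SQLite, and every function name here is invented. The sketch assumes that "non-text" means U+0000 plus the other C0 control codes except tab, LF, VT, FF and CR. The `requireBom` flag only models the open question of whether a BOM should be mandatory.

```c
#include <stddef.h>

/* Hypothetical sketch, not Fossil or SQLite code.
** Nonzero if c is one of the 66 Unicode noncharacters:
** U+FDD0..U+FDEF (32) plus U+xxFFFE and U+xxFFFF in each of
** the 17 planes (34). */
static int is_noncharacter(unsigned long c){
  return (c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE;
}

/* Assumption: reject U+0000 and every other C0 control code
** except the white-space ones (tab, LF, VT, FF, CR). */
static int is_bad_control(unsigned long c){
  return c < 0x20 && !(c >= 0x09 && c <= 0x0D);
}

/* Read one 16-bit code unit in the given byte order. */
static unsigned long read_unit(const unsigned char *p, int bigEndian){
  return bigEndian ? ((unsigned long)p[0] << 8) | p[1]
                   : ((unsigned long)p[1] << 8) | p[0];
}

/* Check n bytes (BOM already stripped) in one byte order: surrogates
** must pair up; noncharacters and bad control codes are rejected. */
static int utf16_is_text(const unsigned char *z, size_t n, int bigEndian){
  size_t i = 0;
  if( n % 2 != 0 ) return 0;                   /* odd number of bytes */
  while( i < n ){
    unsigned long c = read_unit(z + i, bigEndian);
    i += 2;
    if( c >= 0xDC00 && c <= 0xDFFF ) return 0; /* unpaired low surrogate */
    if( c >= 0xD800 && c <= 0xDBFF ){
      unsigned long lo;
      if( i >= n ) return 0;                   /* high surrogate at end */
      lo = read_unit(z + i, bigEndian);
      if( lo < 0xDC00 || lo > 0xDFFF ) return 0; /* high not followed by low */
      i += 2;
      c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
    }
    if( is_noncharacter(c) || is_bad_control(c) ) return 0;
  }
  return 1;
}

/* Returns 1 for UTF-16LE text, 2 for UTF-16BE text, 0 otherwise.
** A leading BOM selects the byte order. Without one, the buffer is
** rejected if requireBom is set, else both orders are tried, LE first. */
int utf16_detect(const unsigned char *z, size_t n, int requireBom){
  if( n % 2 != 0 ) return 0;
  if( n >= 2 && z[0] == 0xFF && z[1] == 0xFE ){
    return utf16_is_text(z + 2, n - 2, 0) ? 1 : 0;
  }
  if( n >= 2 && z[0] == 0xFE && z[1] == 0xFF ){
    return utf16_is_text(z + 2, n - 2, 1) ? 2 : 0;
  }
  if( requireBom ) return 0;
  if( utf16_is_text(z, n, 0) ) return 1;
  if( utf16_is_text(z, n, 1) ) return 2;
  return 0;
}
```

Without a BOM the check is weak. The two ASCII bytes "ab" decode as U+6261 in little-endian order and pass every test. As a result, `utf16_detect((const unsigned char *)"ab", 2, 0)` returns 1, while the same call with `requireBom` set returns 0. That is one argument for insisting on a BOM. The sketch does not look at line length, which Scott raised as a separate criterion.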
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

