srintuar wrote:
> The knowledge of how
> to detect a null in a stateful encoding is not necessarily trivial.
>
> If there was a function which could return the unit-word-size of
> any encoding accepted by iconv, ...
Here is how to write such a function. Given the unknown encoding:
  1. Convert "\000" from UTF-8 to the given encoding.
  2. Convert "\000\000" from UTF-8 to the given encoding.
  3. Return the difference of the lengths (measured in bytes) of the two
     results. Taking the difference cancels any fixed overhead, such as a
     byte order mark, that the converter emits once per string.
  4. Exception: this does not work for UTF-7. For that encoding, return 1
     instead.
The corresponding Clisp code:
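(defun encoding-zeroes (encoding)
  ;; Return the number of bytes that one NUL character occupies in ENCODING.
  (let ((name (ext:encoding-charset encoding))
        ;; Cache of known results, built at read time; UTF-7 is preset to 1.
        (table #.(make-hash-table :test #'equal
                                  :initial-contents '(("UTF-7" . 1))))
        ;; Test string consisting of two NUL characters.
        (tester #.(make-string 2 :initial-element (code-char 0))))
    (or (gethash name table)
        (setf (gethash name table)
              ;; Bytes for two NULs minus bytes for one NUL.
              (- (length (ext:convert-string-to-bytes tester encoding))
                 (length (ext:convert-string-to-bytes tester encoding :end 1)))))))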
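
For readers without Clisp, the same steps can be sketched in Python. This is
only an illustration under stated assumptions: it relies on Python's built-in
codecs rather than iconv, so the set of accepted encoding names differs; the
name encoding_zeroes is simply carried over from the Lisp version; and the
_SPECIAL_CASES table is a hypothetical stand-in for the hash table above,
without the caching.

```python
import codecs

# Encodings for which the length-difference trick is not used.
# Assumption: only UTF-7 needs this, mirroring step 4 above.
_SPECIAL_CASES = {"utf-7": 1}


def encoding_zeroes(encoding: str) -> int:
    """Return the number of bytes one NUL character occupies in `encoding`."""
    # Normalize aliases such as "UTF-7" or "utf7" to the codec's canonical name.
    name = codecs.lookup(encoding).name
    if name in _SPECIAL_CASES:
        return _SPECIAL_CASES[name]
    one = "\0".encode(name)    # step 1: a single NUL
    two = "\0\0".encode(name)  # step 2: two NULs
    # Step 3: per-string overhead (e.g. a byte order mark) cancels out.
    return len(two) - len(one)
```

With Python's "utf-16" codec, which prepends a byte order mark, the two
encoded lengths are 4 and 6 bytes, so encoding_zeroes("utf-16") gives 2.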
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/