srintuar wrote:
> The knowledge of how
> to detect a null in a stateful encoding is not necessarily trivial.
>
> If there was a function which could return the unit-word-size of
> any encoding accepted by iconv, ...

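Here is how to write such a function. Given the unknown encoding:
1. Convert "\000" from UTF-8 to the given encoding.
2. Convert "\000\000" from UTF-8 to the given encoding.
3. Return the difference of the lengths (measured in bytes) of the two
   results. Taking the difference cancels any fixed overhead, such as a
   byte-order mark or an initial shift sequence.
4. Exception: for UTF-7 this does not work, because NUL is emitted inside
   a base64 run rather than as a fixed-size unit. Return 1 instead.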

The corresponding CLISP code:

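(defun encoding-zeroes (encoding)
  "Return the number of bytes that one NUL character occupies in ENCODING."
  (let ((name (ext:encoding-charset encoding))
        ;; #. builds the table once at read time, so it doubles as a
        ;; cache across calls. UTF-7 is preloaded as the special case.
        (table #.(make-hash-table :test #'equal
                                  :initial-contents '(("UTF-7" . 1))))
        ;; A string of two NUL characters.
        (tester #.(make-string 2 :initial-element (code-char 0))))
    (or (gethash name table)
        (setf (gethash name table)
              ;; bytes for two NULs minus bytes for one NUL
              (- (length (ext:convert-string-to-bytes tester encoding))
                 (length (ext:convert-string-to-bytes tester encoding :end 1)))))))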
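
For readers without CLISP at hand, the same steps can be sketched in
Python. This is only an illustration under assumptions: Python's built-in
codecs stand in for iconv, the function name simply mirrors the Lisp one,
and the special-case table holds just the UTF-7 exception from step 4.

```python
import codecs

# Encodings where the length-difference trick fails, mapped to the value
# to return instead. Only the UTF-7 exception from the steps is listed.
_SPECIAL_CASES = {"utf-7": 1}

# Cache of results already computed, keyed by canonical codec name.
_cache = {}


def encoding_zeroes(encoding):
    """Return the number of bytes one NUL character occupies in `encoding`."""
    # Normalize aliases such as "UTF7" or "utf_7" to one canonical name.
    name = codecs.lookup(encoding).name
    if name in _SPECIAL_CASES:
        return _SPECIAL_CASES[name]
    if name not in _cache:
        one = "\0".encode(name)    # step 1: a single NUL
        two = "\0\0".encode(name)  # step 2: two NULs
        # Step 3: the difference cancels fixed overhead such as a BOM.
        _cache[name] = len(two) - len(one)
    return _cache[name]
```

With this sketch, encoding_zeroes("utf-16") gives 2 even though Python's
"utf-16" codec prepends a byte-order mark, since the mark appears in both
conversions and drops out of the difference.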

Bruno


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
