Re: [Cuneiform] Patch to extend hOCR output

Jussi Pakkanen Fri, 20 Mar 2009 04:29:02 -0700

On Sun, Feb 22, 2009 at 5:01 PM, Dmitry Polevoy
<[email protected]> wrote:


> The initial version of hOcr output was created by Rene Rebe (look at history
> of  \cuneiform-linux\cuneiform_src\Kern\rout\src\html.cpp) and I am not a
> specialist with html encoding format.

I added the UTF-8 encoding. The HTML output is always UTF-8 because
Unicode is the recommended encoding for HTML, and it covers every
letter, so there is no need to support legacy character sets. I guess
we could change the HTML writer function so that it no longer accepts
an output charset argument at all. Currently its only caller is the
Cuneiform command-line binary, which always passes UTF-8 as the output
format.
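A minimal sketch of what a charset-free writer could look like, assuming the writer simply receives the recognized text lines. The function name `write_hocr` and its signature are hypothetical and not Cuneiform's actual API (the real writer lives in C++ in html.cpp). This is a bare illustration, not a complete hOCR document (no bounding boxes, for example). It hard-codes UTF-8 both in the meta declaration and in the byte encoding, so callers have nothing to pass.

```python
from html import escape


def write_hocr(lines: list[str]) -> bytes:
    """Render OCR text lines as a minimal hOCR-style page, always in UTF-8.

    Hypothetical sketch: there is deliberately no charset parameter,
    because UTF-8 can represent every character the recognizer produces.
    """
    # One ocr_line span per input line; escape() protects <, > and &.
    body = "\n".join(
        f'<span class="ocr_line" id="line_{i}">{escape(text)}</span>'
        for i, text in enumerate(lines, start=1)
    )
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        # The declared charset always matches the encoding used below.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        "</head>\n<body>\n"
        '<div class="ocr_page">\n'
        f"{body}\n"
        "</div>\n</body>\n</html>\n"
    )
    return page.encode("utf-8")
```

For example, `write_hocr(["Привет, мир", "naïve café"])` keeps the Cyrillic and accented letters intact without any charset negotiation. Dropping the parameter removes the possibility of a caller declaring one charset while the bytes are written in another.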

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp
