Re: Hebrew encoding (cp1255)

Dov Feldstern Fri, 29 Dec 2006 05:34:34 -0800

Georg Baum wrote:

On Friday 29 December 2006 00:32, Dov Feldstern wrote:
So basically, in pre-1.5, the solution is just to use the "default"
encoding, rather than "auto".

Now we move to 1.5.0: when you try to use "default", you get the
following message in the stderr:

Unknown inputenc value `default'. Using `auto' instead.

So now there's no way to generate the latex file without the explicit
encodings, which means that we're stuck with the problem I originally
described, because of inputenc's limitations.
That is a bug and not intentional. The attached patch fixes the problem
(at least the LaTeX generation side). With this patch it is possible
again to use the "default" encoding. Unfortunately the display in LyX is
wrong: Everything is treated as latin1. I don't know how that works in
LyX 1.4.x (the Hebrew words are displayed with Hebrew characters, but
not RTL). I'll have a look and see whether this can be fixed in 1.5.
Meanwhile I am going to put in the attached patch.

Thanks, the patch works in the sense that it doesn't complain now about
not finding the "default" encoding. And the display in the GUI is
actually okay (and there's no reason why it should be affected by the
encoding --- it depends only on the language, I think). However, the
latex file is still not fully generated, because of the problem with
iconv.

One solution would be to see if we could fix this using a newer version
of inputenc, as Jean-Marc suggested. But perhaps we could solve this by
again using the "default" encoding option? I realize that in 1.5.0 it's
harder than in previous versions: now LyX itself has to know what the
encoding should be, so that it can generate the latex file correctly.



You are right, that is exactly the problem.

OTOH, it *should* already know that --- it's explicitly writing that
information to the generated latex file! So all that really needs to be
done is to *not* write the explicit encoding commands to the generated
latex file, if the "default" encoding option is chosen.

I'll have a look, see above. The problem is that this "default" encoding
does not fit very well into the new unicode world, and I am not yet sure
how to integrate it better.


Yeah, I agree that I'm not totally clear about what exactly we want, either.
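
The idea above (skip the explicit encoding commands when "default" is
chosen) could be sketched roughly as follows. This is only an
illustration, not LyX code: the function name inputencPreambleLine and
its parameters are hypothetical, and the three cases simply follow the
auto / default / everything-else description in the bufferparams.h
comment quoted in the patch below.

```cpp
#include <string>

// Hypothetical helper (not part of LyX): returns the preamble line that
// would load inputenc, or an empty string if nothing should be written.
//   inputenc         - the document setting: "auto", "default", or an
//                      encoding name understood by the inputenc package
//   languageEncoding - assumed LaTeX name of the encoding implied by the
//                      document language (only used for "auto")
std::string inputencPreambleLine(std::string const & inputenc,
                                 std::string const & languageEncoding)
{
    // "default": do not load inputenc at all and leave the
    // interpretation of the 8-bit input to LaTeX.
    if (inputenc == "default")
        return std::string();

    // "auto": take the encoding from the document language.
    std::string const name =
        (inputenc == "auto") ? languageEncoding : inputenc;

    return "\\usepackage[" + name + "]{inputenc}\n";
}
```

With this sketch, inputencPreambleLine("default", "cp1255") yields an
empty string, so no encoding command would reach the generated file.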

But here's where the second problem arises, and this time it's LyX's
problem, not latex's (though I'm less sure about this part): it seems to
me like LyX itself --- not only latex --- is also determining the
encoding based on the paragraph, rather than based on the individual
characters' language.

Yes. It is implemented like that because of the limitation of older
inputenc packages.

There's no real reason why LyX should limit itself just because latex
does. Here exactly is an example where latex will manage, if only LyX
would.

If LyX would perform the conversions on a per-character basis (or
rather, for consecutive characters with the same encoding), then it
would at least be able to generate the latex file, and then we'd only be
left with the first problem.
Yes, we should require a current inputenc version and output each
character in the encoding that its language demands.

Again, I think that perhaps we could do the second half ("output each
character in the encoding that its language demands") regardless of the
first half ("a current inputenc version").

But again, I agree that I'm not totally clear how this will fit in with
"real unicode".
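
To make the "consecutive characters with the same encoding" idea
concrete, here is a minimal sketch of the grouping step only. It is not
taken from LyX: TaggedChar, EncodingRun and splitIntoRuns are
hypothetical names, and the encoding attached to each character is
assumed to have been looked up from that character's language
beforehand. The actual conversion of each run (e.g. via iconv) is left
out.

```cpp
#include <string>
#include <vector>

// Hypothetical: one UCS4 character plus the name of the encoding that
// its language demands (e.g. "cp1255" for Hebrew, "latin1" for English).
struct TaggedChar {
    char32_t ch;
    std::string encoding;
};

// Hypothetical: a maximal run of consecutive characters that share one
// encoding and can therefore be converted in a single step.
struct EncodingRun {
    std::string encoding;
    std::u32string text;
};

// Walk the characters in document order and start a new run whenever
// the required encoding changes.
std::vector<EncodingRun> splitIntoRuns(std::vector<TaggedChar> const & chars)
{
    std::vector<EncodingRun> runs;
    for (TaggedChar const & tc : chars) {
        if (runs.empty() || runs.back().encoding != tc.encoding)
            runs.push_back(EncodingRun{tc.encoding, std::u32string()});
        runs.back().text.push_back(tc.ch);
    }
    return runs;
}
```

Each run could then be converted separately, so a paragraph that mixes
Hebrew and Latin text would no longer have to fit a single encoding.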



Georg


------------------------------------------------------------------------

Index: src/bufferparams.C
===================================================================
--- src/bufferparams.C  (Revision 16420)
+++ src/bufferparams.C  (Arbeitskopie)
@@ -1468,11 +1468,18 @@
 {
        if (inputenc == "auto")
                return *(language->encoding());
-       Encoding const * const enc = encodings.getFromLaTeXName(inputenc);
+       Encoding const * const enc = (inputenc == "default") ?
+               encodings.getFromLyXName("iso8859-1") :
+               encodings.getFromLaTeXName(inputenc);
        if (enc)
                return *enc;
-       lyxerr << "Unknown inputenc value `" << inputenc
-              << "'. Using `auto' instead." << endl;
+       if (inputenc == "default")
+               lyxerr << "Could not find iso8859-1 encoding for inputenc "
+                         "value `default'. Using inputenc `auto' instead."
+                      << endl;
+       else
+               lyxerr << "Unknown inputenc value `" << inputenc
+                      << "'. Using `auto' instead." << endl;
        return *(language->encoding());
 }

Index: src/bufferparams.h
===================================================================
--- src/bufferparams.h  (Revision 16420)
+++ src/bufferparams.h  (Arbeitskopie)
@@ -180,7 +180,10 @@
         * The input encoding for LaTeX. This can be one of
         * - auto: find out the input encoding from the used languages
         * - default: Don't load the inputenc package and hope that it will
-        *   work (unlikely)
+        *   work (unlikely). The encoding is an unspecified 8bit encoding,
+        *   the interpretation is up to the LaTeX compiler. Because we need
+        *   a rule how to create this from our internal UCS4 encoded
+        *   document contents we treat this as latin1 internally.
         * - any encoding supported by the inputenc package
         * The encoding of the LyX file is always utf8 and has nothing to
         * do with this setting.
Index: development/FORMAT
===================================================================
--- development/FORMAT  (Revision 16420)
+++ development/FORMAT  (Arbeitskopie)
@@ -78,8 +78,9 @@
        encoding of the LyX file:

\inputencoding LyX file encoding

-       auto                 as determined by the document language
-       default              latin1
+       auto                 as determined by the document language(s)
+       default              unspecified 8bit (treated as latin1 internally,
+                            see comment in bufferparams.h)
        everything else      as determined by \inputencoding

2006-07-03 Georg Baum <[EMAIL PROTECTED]>
