Re: Hebrew encoding (cp1255)

Georg Baum Fri, 29 Dec 2006 04:30:55 -0800

On Friday 29 December 2006 00:32, Dov Feldstern wrote:
> So basically, in pre-1.5, the solution is just to use the "default"
> encoding, rather than "auto".
>
> Now we move to 1.5.0: when you try to use "default", you get the
> following message in the stderr:
>
> Unknown inputenc value `default'. Using `auto' instead.
>
> So now there's no way to generate the latex file without the explicit
> encodings, which means that we're stuck with the problem I originally
> described, because of inputenc's limitations.


That is a bug and not intentional. The attached patch fixes the problem (at 
least the LaTeX generation side). With this patch it is possible again to use 
the "default" encoding. Unfortunately the display in LyX is wrong: Everything 
is treated as latin1. I don't know how that works in LyX 1.4.x (the Hebrew 
words are displayed with Hebrew characters, but not RTL). I'll have a look 
and see whether this can be fixed in 1.5. Meanwhile I am going to put in the 
attached patch.
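To illustrate what "treated as latin1" means for the round trip between LyX's internal UCS4 text and the 8-bit bytes, here is a minimal sketch. The function names `toLatin1` and `fromLatin1` are hypothetical and not part of LyX. It shows why Hebrew cannot survive the latin1 interpretation: in latin1 each code point U+0000..U+00FF is the byte of the same value. Hebrew letters (U+05D0..U+05EA) have no latin1 byte, and cp1255 bytes for Hebrew letters (0xE0..0xFA) are read back as accented Latin letters.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not LyX API): write UCS4 text as 8-bit bytes,
// interpreting the unspecified "default" encoding as latin1.
// Returns std::nullopt if a character has no latin1 representation.
std::optional<std::string> toLatin1(std::u32string const & ucs4)
{
	std::string bytes;
	bytes.reserve(ucs4.size());
	for (char32_t c : ucs4) {
		if (c > 0xFF)          // e.g. Hebrew U+05D0 (alef) lands here
			return std::nullopt;
		bytes.push_back(static_cast<char>(c));
	}
	return bytes;
}

// Hypothetical helper (not LyX API): read 8-bit bytes as latin1.
// Every byte maps to the code point with the same value, so a cp1255
// alef (byte 0xE0) becomes U+00E0 (a with grave accent).
std::u32string fromLatin1(std::string const & bytes)
{
	std::u32string ucs4;
	ucs4.reserve(bytes.size());
	for (unsigned char b : bytes)
		ucs4.push_back(static_cast<char32_t>(b));
	return ucs4;
}
```

This is the kind of mis-display described above: the bytes are preserved, but their meaning is not.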

> One solution would be to see if we could fix this using a newer version
> of inputenc, as Jean-Marc suggested. But perhaps we could solve this by
> again using the "default" encoding option? I realize that in 1.5.0 it's
> harder than in previous versions: now LyX itself has to know what the
> encoding should be, so that it can generate the latex file correctly.

You are right, that is exactly the problem.

> OTOH, it *should* already know that --- it's explicitly writing that
> information to the generated latex file! So all that really needs to be
> done is to *not* write the explicit encoding commands to the generated
> latex file, if the "default" encoding option is chosen.
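The suggestion above can be sketched as preamble generation that simply emits nothing for "default". The function `inputencPreamble` is hypothetical and does not reproduce LyX's actual preamble code. It assumes the caller passes the encodings needed by the document's languages without duplicates. When several encodings are given as package options, inputenc activates the last one listed.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch (not LyX's code): build the inputenc line of the
// LaTeX preamble for a given \inputencoding value.
//  - "default": load nothing; the 8-bit bytes go to LaTeX unchanged.
//  - "auto":    load inputenc with every encoding the languages need
//               (languageEncodings is assumed to contain no duplicates).
//  - otherwise: load inputenc with exactly that encoding.
std::string inputencPreamble(std::string const & inputenc,
                             std::vector<std::string> const & languageEncodings)
{
	if (inputenc == "default")
		return std::string();

	std::string options;
	if (inputenc == "auto") {
		for (std::string const & enc : languageEncodings) {
			if (!options.empty())
				options += ',';
			options += enc;
		}
	} else {
		options = inputenc;
	}

	if (options.empty())
		return std::string();
	return "\\usepackage[" + options + "]{inputenc}\n";
}
```

Note that this only covers the preamble. For "default", the per-paragraph `\inputencoding` switches in the body would have to be suppressed as well.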

I'll have a look, see above. The problem is that this "default" encoding does 
not fit very well into the new Unicode world, and I am not yet sure how to 
integrate it better.

> But here's where the second problem arises, and this time it's LyX's
> problem, not latex's (though I'm less sure about this part): it seems to
> me like LyX itself --- not only latex --- is also determining the
> encoding based on the paragraph, rather than based on the individual
> characters' language.

Yes. It is implemented like that because of the limitation of older inputenc 
packages.

> If LyX would perform the conversions on a per-character basis (or
> rather, for consecutive characters with the same encoding), then it
> would at least be able to generate the latex file, and then we'd only be
> left with the first problem.

Yes, we should require a current inputenc version and output each character in 
the encoding that its language demands.
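A minimal sketch of the per-character approach, assuming a current inputenc that accepts `\inputencoding` inside a paragraph. Characters are grouped into maximal runs that share one encoding, and a switch is emitted only where the encoding changes, rather than once per paragraph. The types `TaggedChar` and `EncodingRun` and both functions are hypothetical illustrations, not LyX code. The actual byte conversion of each run is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical: one character plus the LaTeX encoding its language
// demands (e.g. "latin1" for English text, "cp1255" for Hebrew).
struct TaggedChar {
	char32_t ch;
	std::string encoding;
};

// Hypothetical: a maximal run of consecutive characters in one encoding.
struct EncodingRun {
	std::string encoding;
	std::u32string text;
};

// Start a new run whenever the encoding differs from the previous char.
std::vector<EncodingRun> splitIntoRuns(std::vector<TaggedChar> const & chars)
{
	std::vector<EncodingRun> runs;
	for (TaggedChar const & tc : chars) {
		if (runs.empty() || runs.back().encoding != tc.encoding)
			runs.push_back(EncodingRun{tc.encoding, std::u32string()});
		runs.back().text.push_back(tc.ch);
	}
	return runs;
}

// Collect the \inputencoding commands needed before each run, given the
// encoding that is active when the text starts.
std::vector<std::string> switchCommands(std::vector<EncodingRun> const & runs,
                                        std::string active)
{
	std::vector<std::string> cmds;
	for (EncodingRun const & run : runs) {
		if (run.encoding != active) {
			cmds.push_back("\\inputencoding{" + run.encoding + "}");
			active = run.encoding;
		}
	}
	return cmds;
}
```

For English text containing one Hebrew word, with latin1 active, this yields a switch to cp1255 before the word and a switch back to latin1 after it.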


Georg

Index: src/bufferparams.C
===================================================================
--- src/bufferparams.C	(Revision 16420)
+++ src/bufferparams.C	(Arbeitskopie)
@@ -1468,11 +1468,18 @@
 {
 	if (inputenc == "auto")
 		return *(language->encoding());
-	Encoding const * const enc = encodings.getFromLaTeXName(inputenc);
+	Encoding const * const enc = (inputenc == "default") ?
+		encodings.getFromLyXName("iso8859-1") :
+		encodings.getFromLaTeXName(inputenc);
 	if (enc)
 		return *enc;
-	lyxerr << "Unknown inputenc value `" << inputenc
-	       << "'. Using `auto' instead." << endl;
+	if (inputenc == "default")
+		lyxerr << "Could not find iso8859-1 encoding for inputenc "
+		          "value `default'. Using inputenc `auto' instead."
+		       << endl;
+	else
+		lyxerr << "Unknown inputenc value `" << inputenc
+		       << "'. Using `auto' instead." << endl;
 	return *(language->encoding());
 }
 
Index: src/bufferparams.h
===================================================================
--- src/bufferparams.h	(Revision 16420)
+++ src/bufferparams.h	(Arbeitskopie)
@@ -180,7 +180,10 @@
 	 * The input encoding for LaTeX. This can be one of
 	 * - auto: find out the input encoding from the used languages
 	 * - default: Don't load the inputenc package and hope that it will
-	 *   work (unlikely)
+	 *   work (unlikely). The encoding is an unspecified 8bit encoding,
+	 *   the interpretation is up to the LaTeX compiler. Because we need
+	 *   a rule how to create this from our internal UCS4 encoded
+	 *   document contents we treat this as latin1 internally.
 	 * - any encoding supported by the inputenc package
 	 * The encoding of the LyX file is always utf8 and has nothing to
 	 * do with this setting.
Index: development/FORMAT
===================================================================
--- development/FORMAT	(Revision 16420)
+++ development/FORMAT	(Arbeitskopie)
@@ -78,8 +78,9 @@
 	encoding of the LyX file:
 
 	\inputencoding       LyX file encoding
-	auto                 as determined by the document language
-	default              latin1
+	auto                 as determined by the document language(s)
+	default              unspecified 8bit (treated as latin1 internally,
+	                     see comment in bufferparams.h)
 	everything else      as determined by \inputencoding
 
 2006-07-03  Georg Baum  <[EMAIL PROTECTED]>
