On Mon, Feb 14, 2022 at 03:24:40AM +0100, Thibaut Cuvelier wrote:
> On Sun, 13 Feb 2022 at 09:04, Jürgen Spitzmüller <sp...@lyx.org> wrote:
>
> > On Sunday, 13.02.2022 at 04:19 +0100, Thibaut Cuvelier wrote:
> > > You mean, with code like
> > > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L421-L428
> > > for the search? I thought it would be good to have a file to store
> > > this information, but I wasn't aware of unicodesymbols. I believe
> > > that the file shouldn't even be modified at all, thanks to the
> > > presence of the Unicode character number at the beginning of the line
> > > (0x00c0 "\\`{A}", with 0xC0 corresponding to 192,
> > > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L131
> > > ).
> > >
> > > Based on the contents of unicodesymbols, how could I match " \`{A}",
> > > "\`A", and "\` A" at once? Should I just use tricks like
> > > https://github.com/cburschka/lyx/blob/d3c335a5d524e2edeb73ae1a891fcc58ba5bfd1a/src/BiblioInfo.cpp#L414-L418
> > > (which I'm already doing, in a sense, in
> > > https://github.com/cburschka/lyx/blob/master/src/insets/InsetERT.cpp#L452-L463
> > > )?
> >
> > I don't know how to do it exactly, but yes, I mean that the information
> > you need here should all be in unicodesymbols, or added if not, and
> > could be retrieved by the methods defined in Encoding.cpp.
> >
> > There should be no need to store LaTeX<>Unicode mappings anywhere else.
>
> Thanks, I just did that (with a small test file): a460097823.
>
> However, this test showed a limitation in the current unicodesymbols: there
> can be only one LaTeX command per symbol. This is a limitation in only a
> few cases, like \textexclamdown and !`: both of them are mapped to ¡
> (i.e. 0x00a1), but the file only allows for one mapping.
>
> I would have no problem saying that this is a corner case that can be
> easily ignored, but after all I dived into Unicode mapping within ERTs for
> DocBook to handle corner cases… (Albeit not in Spanish.) From a
> memory-consumption point of view, supporting several commands for one
> symbol would require storing more than one string in CharInfo, potentially
> even a vector of strings for all entries (even those that have only one
> command): that's a 24-byte overhead
> (https://stackoverflow.com/a/34035291/1066843) for roughly 4000 entries;
> that's not so large.
>
> If we decide to solve this problem, there are several possible solutions
> (all modifying Encodings::read); I can think of two:
> - either use a separator symbol in the latexcommand part of each
>   unicodesymbols line, but it would be hard to find a single character that
>   is never used for latexcommands
> - or have multiple lines for a single character, with duplicate information
>   for the second one or a simpler line format for these entries. For
>   instance, for the inverted exclamation mark:
>
> 0x00a1 "\\textexclamdown" "" "force=cp862;cp1255;euc-jp;euc-jp-platex;euc-kr;utf8-platex" # INVERTED EXCLAMATION MARK
> 0x00a1 "!`" # Implicitly, all the other parameters still apply
>
> What do you think of this? Should this be done? What would be the preferred
> solution, if so? (Of course, I offer to do this refactoring :).)
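
To make the second option quoted above concrete, here is a minimal C++ sketch of a reader that accepts repeated code points. It is not the actual Encodings::read or CharInfo code: SymbolEntry, readQuoted, parseSymbols and commandIndex are hypothetical names, the field order (code point, command, preamble, flags) is taken from the example line in the quote, and the backslash-escape handling inside quotes is an assumption. Malformed code points are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for CharInfo: one flags string plus every LaTeX
// command that produces the character (the first one read comes first).
struct SymbolEntry {
    std::vector<std::string> commands;
    std::string flags;
};

using SymbolTable = std::map<std::uint32_t, SymbolEntry>;

// Read the next double-quoted field, treating a backslash as an escape for
// the character that follows it (assumption). Returns false if the next
// non-blank character is not a quote (e.g. '#') or the line has ended.
bool readQuoted(std::istream & in, std::string & out)
{
    out.clear();
    char c;
    while (in.get(c)) {
        if (c == ' ' || c == '\t')
            continue;
        if (c != '"')
            return false;
        break;
    }
    if (!in)
        return false;
    while (in.get(c)) {
        if (c == '\\') {
            if (in.get(c))
                out += c;
        } else if (c == '"') {
            return true;
        } else {
            out += c;
        }
    }
    return false; // unterminated quote
}

// Parse unicodesymbols-style lines. A repeated code point appends an
// alternative command and keeps the flags of the first line.
SymbolTable parseSymbols(std::istream & in)
{
    SymbolTable table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string code;
        if (!(fields >> code) || code[0] == '#')
            continue; // blank line or comment
        auto const cp = static_cast<std::uint32_t>(std::stoul(code, nullptr, 16));
        std::string command, preamble, flags;
        if (!readQuoted(fields, command))
            continue;
        if (readQuoted(fields, preamble))
            readQuoted(fields, flags);
        SymbolEntry & entry = table[cp];
        if (entry.commands.empty())
            entry.flags = flags;
        entry.commands.push_back(command);
    }
    return table;
}

// Reverse index (LaTeX command -> code point) for LaTeX-to-Unicode lookups.
std::map<std::string, std::uint32_t> commandIndex(SymbolTable const & table)
{
    std::map<std::string, std::uint32_t> index;
    for (auto const & kv : table)
        for (auto const & cmd : kv.second.commands)
            index.emplace(cmd, kv.first);
    return index;
}

// Usage example with a shortened version of the two lines from the quote.
inline void demo()
{
    std::istringstream file(R"(0x00a1 "\\textexclamdown" "" "force=cp862;cp1255" # INVERTED EXCLAMATION MARK
0x00a1 "!`"
)");
    SymbolTable const table = parseSymbols(file);
    assert(table.at(0xa1).commands.size() == 2);
    assert(table.at(0xa1).flags == "force=cp862;cp1255");
    assert(commandIndex(table).at("!`") == 0xa1);
}
```

In this sketch the short second line only contributes an extra command; the flags of the first line for that code point stay in effect, which matches the "implicitly, all the other parameters still apply" comment in the proposal.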

I don't know about any of this, but I just wanted to mention what I think
is a related ticket, in case it is relevant for which strategy is taken:

https://www.lyx.org/trac/ticket/12475

which is a follow-up to commit 122b452b.

Scott
-- lyx-devel mailing list lyx-devel@lists.lyx.org http://lists.lyx.org/mailman/listinfo/lyx-devel