···<date: 2012-10-01, Monday>···<from: Simo Ojala>···

> On 09/29/2012 02:35 PM, Hans Hagen wrote:
> >On 29-9-2012 01:41, Simo Ojala wrote:
> >>Hans Hagen <pra...@wxs.nl>
> >>
> >>On 09/28/2012 11:46 AM, Hans Hagen wrote:
> >>>On 27-9-2012 21:27, Simo Ojala wrote:
> >>>>This is a problem originally posted in TeX/StackExchange. However, since
> >>>>I have not had any luck in finding a solution I post it here too. I am
> >>>>confident that somebody here should know the answer.
> >>>>
> >>>>http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
> >>>>
> >>>>"Since I last played with the latest ConTeXt MkIV, there has been
> >>>>introduced this new feature. It now seems to combine Hebrew characters
> >>>>automatically when possible to ligatures. So for example. If I have a
> >>>>word with following two characters:
> >>>>
> >>>>U+05D5 (HEBREW LETTER VAV)
> >>>>U+05BC (HEBREW POINT DAGESH OR MAPIQ)
> >>>>
> >>>>ConTeXt will combine these to:
> >>>>
> >>>>U+FB35 (HEBREW LETTER VAV WITH DAGESH)
> >>>>
> >>>>However, I would need to disable this feature for a number of reasons.
> >>>>For example, this breaks my little database query, because the query key
> >>>>is changed before(?) macro gets it.
> >>>>
> >>>>So if somebody would know how to turn this off and maybe also that what
> >>>>has changed."
> >>>
> >>>It depends on the font ... normally you can disable this by *not* using
> >>>the mark and mkmk features
> >>>
> >>>Hans
> >>>
> >>
> >>Ok, I have now tried turning off all kinds of features without luck. So,
> >>I tried putting together minimal test case. I suspect that there should
> >>be done something more than just turn off some font features. However,
> >>my ConTeXt skills are very limited so I can be wrong.
> >>
> >>The goal is that the word passed from ConTeXt file remains as it is
> >>written and gives unicode characters U+5e1, U+5d5, U+5bc and U+5e1. This
> >>is what already happens when the word is in the lua file.
> >>
> >>Simo
> >>
> >>PS: In case this matters. My ConTeXt MkIV version is "2012.09.23 12:40".
> >>It should be the latest for Ubuntu 12.04 LTS Precise Pangolin that is in
> >>the Adam Reviczky's PPA.
> >>
> >>
> >>%% testcase.tex
> >>
> >>\definefontfeature[hebrew][arabic][script=hebr]
> >>\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
> >>\setupdirections[bidi=global]
> >>
> >>\starttext
> >>\dejavusans
> >>
> >>\def\Macro#1{\directlua{
> >>dofile(resolvers.findfile("testcase.lua"))
> >>userdata.testfunction("#1")
> >>}}
> >>
> >>\Macro{סוּס}
> >>
> >>\blank[1cm]however, we can still color these independently\blank[0.5cm]
> >>
> >>\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
> >>
> >>\stoptext
> >>
> >>
> >>-- testcase.lua
> >>
> >>userdata = userdata or {}
> >>
> >>function userdata.testfunction(word)
> >>
> >>    tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
> >>
> >>    for i = 1, unicode.utf8.len(word) do
> >>        tex.sprint("U+" .. string.format("%x",unicode.utf8.byte(word,i)) .. ": " .. unicode.utf8.sub(word,i,i) .. "\\par" )
> >>    end
> >>
> >>    tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
> >>
> >>    word = "סוּס"
> >>
> >>    for i = 1, unicode.utf8.len(word) do
> >>        tex.sprint("U+" .. string.format("%x",unicode.utf8.byte(word,i)) .. ": " .. unicode.utf8.sub(word,i,i) .. "\\par" )
> >>    end
> >>end
> >
> >I see three characters next to each other so what exactly is the problem?
> >
> >(BTW, take a look at goodies-002.tex in the test suite ... you can
> >define colored glyphs as a feature)
> >
> >Hans
> >
>
> Sorry for being unclear, I try to clarify. The problem is:
>
> 1. I have tex file with which calls a macro with argument that has
>    characters U+5d5 and U+5bc.
> 2. Macro passes argument further to lua code. When it gets there
>    characters have turned to U+fb35.

Hi,

I don't have a clue about Hebrew, but isn't this a correct normalization[0] rather than a ligature? If so, the behavior of LuaTeX is perfectly fine. Lua, on the other hand, treats the string as a sequence of bytes, which is how it treats strings everywhere.

[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html

Regards
Philipp

> 3. When the lua code then compares the U+fb35 with xml file that has
>    the original forms U+5d5 and U+5bc it of course fails.
>
> So, the problem is that there is this phase 2 that has not always
> happened. If possible I would like to turn it off somehow. Of course
> I could try to write some workaround code to countermeasure this
> substitution or what it should be called. But that could be
> complicated and lead to more problems.
>
>
> Simo
>
>
> PS: I attached my result of the test case in case this is problem
> with my setup. Compiled with ConTeXt MkIV 2012.09.25 21:44.

> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the
> Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
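
The normalization question in this thread can be checked outside of TeX with Python's standard `unicodedata` module. What follows is only a sketch and not something posted in the thread: the helper names `canonical_key` and `codepoints` are made up for illustration, and the two test words are assumptions modelled on the test case (the key as written in the XML file, and the key as it reportedly arrives in Lua). U+FB35 is on Unicode's composition exclusion list, so NFC leaves U+05D5 U+05BC alone, while NFD splits U+FB35 back into those two characters. Normalizing both sides to NFD before comparing is therefore one possible workaround for the failing lookup in step 3.

```python
import unicodedata

VAV = "\u05d5"         # HEBREW LETTER VAV
DAGESH = "\u05bc"      # HEBREW POINT DAGESH OR MAPIQ
VAV_DAGESH = "\ufb35"  # HEBREW LETTER VAV WITH DAGESH


def canonical_key(text: str) -> str:
    """Hypothetical helper: return the NFD form of text, so that composed
    and decomposed spellings of the same word compare equal."""
    return unicodedata.normalize("NFD", text)


def codepoints(text: str) -> str:
    """Hypothetical helper: format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Assumed inputs: the word as stored in the XML file, and the same word
# after VAV + DAGESH has been replaced by the single character U+FB35.
word_from_xml = "\u05e1" + VAV + DAGESH + "\u05e1"
word_from_macro = "\u05e1" + VAV_DAGESH + "\u05e1"

# NFC does not compose VAV + DAGESH into U+FB35 (composition exclusion).
print(codepoints(unicodedata.normalize("NFC", VAV + DAGESH)))  # U+05D5 U+05BC

# A plain comparison fails, a comparison of NFD forms succeeds.
print(word_from_xml == word_from_macro)                                # False
print(canonical_key(word_from_xml) == canonical_key(word_from_macro))  # True
```

If this holds, whatever combines the two characters during the run is doing something other than plain NFC, and a lookup keyed on the NFD form would be unaffected by it. The same idea could be ported to Lua with a small decomposition table for the characters actually used.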