On 7/3/2020 2:55 PM, Marcel Fabian Krüger wrote:
Hi,

I recently noticed some cases where luametatex behaved in unexpected
ways:

   - The "Extra \fi" error isn't triggered, instead an extra `\fi`
     freezes luametatex. (Can be reproduced by compiling a document which
     only consists of a single \fi)

that's already fixed here (i noticed it when documenting some conditionals)

   - token.new can only create some `data` tokens, but it doesn't apply
     bounds checking on its arguments:

there is no checking yet, there is an upper limit of 0x1FFFFF, so i'll add a check for that
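
A minimal sketch of what such a range check could look like in plain Lua. Only the upper limit 0x1FFFFF comes from the remark above; the helper name `checked_data_value`, the lower bound of 0 and the integer requirement are assumptions made for illustration, not the actual luametatex implementation:

```lua
-- Hypothetical helper, not part of the luametatex API.
-- The upper limit is the one mentioned above; the lower bound of 0 is an assumption.
local MAX_DATA_VALUE = 0x1FFFFF

local function checked_data_value(value)
  -- math.type is available in Lua 5.3 and later
  if math.type(value) ~= "integer" then
    return nil, "expected an integer"
  end
  if value < 0 or value > MAX_DATA_VALUE then
    return nil, string.format("value %d out of range [0, %d]", value, MAX_DATA_VALUE)
  end
  return value
end
```

A caller would test the first return value and report the second one as the error message when the check fails.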

     Also for all other commands LuaTeX seems to apply range-checks to
     ensure that such overflows don't happen, even if invalid values are
     passed as first argument.

indeed, but i hadn't yet done that for data; it also needs a stricter check at the tex end (i'm still not sure if i'll make a slightly different implementation of it, but i can add the test anyway)

   - There is token.primitives(). My assumption is that the returned
     table is meant to indicate the command id, mode and name
     corresponding to every primitive. (I think it is awesome that such a
     table is made available in luametatex) But especially the mode
     field sometimes has values which do not correspond to the mode of
     the actual primitives:

indeed.

     I tried running the following in an almost iniTeX setting where all
     primitives aside from \shipout and \Umathcodenum have their default
     definitions:

     ```
     \catcode`\%=12
     \catcode`\~=12
     \directlua{
       local sorted = token.primitives()
       table.sort(sorted, function(a,b) return a[1]<b[1] or a[1]==b[1] and a[2]<b[2] end)
       for _,info in ipairs(sorted) do
         local t = token.create(info[3])
         local rc, rm = t.command, t.mode
         if rc==info[1] and rm ~= info[2] then
           if info[2] == 0 then
             print(string.format('MODE MISMATCH, expected zero: \string\\%s: real: %i, command: %i', info[3], rm, rc))
           else
             print(string.format('MODE MISMATCH: \string\\%s: offset: %i, command: %i', info[3], rm-info[2], rc))
           end
         elseif rc~=info[1] then print(t.csname)
         end
       end
     }
     ```

     This indicates that there are two kinds of differences:
     For some command codes, there are multiple primitives whose second
     entry in the token.primitives table is zero even though their mode
     is not zero. This especially affects the commands `above`,
     `after_something`, `make_box`, `un_vbox`, `set_specification` and
     `car_ret`.
     E.g. for after_something, all of \atendofgrouped, \afterassigned and
     \aftergrouped have a zero as second entry in token.primitives.

some tokens are more complex in the sense that they are combinations (have a follow up) and i'm not sure to what extent i want to block that ... all a matter of experimenting and time, so

the 'mode' field will be dropped but for now i kept it

some, like after_something, i need to check (i just didn't update their ranges yet after adding some more primitives that use them) (maybe some others need an offset added, but i'll check it)


     The other difference is that all the internal_... commands have a
     fixed offset which differs between commands in their mode field.

     IMO the difference for the internal_... commands make sense because
     they make for easier to use numbers, but having multiple primitives
     indicating mode 0 for the other commands seems to make this table
     significantly less useful because it can't be used to get a unique
     description of a primitive.

     (I may have completely misinterpreted the table of course, but given
     that for other primitives the values match I do not think so)

it's a bit of a work in progress, as there are some exceptions that use special chr codes (for instance in conditionals several cmd codes need to have exclusive codes), so adapting it is a stepwise process; one decision i need to make there is how close to stay to the original tex codes

eventually i want all to have reasonable ranges in the token interface (not per se the same as in the engine itself but that's a black box anyway) which involves some offsetting .. i do that stepwise in order to keep a working engine (the token interface is not used in context that much)
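
The uniqueness concern raised above can be checked mechanically once the ranges are updated. A minimal sketch in plain Lua, assuming entries have the { command, mode, name } layout used in the test code above; the function name `find_shared_pairs` is hypothetical and the sample data is made up (the numbers are not real command codes):

```lua
-- Group entries shaped like { cmd, mode, name } by their (cmd, mode) pair and
-- return only those pairs that are shared by more than one name.
local function find_shared_pairs(entries)
  local by_pair = {}
  for _, info in ipairs(entries) do
    local key = info[1] .. ":" .. info[2]
    local names = by_pair[key]
    if not names then
      names = {}
      by_pair[key] = names
    end
    names[#names + 1] = info[3]
  end
  local shared = {}
  for key, names in pairs(by_pair) do
    if #names > 1 then
      shared[key] = names
    end
  end
  return shared
end

-- Made-up sample data for illustration only.
local sample = {
  { 10, 0, "alpha" },
  { 10, 0, "beta" },
  { 11, 0, "gamma" },
  { 12, 1, "delta" },
}
local shared = find_shared_pairs(sample)
```

Inside luametatex the table returned by token.primitives() would presumably be passed in place of the sample; an empty result would mean every primitive has a unique (command, mode) description.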

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------