On 6/22/2023 3:14 PM, Axel Kittenberger wrote:
About the lualatex speed thing, the profile for that one is:

  30.84      3.47     3.47   203674     0.00     0.00  luaV_execute
  10.13      4.61     1.14  3289616     0.00     0.00  longest_match
   4.71      5.14     0.53 126838656     0.00     0.00  luaH_get
   4.40      5.64     0.50 77898671     0.00     0.00  luaH_newkey
   4.22      6.11     0.48 138089097     0.00     0.00  luaD_precall
   3.78      6.54     0.43 138025730     0.00     0.00  luaD_poscall
   2.62      6.83     0.30 147056637     0.00     0.00  lua_tointegerx
   1.96      7.05     0.22 109358093     0.00     0.00  mainposition
   1.69      7.24     0.19  8846388     0.00     0.00  copy_node
   1.33      7.39     0.15  1300730     0.00     0.00  luaH_resize
   1.24      7.53     0.14 35276305     0.00     0.00
  lua_nodelib_direct_is_char
   1.16      7.66     0.13 33656112     0.00     0.00  get_next
   1.02      7.78     0.12 59245176     0.00     0.00  luaV_equalobj
   0.98      7.89     0.11  1300730     0.00     0.00  auxsetnode
   0.89      7.99     0.10 14097849     0.00     0.00  delete_attribute_ref
   0.89      8.09     0.10     8526     0.00     0.00  inflate_fast
   0.84      8.18     0.10 152030358     0.00     0.00  lua_pushinteger
   0.84      8.28     0.10 33632692     0.00     0.00
  lua_nodelib_direct_getnext
   0.80      8.37     0.09 65148440     0.00     0.00  luaO_ceillog2
   0.80      8.46     0.09    43053     0.00     0.00  fix_node_list

So it's almost all within the Lua VM (and thus this profile not much more
useful), also I my CNF setup for the self compiled binary is a bit amiss it
keeps rebuilding/reloading lmroman10-regular, but in contrast to the total
runtime not so much.

To get a more useful info what is going I modified the input to use a Lua
profiler* that uses the debug hooks.
-----
\directlua{
   profile = dofile('profile.lua')
   profile.start()
}
\documentclass{article}
\input{plipsum}
\begin{document}
\newcount\ii
\ii=2
\loop
   \lipsum{1-100}
   \advance\ii-1
\ifnum \ii>0
\repeat
\directlua{
   profile.stop()
   print(profile.report(20))
}
\end{document}
----
* https://github.com/2dengine/profile.lua

Contrary to nowadays standard profilers it is not a sampling profiler but
call tracking. Also the printout is not self times but total times.
Since the lua profiler slows down the runtime considerably, I had to reduce
the document size to get reasonable runtime.

I made two runs with double the size the second, to differentiate which is
offset and which runtime document length dependent.


+-----+-------------------------------+-------------+--------------------------+------------------------------------------+

  | #   | Function                      | Calls       | Time
     | Code                             |
  
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+

  | 1   | ?                             | 287         | 9.953764
     | xmf-dist/tex/latex/base/ltluatex.lua:459 |
  | 2   | func                          | 287         | 9.9488
     | uatex/luaotfload/luaotfload-init.lua:430 |
  | 3   | handler                       | 287         | 9.616892
     | aotfload/fontloader-2022-10-03.lua:36043 |
  | 4   | nodepass                      | 287         | 8.861947
     | aotfload/fontloader-2022-10-03.lua:35858 |
  | 5   | ?                             | 221         | 6.412068
     | aotfload/fontloader-2022-10-03.lua:25567 |
  | 6   | ?                             | 1           | 2.2355
     | ex/luaotfload/luaotfload-loaders.lua:179 |
  | 7   | ltx_read                      | 1           | 2.235285
     | ex/luaotfload/luaotfload-loaders.lua:133 |
  | 8   | resolve                       | 1           | 2.130507
     | aotfload/fontloader-2022-10-03.lua:32156 |
  | 9   | r                             | 1           | 2.130164
     | /luaotfload/luaotfload-resolvers.lua:246 |
  | 10  | ?                             | 1           | 2.130115
     | x/luaotfload/luaotfload-resolvers.lua:71 |
  | 11  | ?                             | 1           | 2.106421
     | /luaotfload/luaotfload-database.lua:1292 |
  | 12  | update_names                  | 1           | 2.106258
     | /luaotfload/luaotfload-database.lua:3360 |
  | 13  | handler                       | 221         | 1.068138
     | tex/luaotfload/luaotfload-notdef.lua:189 |
  | 14  | collect_font_filenames        | 1           | 1.034945
     | /luaotfload/luaotfload-database.lua:3114 |
  | 15  | collect_font_filenames_dir    | 1003        | 0.984343
     | /luaotfload/luaotfload-database.lua:2290 |
  | 16  | ?                             | 200         | 0.743775
     | aotfload/fontloader-2022-10-03.lua:21016 |
  | 17  | handler                       | 43650       | 0.70179799999995
     | aotfload/fontloader-2022-10-03.lua:23197 |
  | 18  | retrieve_namedata             | 1           | 0.667004
     | /luaotfload/luaotfload-database.lua:2463 |
  | 19  | collect_font_filenames_texmf  | 1           | 0.663371
     | /luaotfload/luaotfload-database.lua:2351 |
  | 20  | kernrun                       | 5612        | 0.63967499999993
     | aotfload/fontloader-2022-10-03.lua:24976 |
  
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+


  
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+

  | #   | Function                      | Calls       | Time
     | Code                             |
  
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+

  | 1   | ?                             | 575         | 18.783564
      | xmf-dist/tex/latex/base/ltluatex.lua:459 |
  | 2   | func                          | 575         | 18.773059
      | uatex/luaotfload/luaotfload-init.lua:430 |
  | 3   | handler                       | 575         | 18.119982
      | aotfload/fontloader-2022-10-03.lua:36043 |
  | 4   | nodepass                      | 575         | 16.760892
      | aotfload/fontloader-2022-10-03.lua:35858 |
  | 5   | ?                             | 443         | 12.157763
      | aotfload/fontloader-2022-10-03.lua:25567 |
  | 6   | ?                             | 1           | 2.215777
     | ex/luaotfload/luaotfload-loaders.lua:179 |
  | 7   | ltx_read                      | 1           | 2.215572
     | ex/luaotfload/luaotfload-loaders.lua:133 |
  | 8   | resolve                       | 1           | 2.116588
     | aotfload/fontloader-2022-10-03.lua:32156 |
  | 9   | r                             | 1           | 2.116285
     | /luaotfload/luaotfload-resolvers.lua:246 |
  | 10  | ?                             | 1           | 2.116237
     | x/luaotfload/luaotfload-resolvers.lua:71 |
  | 11  | ?                             | 1           | 2.089522
     | /luaotfload/luaotfload-database.lua:1292 |
  | 12  | update_names                  | 1           | 2.089362
     | /luaotfload/luaotfload-database.lua:3360 |
  | 13  | handler                       | 443         | 1.969348
     | tex/luaotfload/luaotfload-notdef.lua:189 |
  | 14  | handler                       | 87300       | 1.365021
     | aotfload/fontloader-2022-10-03.lua:23197 |
  | 15  | ?                             | 400         | 1.335748
     | aotfload/fontloader-2022-10-03.lua:21016 |
  | 16  | kernrun                       | 11224       | 1.1705539999999
      | aotfload/fontloader-2022-10-03.lua:24976 |
  | 17  | testrun                       | 33672       | 1.0861739999999
      | aotfload/fontloader-2022-10-03.lua:25100 |
  | 18  | ?                             | 34115       | 1.0375240000002
      | tex/luaotfload/luaotfload-notdef.lua:311 |
  | 19  | collect_font_filenames        | 1           | 0.981696
     | /luaotfload/luaotfload-database.lua:3114 |
  | 20  | collect_font_filenames_dir    | 1003        | 0.931817
     | /luaotfload/luaotfload-database.lua:2290 |
  
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+


This time the profile actually is very lopsided where almost all runtime
falls into fontloader-2022-10-03.lua. Which IMO should be good news for
possibilities of optimization.

And here the majority into this function:
  function otf.featuresprocessor
but honestly I don't even understand what this function is supposed to do.
Only maybe from the name, this is something that is not cacheable? The
number of calls doubles with double the document size..
When I run 5000 times tufte.tex stored in a macro we get some 1110 pages in default context layout.

   \tufte\endgraf

This gives:

pdftex     :  4.3 seconds (8 bit fonts)
luatex     : 12.3 seconds
luametatex : 10.9 seconds
xetex      : 23.8 seconds

So, with the lua related engines we are some 2.5 slower than pdftex. Given that we have a 32 bit engine and use opentype fonts this is not that bad, also given that we have plenty more flexibility (also with fonts).

modern     : 10.9 seconds
dejavu     : 11.3 seconds (more feature processing)
ebgaramond : 12.5 seconds (inefficient font)
pagella    : 14.8 seconds (many kerns)
lucida     : 10.0 seconds (no kerns)

So, there is a dependency on fonts. However we're talking novels here: pure text, and 0.011 seconds per page is acceptable (less on a faster machine). (Running the context profiler on that sample takes 139.3 seconds but that's irrelevant here I guess.)

When you process pure text you mostly measure font processing (par building can be neglected), hyphenation, backend pdf generation. So you you use little code which is probably is good for cache hits. Also, once a sample text is stored in a macro little work has to be done on the input.

More realistic examples will shift (at least for context) performance to luatex (and fwiw luametatex). I occasionaly run a test that goes from simple to more complex that make run luametatex faster than pdftex. So, don't stare too much at these numbers.

Over the last few years I managed to speed up quite some parts of the engine but measuring significant gain in some specific area by running millions of iterations becomes less impressive once the real number of expansions (etc) is in the tens of thousands or less. Say that luametatex is some 50% faster than luatex (of which we loose 10-15% on the more advanced and demanding backend). That is nice but inefficient macro code (could come from anywhere) quickly makes that gain go away because it touches those improved areas less than some core expansion stuff.

So, to come back to your 8 times slower luatex than pdftex ... if context on simple files is < 2.5 times slower and often less, then you need to look at the macros (or lua code if used) and not at the engine. There's little to gain there.

(You can try to run with luajiitex which has a faster virtual machine.)

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

Reply via email to