Re: [NTG-context] distributed / parallel TeX?

Lars Huttar Tue, 16 Dec 2008 10:25:39 -0800

On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
> 
> Hi Lars,
> 
> Lars Huttar wrote:
>> Hello,
>>
>> We've been using TeX to typeset a 1200-page book, and at that size, the
>> time it takes to run becomes a big issue (especially with multiple
>> passes... about 8 on average). It takes us anywhere from 80 minutes on
>> our fastest machine, to 9 hours on our slowest laptop.
> 
> You should not need an average of 8 runs unless your document is
> ridiculously complex and I am curious what you are doing (but that
> is a different issue from what you are asking).
> 
>> So the question comes up, can TeX runs take advantage of parallelized or
>> distributed processing? 
> 
> No. For the most part, this is because of another requisite: for
> applications to make good use of threads, they have to deal with a
> problem that can be parallelized well. And generally speaking,
> typesetting does not fall in this category. A seemingly small change
> on page 4 can easily affect each and every page right to the end
> of the document.


Thank you for your response.

Certainly this is true in general and in the worst case, as things stand
currently. But I don't think it has to be that way. The following could
greatly mitigate that problem:

- You could design your document *specifically* to make the parts
independent, so that the true and authoritative way to typeset them is
to typeset the parts independently. (You can do this part now without
modifying TeX at all... you just have the various sections' .tex files
input common "headers" / macro defs.) Then, by definition, a change in
one section cannot affect another section (except for page numbers, and
possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks
and possibly page breaks that force a "recto". This greatly limits the
effects that any section can have on another. The division ("chunking")
of the whole document into fairly-separate parts could either be done
manually, or if there are clear page breaks, automatically.

- The remaining problem, as you noted, is how to fix page references
from one section to another. Currently, TeX resolves forward references
by doing a second (or third, ...) pass, which uses page information from
the previous pass. The same technique could be used for resolving
inter-chunk references and determining on what page each chunk should
start. After one pass on each of the independent chunks (ideally performed
simultaneously by separate processing nodes), page information is sent
from each node to a "coordinator" process. E.g. the node that processed
section two tells the coordinator that chapter 11 starts 37 pages after
the beginning of section two. The coordinator knows in what sequence the
chunks are to be concatenated, thanks to a config file. It uses this
information together with info from each of the nodes to build a table
of what page each chunk should start on, and a table giving the absolute
page number of each page reference. If pagination has changed, or is
new, this info is sent back to the various nodes for another round of
processing.
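
To make the coordinator step above more concrete, here is a rough sketch
in Python. Everything in it is an assumption for illustration only: the
ChunkReport format, the function names, the run_chunk callback standing
in for a TeX run on one node, and the convention that recto pages are
odd-numbered. It is not an existing tool, just the bookkeeping written out.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """What one node reports after a pass (hypothetical format)."""
    name: str
    page_count: int
    # label -> page offset within the chunk (0 = the chunk's first page)
    labels: dict[str, int] = field(default_factory=dict)
    starts_recto: bool = False  # must this chunk begin on a right-hand page?


def build_page_tables(order, reports, first_page=1):
    """Return (start page of each chunk, absolute page of each label)."""
    starts, absolute = {}, {}
    page = first_page
    for name in order:  # 'order' plays the role of the config file
        report = reports[name]
        # Assumption: recto pages are odd, so skip one page if we are on an even one.
        if report.starts_recto and page % 2 == 0:
            page += 1
        starts[name] = page
        for label, offset in report.labels.items():
            absolute[label] = page + offset
        page += report.page_count
    return starts, absolute


def typeset_until_stable(order, run_chunk, max_rounds=10):
    """Repeat rounds of chunk runs until the page tables stop changing.

    run_chunk(name, starts, absolute) is a placeholder for one TeX run on
    one node; it must return a ChunkReport for that chunk.
    """
    tables = ({}, {})
    for round_no in range(1, max_rounds + 1):
        # A real coordinator would dispatch these runs to separate nodes.
        reports = {name: run_chunk(name, *tables) for name in order}
        new_tables = build_page_tables(order, reports)
        if new_tables == tables:
            return tables, round_no
        tables = new_tables
    raise RuntimeError("pagination did not converge")


# Example: chapter 11 starts 37 pages after the beginning of section two.
reports = {
    "section one": ChunkReport("section one", page_count=121),
    "section two": ChunkReport("section two", page_count=300,
                               labels={"chapter 11": 37}, starts_recto=True),
}
starts, absolute = build_page_tables(["section one", "section two"], reports)
print(starts)    # {'section one': 1, 'section two': 123}
print(absolute)  # {'chapter 11': 160}
```

In the example, section one ends on page 121, so section two would start
on 122; the forced recto pushes it to 123, and chapter 11 lands on page
160. The loop in typeset_until_stable stops as soon as a round reproduces
the previous tables, which is the same convergence test a multi-pass TeX
run already relies on.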

If this distributed method of typesetting a document takes 1 additional
iteration compared to doing it in series, but you get to split the
document into say 5 roughly equal parts, you could presumably get the
job done a lot quicker in spite of the extra iteration.
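
As a back-of-the-envelope check, here is that estimate in Python. The
assumptions are mine and rough: the 80 minutes covers all 8 passes, the
time per pass scales linearly with the number of pages, the 5 chunks run
at the same time, and the coordinator's own work is negligible. The
function name is made up for this sketch.

```python
def estimated_minutes(serial_total, serial_passes, parts, extra_passes=1):
    """Estimate wall-clock minutes when the book is split into equal parts."""
    per_pass = serial_total / serial_passes       # one full-book pass
    passes = serial_passes + extra_passes         # one extra round of coordination
    return passes * per_pass / parts              # each node handles 1/parts of the pages


parallel = estimated_minutes(serial_total=80, serial_passes=8, parts=5)
print(parallel)        # 18.0
print(80 / parallel)   # speedup factor, roughly 4.4
```

Under those assumptions the extra iteration costs little: 9 short passes
instead of 8 long ones, for about 18 minutes instead of 80.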

This is a crude description but hopefully the idea is clear enough.

>> parallel pieces so that you could guarantee that you would get the same
>> result for section B whether or not you were typesetting the whole book
>> at the same time?
> 
> if you are willing to promise yourself that all chapters will be exactly
> 20 pages - no more, no less - then you can split the work off into
> separate job files yourself and take advantage of a whole server
> farm. If you can't ...

Yes, the splitting can be done manually now, and when the pain point
gets high enough, we do some manual separate TeX runs.

However, I'm thinking that for large works, there is enough gain to be
had that it would be worth systematizing the splitting process and
especially the recombining process, since the latter is more error-prone.

I think people would do it a lot more if there were automation support
for it. I know we would.

But then, maybe our situation of having a large book with dual columns
and multipage tables is not common enough in the TeX world.
Maybe others who are typesetting similar books just use commercial
WYSIWYG typesetting tools, as we did in the previous edition of this book.

Lars
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : [email protected] / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
