Re: [NTG-context] Best way to create a large number of documents from database

2020-04-23 Thread Mojca Miklavec
On Fri, 17 Apr 2020 at 21:11, Hans Hagen wrote:
> On 4/17/2020 4:37 PM, Mojca Miklavec wrote:
>
> > One of the interesting statistics.
> > I used a bunch of images (the same png images in all documents; cca.
> > 290k in total).
>
> It can actually make a difference what kind of png image you use. Some
> png images demand a conversion (or split of map etc) to the format
> supported by pdf. Often converting the png to pdf and including those is
> faster.

Thanks for the hint. But I tested it and it hardly makes any difference.
I had to make another batch for the archive (creating a single
document with 4k+ pages), and the full process ran in 10 minutes
(compared to cca. 2,5 hours to create individual documents). Just for
a test run I completely **removed** all the images and it only
accounted for some 10 or 20 seconds speedup. So the biggest overhead
still seems to be in warming up the machinery (which includes my share
of overhead for reading in the 1,3 MB lua table with all data entries)
and Taco's hint of using an external tool for splicing would have
probably scored best :)

I need to add that I'm extremely happy about the resource reuse
(mostly images). As I already mentioned before, individual documents
were 1,5 GB in total, and badly written software would have created
an equally large cumulative PDF, while ConTeXt generates a file of
merely 17 MB with 4k+ pages. It's really impressive.

Mojca
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-17 Thread Hans Hagen

On 4/17/2020 4:37 PM, Mojca Miklavec wrote:

> One of the interesting statistics.
> I used a bunch of images (the same png images in all documents; cca.
> 290k in total).

It can actually make a difference what kind of png image you use. Some
png images demand a conversion (or split of map etc) to the format
supported by pdf. Often converting the png to pdf and including those is
faster.

 Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-17 Thread Mojca Miklavec
On Thu, 16 Apr 2020 at 16:38, Mojca Miklavec wrote:
> On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> > > On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> > >
> > > I have been asked to create a few thousand PDF documents from a CSV
> > > "database" today
> >
> > In CPU cycles, the fastest way is to do a single context --once
> > run generating all the pages as a single document, then using
> > mutool merge to split it into separate documents using a (shell)
> > loop.
>
> Just to make it clear: I don't really need to optimize on the CPU end,

... says the optimist ... :) :) :)

> as the bottleneck is on the other side of the keyboard, so as long as
> the CPU can process 5k pages today, I'm fine with it :) :) :)

While the bottleneck was in fact at the other side of the keyboard
(preparation was certainly longer than the execution), it still took
cca 2,5 hours to generate the full batch.

(I'm pretty sure I could have further optimised the code, even though
1 second per run is still pretty fast [when I started using context it
was more like 30 seconds per run], it just adds up when talking about
thousands of pages. This greatly reminds me of the awesome speedup
that Hans achieved when rewriting the mplib code & the initial
\sometxt changes inside metapost which also led to 100-fold speedups
as one no longer needed to start TeX a zillion times.)

While waiting I wanted to start being clever and do the processing in
the same folder in parallel (I have lots of cores after all), and
ended up calling a script with
    context --N={n} --output=doc-{}.pdf template.tex
    context --purge
only to notice much later that running multiple context runs in the
same folder (some of them compiling and some of them deleting the
temporary files) might not have been the best idea on the planet: many
documents ended up missing, and many were corrupted. So I had to rerun
half of the documents.
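
[Editorial sketch, not part of the original message.] One way to avoid
the collision described above is to give every job its own scratch
directory, so that no run can delete another run's temporary files. The
Python below is a minimal sketch under assumptions: the helper names
(compile_isolated, compile_many) are hypothetical, it assumes context
writes <result>.pdf into the current working directory, and it assumes
the template can find its images and data when run from another folder.
The context flags are the ones quoted elsewhere in this thread.

```python
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compile_isolated(template, result, out_dir, runner=subprocess.run):
    """Compile one document inside a private scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        # Auxiliary files land in `scratch`, so parallel jobs cannot
        # overwrite or purge each other's temporary files.
        runner(
            ["context", "--batch", f"--result={result}", "--once",
             str(Path(template).resolve())],
            cwd=scratch,
            check=True,
        )
        # Assumption: the PDF is written to the working directory.
        pdf = Path(scratch) / f"{result}.pdf"
        target = Path(out_dir) / pdf.name
        shutil.move(str(pdf), str(target))
    return target


def compile_many(template, results, out_dir, workers=4, runner=subprocess.run):
    """Run several isolated compilations at once; returns the PDF paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda r: compile_isolated(template, r, out_dir, runner), results))
```

The scratch directory is removed as a whole after each job, so no
separate purge step is needed in this layout.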

One of the interesting statistics.
I used a bunch of images (the same png images in all documents; cca.
290k in total).

The generated documents were 1,5 GB in size. When compressed with
tar.gz, there was almost no noticeable difference between the
compressed and non-compressed data size (1,4 GB vs. 1,5 GB). But when
compressing with tar.xz, it compressed 1,5 GB worth of document into
merely 27 MB (a single document is 360 k).
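
[Editorial sketch, not part of the original message.] A plausible
explanation for the gap, offered as an assumption rather than something
measured in this thread: gzip uses DEFLATE, which only looks back 32 KiB
for repeats, so identical images sitting about 360 k apart in the tar
stream are never matched, while xz uses a dictionary of several MiB and
can see them. The Python below reproduces the effect on synthetic data;
the block size and repeat count are arbitrary choices.

```python
import lzma
import os
import zlib

# Stand-in for the image data shared by every document: random bytes,
# so a single copy is incompressible on its own.
block = os.urandom(200_000)

# Twenty "documents" that all embed the same block, back to back.
archive = block * 20

# DEFLATE (the algorithm behind gzip) with its 32 KiB window.
deflated = zlib.compress(archive, 9)

# LZMA2 (the algorithm behind xz) with the default preset.
xz = lzma.compress(archive)

print("raw:", len(archive), "deflate:", len(deflated), "xz:", len(xz))
```

On this input the DEFLATE output stays close to the raw size, while the
xz output is close to the size of a single block.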

The documents have been e-mailed out, but now they need to print hard
copies for archive. I'm happy I don't need to be the one printing and
storing that :) :) :)

Mojca


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Hans Hagen

On 4/16/2020 8:32 PM, Mojca Miklavec wrote:

> Where would be the best way to document this / under what wiki topic,
> as I'm sure I'll need it again and forget until then unless I write it
> down immediately? "Mail merge"? ;)

maybe a 'workflows' entry?

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Pablo Rodriguez
On 4/16/20 8:32 PM, Mojca Miklavec wrote:
> [...]
> Where would be the best way to document this / under what wiki topic,
> as I'm sure I'll need it again and forget until then unless I write it
> down immediately? "Mail merge"? ;)

Hi Mojca,

“Document merge” could be also fine.

Pablo
--
http://www.ousia.tk


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Mojca Miklavec
On Thu, 16 Apr 2020 at 16:52, Hans Hagen wrote:
> On 4/16/2020 4:38 PM, Mojca Miklavec wrote:
> > On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> >>> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> >>>
> >>> One option is that I quickly draft a python script that creates a few
> >>> thousand TeX documents and compiles them individually, but it might be
> >>> easier if there was a way to just create a single template document
> >>> and then run something like
> >>> context --some-params --N=42 --output=document-0042.pdf template.tex
> >>> or something along those lines.
> >>
> >> If you want to go this route (and you may have to if not each record
> >> fits exactly within a single page),
> >
> > I do have one page per document. The more annoying part is having
> > strange document names that need more attention when mapping page
> > number -> name (I'm not saying this is not doable).
>
> so, don't make files:
>
> - write a tex file foo.tex
> - process it: context --batch --result=1 --once foo
>
> etc ... so, use --result for the target name and use the same input name

This works just perfect, thank you very much.

I now have template.tex and process it with
context --batch --result=doc-0042 --someparam=21a --once template
which generates precisely the desired doc-0042.pdf.

For the moment I'm simply using a combination of
\doifdocumentargument {someparam} {\getdocumentargument{someparam}}
from TeX and
environment.arguments
from within the lua code as suggested by Taco and you in the previous
email thread.
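
[Editorial sketch, not part of the original message.] The invocation
above can be driven from a small script, one context run per record. The
Python below is a minimal sketch: build_command and run_all are
hypothetical helper names, and the CSV column "result" that holds the
target file name is an assumption. Every other column is passed through
as a --key=value argument, which the template can read in the way
described above.

```python
import csv
import subprocess


def build_command(result, params, template="template"):
    """Build one context call using the flags quoted in this thread."""
    cmd = ["context", "--batch", f"--result={result}"]
    # Extra --key=value arguments are what the template reads back via
    # \getdocumentargument{key} or environment.arguments.
    cmd += [f"--{key}={value}" for key, value in params.items()]
    cmd += ["--once", template]
    return cmd


def run_all(csv_path, runner=subprocess.run):
    """Run context once per CSV row (sequentially)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            result = row.pop("result")  # hypothetical column name
            runner(build_command(result, row), check=True)
```

For example, build_command("doc-0042", {"someparam": "21a"}) yields the
same command line as the one shown above.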

Where would be the best way to document this / under what wiki topic,
as I'm sure I'll need it again and forget until then unless I write it
down immediately? "Mail merge"? ;)

Thank you very much,
Mojca


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread kaddour kardio
A relatively simple way is to use a templating system such as jinja2 and
iterate over a mkiv template.
Call context via subprocess and you get the result.
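
[Editorial sketch, not part of the original message.] To keep the
example free of third-party dependencies, the Python below uses
string.Template from the standard library as a stand-in for jinja2; it
is not the jinja2 API. The "@" delimiter, the TexTemplate class and the
field names are assumptions made for illustration. Values are inserted
verbatim, so TeX special characters in the data would still need
escaping.

```python
import string


class TexTemplate(string.Template):
    # Use "@" for placeholders because "$" starts math mode in ConTeXt.
    delimiter = "@"


TEMPLATE = TexTemplate(r"""\starttext
Dear @name, your number is @number.
\stoptext
""")


def render(record):
    """Fill the template with one record (a dict of strings)."""
    return TEMPLATE.substitute(record)


if __name__ == "__main__":
    print(render({"name": "Ada", "number": "42"}))
```

Each rendered string would then be written to its own .tex file and
compiled with a context call through subprocess.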

On Thu, 16 Apr 2020 at 15:52, Hans Hagen wrote:

> On 4/16/2020 4:38 PM, Mojca Miklavec wrote:
> > On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> >>> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> >>>
> >>> I have been asked to create a few thousand PDF documents from a CSV
> >>> "database" today
> >>
> >> In CPU cycles, the fastest way is to do a single context --once
> >> run generating all the pages as a single document, then using
> >> mutool merge to split it into separate documents using a (shell)
> >> loop.
> >
> > Just to make it clear: I don't really need to optimize on the CPU end,
> > as the bottleneck is on the other side of the keyboard, so as long as
> > the CPU can process 5k pages today, I'm fine with it :) :) :)
>
> 5K is nothing ... so that will work
>
> >>> One option is that I quickly draft a python script that creates a few
> >>> thousand TeX documents and compiles them individually, but it might be
> >>> easier if there was a way to just create a single template document
> >>> and then run something like
> >>> context --some-params --N=42 --output=document-0042.pdf template.tex
> >>> or something along those lines.
> >>
> >> If you want to go this route (and you may have to if not each record
> >> fits exactly within a single page),
> >
> > I do have one page per document. The more annoying part is having
> > strange document names that need more attention when mapping page
> > number -> name (I'm not saying this is not doable).
>
> so, don't make files:
>
> - write a tex file foo.tex
> - process it: context --batch --result=1 --once foo
>
> etc ... so, use --result for the target name and use the same input name
>
> (I won't bother you with the template system in context that no one
> knows of.)
>
>   Hans
>
> -
>Hans Hagen | PRAGMA ADE
>Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
> -


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Hans Hagen

On 4/16/2020 4:38 PM, Mojca Miklavec wrote:
> On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
>>> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
>>>
>>> I have been asked to create a few thousand PDF documents from a CSV
>>> "database" today
>>
>> In CPU cycles, the fastest way is to do a single context --once
>> run generating all the pages as a single document, then using
>> mutool merge to split it into separate documents using a (shell)
>> loop.
>
> Just to make it clear: I don't really need to optimize on the CPU end,
> as the bottleneck is on the other side of the keyboard, so as long as
> the CPU can process 5k pages today, I'm fine with it :) :) :)

5K is nothing ... so that will work

>>> One option is that I quickly draft a python script that creates a few
>>> thousand TeX documents and compiles them individually, but it might be
>>> easier if there was a way to just create a single template document
>>> and then run something like
>>> context --some-params --N=42 --output=document-0042.pdf template.tex
>>> or something along those lines.
>>
>> If you want to go this route (and you may have to if not each record
>> fits exactly within a single page),
>
> I do have one page per document. The more annoying part is having
> strange document names that need more attention when mapping page
> number -> name (I'm not saying this is not doable).

so, don't make files:

- write a tex file foo.tex
- process it: context --batch --result=1 --once foo

etc ... so, use --result for the target name and use the same input name

(I won't bother you with the template system in context that no one
knows of.)


 Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Mojca Miklavec
On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> > On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> >
> > I have been asked to create a few thousand PDF documents from a CSV
> > "database" today
>
> In CPU cycles, the fastest way is to do a single context --once
> run generating all the pages as a single document, then using
> mutool merge to split it into separate documents using a (shell)
> loop.

Just to make it clear: I don't really need to optimize on the CPU end,
as the bottleneck is on the other side of the keyboard, so as long as
the CPU can process 5k pages today, I'm fine with it :) :) :)

> > One option is that I quickly draft a python script that creates a few
> > thousand TeX documents and compiles them individually, but it might be
> > easier if there was a way to just create a single template document
> > and then run something like
> >     context --some-params --N=42 --output=document-0042.pdf template.tex
> > or something along those lines.
>
> If you want to go this route (and you may have to if not each record
> fits exactly within a single page),

I do have one page per document. The more annoying part is having
strange document names that need more attention when mapping page
number -> name (I'm not saying this is not doable).

> browse back a day or so in the mailing
> list archive for Gerben’s question about
>
>   “Using command line values in a TeX document; writing a script?"

Thanks a lot for the pointer. I didn't have that much time to read
through all the emails recently, I only noticed that he was super
actively working on some metapost stuff, I wasn't paying attention to
this.

> The replies offer various options using either lua or tex code
> to get at user-supplied arguments from the commandline.

Let me see what I come up with, I'm still fiddling with data & layout
at the moment :)

Mojca


Re: [NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Taco Hoekwater


> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> 
> Hi,
> 
> I have been asked to create a few thousand PDF documents from a CSV
> "database" today (which I can easily transform into any other form,
> like XML or a lua table or TeX definitions or whatever).
> 
> Generating a few thousand pages would be straightforward, but I'm sure
> there are some clever ways to handle this scenario as well, I'm just
> not aware of them :)

In CPU cycles, the fastest way is to do a single context --once
run generating all the pages as a single document, then using
mutool merge to split it into separate documents using a (shell)
loop.

Starting up mutool is much faster than starting context, even with lmtx.
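
[Editorial sketch, not part of the original message.] The loop described
above could look roughly like the Python below. It is a minimal sketch
under assumptions: every record occupies exactly one page, the names are
listed in page order, and split_commands and split are hypothetical
helper names. It relies on mutool merge accepting an output file (-o),
an input file and a page specification.

```python
import subprocess


def split_commands(combined, names):
    """Return one `mutool merge` command per page (pages are 1-based)."""
    return [
        ["mutool", "merge", "-o", f"{name}.pdf", combined, str(page)]
        for page, name in enumerate(names, start=1)
    ]


def split(combined, names, runner=subprocess.run):
    """Extract each page of `combined` into its own named PDF."""
    for cmd in split_commands(combined, names):
        runner(cmd, check=True)
```

Passing the commands as argument lists rather than shell strings means
that unusual document names (spaces, punctuation) need no extra quoting.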


> One option is that I quickly draft a python script that creates a few
> thousand TeX documents and compiles them individually, but it might be
> easier if there was a way to just create a single template document
> and then run something like
>     context --some-params --N=42 --output=document-0042.pdf template.tex
> or something along those lines.

If you want to go this route (and you may have to if not each record
fits exactly within a single page), browse back a day or so in the mailing
list archive for Gerben’s question about 

  “Using command line values in a TeX document; writing a script?"

The replies offer various options using either lua or tex code
to get at user-supplied arguments from the commandline.

Best wishes,
Taco





[NTG-context] Best way to create a large number of documents from database

2020-04-16 Thread Mojca Miklavec
Hi,

I have been asked to create a few thousand PDF documents from a CSV
"database" today (which I can easily transform into any other form,
like XML or a lua table or TeX definitions or whatever).
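
[Editorial sketch, not part of the original message.] As an illustration
of the "lua table" form mentioned above, the Python below turns CSV text
into a Lua file that returns a list of records. The helper names
(lua_string, csv_to_lua) are hypothetical, all values are kept as
strings, and well-formed rows are assumed.

```python
import csv
import io


def lua_string(value):
    """Quote a Python string as a Lua string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def csv_to_lua(csv_text):
    """Convert CSV text (with a header row) into Lua source code."""
    rows = csv.DictReader(io.StringIO(csv_text))
    lines = ["return {"]
    for row in rows:
        fields = ", ".join(
            f"[{lua_string(key)}] = {lua_string(value)}"
            for key, value in row.items())
        lines.append(f"  {{ {fields} }},")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_lua("id,name\n1,Ada\n2,Bob\n"))
```

The resulting file can be loaded on the Lua side with, for example,
dofile, which gives back the table of records.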

Generating a few thousand pages would be straightforward, but I'm sure
there are some clever ways to handle this scenario as well, I'm just
not aware of them :)

One option is that I quickly draft a python script that creates a few
thousand TeX documents and compiles them individually, but it might be
easier if there was a way to just create a single template document
and then run something like
context --some-params --N=42 --output=document-0042.pdf template.tex
or something along those lines.

What's the best approach with the existing functionality? I would be
more than grateful for any hints.

Thank you very much,
Mojca