Re: freemarker-generator: Improving the input documents concept

Daniel Dekany Sun, 23 Feb 2020 14:15:52 -0800

What you are describing is more like the angle that FMPP took initially,
where templates drive things and generate the output themselves (even
multiple output files if they wish). By default, the output file's name (and
relative path) is deduced from the template name. There was also a global
data-model, built in a configuration file (or equally, built via command
line arguments, or both mixed), from which templates get whatever data they
are interested in. Take a look at the figures here:
http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
a bit more, because you could add XML files at the same place where you
have the templates, and then you could associate transform templates with
the XML files (based on path pattern and/or the XML document element). Now
that's like what freemarker-generator had initially (data files drive
output, and the template is there to transform them).


So I think the generic mental model would look like this:

   1. You have files that drive the process, let's call them *generator
   files* for now. Usually, each generator file yields an output file (but
   maybe even multiple output files, as you might have seen in the last
   figure). These generator files can be of many types, like XML, JSON, XLSX
   (as in the original freemarker-generator), and even templates (as is the
   norm in FMPP). If the file is not a template, then you have a set of
   transformer templates (-t CLI option) in a separate directory, which can
   be associated with the generator files based on name patterns, and even
   based on content (schema usually). If the generator file is a template (so
   that's a positional @Parameter CLI argument that happens to be an *.ftl,
   and is not a template file specified after the "-t" option), then you just
   Template.process(...) it, and it prints what the output will be.
   2. You also have a set of variables, the global data-model, that
   contains commonly useful stuff, like what you now call parameters (CLI
   -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
   files aren't "generator files". Templates just use them if they need them.
   An important thing here is to reuse the same mechanism to read and parse
   those data files that is used in templates when transforming generator
   files. So we need a common format for specifying how to load data files.
   That's maybe just FTL that #assigns to the variables, or maybe a more
   declarative format.
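
To make point 1 a bit more tangible, here is a rough Java sketch of the
routing decision only (no FreeMarker calls). Everything in it is an
assumption made up for illustration: the class name GeneratorFileRouter, the
"first matching glob wins" rule, and the "generic-table.ftl" template name.
Content-based (schema) matching is left out.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: decides how a "generator file" is turned into output. */
final class GeneratorFileRouter {
    // File name glob -> transformer template name; insertion order is kept,
    // and the first matching pattern wins (an assumption, not a settled rule).
    private final Map<String, String> transformersByGlob = new LinkedHashMap<>();

    GeneratorFileRouter associate(String fileNameGlob, String templateName) {
        transformersByGlob.put(fileNameGlob, templateName);
        return this;
    }

    /** A generator file that is itself a template is processed directly. */
    static boolean isTemplate(Path generatorFile) {
        Path fileName = generatorFile.getFileName();
        return fileName != null && fileName.toString().endsWith(".ftl");
    }

    /** Finds the transformer template for a non-template generator file, if any. */
    Optional<String> transformerFor(Path generatorFile) {
        Path fileName = generatorFile.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        for (Map.Entry<String, String> entry : transformersByGlob.entrySet()) {
            PathMatcher matcher =
                    FileSystems.getDefault().getPathMatcher("glob:" + entry.getKey());
            if (matcher.matches(fileName)) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** Default output name for a template: the template name without ".ftl". */
    static String outputNameForTemplate(Path template) {
        String name = template.getFileName().toString();
        return name.endsWith(".ftl") ? name.substring(0, name.length() - 4) : name;
    }
}
```

With associate("*-access-log.csv", "access-report.ftl") registered, a data
file like somewhere/foo-access-log.csv would be routed to access-report.ftl,
while a positional *.ftl argument would skip the lookup and be processed as
is.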

What I have described in the original post here was a less generic form of
this, as I tried to be true to the original approach. I thought the
proposal would be drastic enough as it is... :) There, the "main" document
is the "generator file" from point 1, the "-t" template is the transform
template for the "main" document, and the other named documents ("users",
"groups") are a poor man's shared data-model from point 2 (together with
-PName=value).
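
For reference, the access rules from the original post (quoted below) could
behave roughly like the following Java sketch. It is only an illustration
under my own assumptions: the class DocumentsSketch and its method names are
invented, documents are represented by their paths, and the real template
variables would of course be exposed through the FreeMarker data-model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the four access forms; a document is just its path here. */
final class DocumentsSketch {
    static final String MAIN = "main";

    private final Map<String, List<String>> listsByName = new LinkedHashMap<>();

    DocumentsSketch add(String name, String path) {
        listsByName.computeIfAbsent(name, key -> new ArrayList<>()).add(path);
        return this;
    }

    /** Like NamedDocumentLists.<name>: zero or more documents. */
    List<String> namedDocumentList(String name) {
        List<String> docs = listsByName.get(name);
        return docs == null ? Collections.<String>emptyList()
                            : Collections.unmodifiableList(docs);
    }

    /** Like NamedDocuments.<name>: exactly one document, anything else is an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly 1 \"" + name + "\" document, but got " + docs.size());
        }
        return docs.get(0);
    }

    /** Like DocumentList: shorthand for the "main" list. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Like Document: shorthand for the single "main" document. */
    String document() {
        return namedDocument(MAIN);
    }
}
```

The point of the strict single-document accessors is that a template written
for one document fails loudly when it is fed several, instead of silently
picking one.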

There's a further, somewhat confusing thing to get right with the
list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
the model above, as per point 1, if you list multiple data files, each will
generate a separate output file. So, if you need to take in a list of files
and transform it to a single output file (or at least with a single transform
template execution), then you have to be explicit about that, as that's not
the default behavior anymore. But it's still absolutely possible. Imagine
that a "list of XLSX-es" is itself like a file format. You need some CLI
(and Maven config, etc.) syntax to express that, but that shouldn't be a
big deal.
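
The difference between the default and the explicit list form can be shown
with a tiny Java sketch. It is purely illustrative: TransformPlanSketch and
its two methods are invented names, and each inner list stands for the input
of one transform template execution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: groups data files into transform template executions. */
final class TransformPlanSketch {

    /** Default: each data file is its own generator file, so one execution per file. */
    static List<List<String>> onePerFile(List<String> files) {
        List<List<String>> executions = new ArrayList<>();
        for (String file : files) {
            executions.add(Collections.singletonList(file));
        }
        return executions;
    }

    /** Explicit opt-in: the whole list is treated as a single generator "file". */
    static List<List<String>> asSingleList(List<String> files) {
        List<String> group = Collections.unmodifiableList(new ArrayList<>(files));
        return Collections.singletonList(group);
    }
}
```

So two XLSX files give two executions (and two output files) by default, but
one execution that sees both files when the user asks for the list form.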



On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
[email protected]> wrote:

> Hi Daniel,
>
> Good timing - I was looking at a similar problem from a different angle
> yesterday (see below)
>
> Don't have enough time to answer your email in detail now - will do that
> tomorrow evening
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> === START
> # FreeMarker CLI Improvement
> ## Support Of Multiple Template Files
> Currently we support the following combinations
>
> * Single template and no data files
> * Single template and one or more data files
>
> But we cannot support the following use case which is quite typical in
> the cloud
>
> __Convert multiple templates with a single data file, e.g. copying a
> directory of configuration files using a JSON configuration file__
>
> ## Implementation notes
> * When we copy a directory we can remove the `ftl` extension on the fly
> * We might need an `exclude` filter for the copy operation
> * Initially resolve to a list of template files and process one after
> another
> * Need to calculate the output file location and extension
> * We need to rename the existing command line parameters (see below)
> * Do we need multiple include and exclude filters?
> * Do we need file versus directory filters?
>
> ### Command Line Options
> ```
> --input-encoding : Encoding of the documents
> --output-encoding : Encoding of the rendered template
> --template-encoding : Encoding of the template
> --output : Output file or directory
> --include-document : Include pattern for documents
> --exclude-document : Exclude pattern for documents
> --include-template : Include pattern for templates
> --exclude-template : Exclude pattern for templates
> ```
>
> ### Command Line Examples
> ```text
> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>
> # Basically the same using a named document "configuration"
> # It might make sense to expose "conf" directly in the FreeMarker data model
> # It might make sense to allow URIs for loading documents
> > freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>
> # Basically the same using an environment variable as named document
> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> ```
> === END
>
>
> > On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
> >
> > Input documents is a fundamental concept in freemarker-generator, so we
> > should think about that more, and probably refine/rework how it's done.
> >
> > Currently it works like this, with CLI at least.
> >
> >    freemarker-cli
> >        -t access-report.ftl
> >        somewhere/foo-access-log.csv
> >
> > Then in access-report.ftl you have to do something like this:
> >
> >    <#assign doc = Documents.get(0)>
> >    ... process doc here
> >
> > (The more idiomatic Documents[0] won't work. Actually, that led to a funny
> > chain of coincidences: It returned the string "D", then CSVTool.parse(...)
> > happily parsed that to a table with the single column "D", and 0 rows, and
> > as there were 0 rows, the template didn't run into an error because
> > row.myExpectedColumn refers to a missing column either, so the process
> > finished with success. (: Pretty unlucky for sure. The root was
> > unintentionally breaking a FreeMarker idiom though; eventually we will have
> > to work on those too, but, different topic.)
> >
> > However, actually multiple input documents can be passed in:
> >
> >    freemarker-cli
> >        -t access-report.ftl
> >        somewhere/foo-access-log.csv
> >        somewhere/bar-access-log.csv
> >
> > The above template will still work, though then you ignore all but the
> > first document. So if you expect any number of input documents, you
> > probably will have to do this:
> >
> >    <#list Documents.list as doc>
> >          ... process doc here
> >    </#list>
> >
> > (The more idiomatic <#list Documents as doc> won't work; but again, those
> > we will work out in a different thread.)
> >
> >
> > So, what would be better, in my opinion. I start out from what I think are
> > the common use cases, in decreasing order of frequency. Goal is to make
> > those less error prone for the users, and simpler to express.
> >
> > USE CASE 1
> >
> > You have exactly 1 input document, which is therefore simply "the"
> > document in the mind of the user. This is probably the typical use case,
> > but at least the use case users typically start out from when starting the
> > work.
> >
> >    freemarker-cli
> >        -t access-report.ftl
> >        somewhere/foo-access-log.csv
> >
> > Then `Documents.get(0)` is not very fitting. Most importantly it's error
> > prone, because if the user passed in more than 1 document (can even happen
> > totally accidentally, like if the user was lazy and used a wildcard that
> > the shell exploded), the template will silently ignore the rest of the
> > documents, and the single document processed will be practically picked
> > randomly. The user might not notice that and submit a bad report or such.
> >
> > I think that in this use case the document should be simply referred as
> > `Document` in the template. When you have multiple documents there,
> > referring to `Document` should be an error, saying that the template was
> > made to process a single document only.
> >
> >
> > USE CASE 2
> >
> > You have multiple input documents, but each has a different role (different
> > schema, maybe different file type). Like, you pass in users.csv and
> > groups.csv. Each has a different schema, and so you want to access them
> > differently, but in the same template.
> >
> >    freemarker-cli
> >        [...]
> >        --named-document users somewhere/foo-users.csv
> >        --named-document groups somewhere/foo-groups.csv
> >
> > Then in the template you could refer to them as: `NamedDocuments.users`,
> > and `NamedDocuments.groups`.
> >
> > Use Cases 1 and 2 can be unified into a coherent concept, where `Document`
> > is just a shorthand for `NamedDocuments.main`. It's called "main" because
> > that's "the" document the template is about, but then you have added
> > some helper documents, with symbolic names representing their role.
> >
> >    freemarker-cli
> >        -t access-report.ftl
> >        --document-name=main somewhere/foo-access-log.csv
> >        --document-name=users somewhere/foo-users.csv
> >        --document-name=groups somewhere/foo-groups.csv
> >
> > Here, `Document` still works in the template, and it refers to
> > `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> > would be cleaner, I couldn't figure out how to do that with Picocli.
> > Anyway, for now the point is the concept, which is not specific to CLI.)
> >
> >
> > USE CASE 3
> >
> > Here you have several of the same kind of documents. That has a more
> > generic sub-use-case, when you have explicitly named documents (like
> > "users" above), and for some you expect multiple input files.
> >
> >    freemarker-cli
> >        -t access-report.ftl
> >        --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
> >        --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
> >        --document-name=groups somewhere/global-groups.csv
> >
> > The template must be written with this use case in mind, as now it has to
> > #list some of the documents. (I think in practice you hardly ever want to
> > get a document by hard coded index. Either you don't know how many
> > documents you have, so you can't use hard coded indexes, or you do, and
> > each index has a specific meaning, but then you should name the documents
> > instead, as using indexes is error prone, and hard to read.)
> > Accessing that list of documents in the template maybe could be done like
> > this:
> > - For the "main" documents: `DocumentList`
> > - For explicitly named documents, like "users": `NamedDocumentLists.users`
> >
> >
> > SUMMING UP
> >
> > To unify all 3 use cases into a coherent concept:
> > - `NamedDocumentLists.<name>` is the most generic form, and while you can
> > achieve everything with it, using it requires your template to handle the
> > most generic case too. So, I think it would be rarely used.
> > - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> > used if you only have one kind of document (single format and schema), but
> > potentially multiple of them.
> > - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> > the given name.
> > - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
> > most natural/frequent use case.
> >
> > That's 4 possible ways of accessing your documents, which is a trade-off
> > for the sake of these:
> > - Catching CLI (or Maven, etc.) input where the template output likely will
> > be wrong. That's only possible if the user can communicate their intent in
> > the template.
> > - Users don't need to deal with concepts that are irrelevant in their
> > concrete use case. Just start with the trivial, `Document`, and later if
> > the need arises, generalize to named documents, document lists, or both.
> >
> >
> > What do you guys think?
>
>
