Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Wed, 26 Feb 2020 00:33:45 -0800

Hi folks,

I'm still wrapping my head around this, but I've assembled some thoughts here -
https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449


Thanks in advance, 

Siegfried Goeschl



> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
> 
> What you are describing is more like the angle that FMPP took initially,
> where templates drive things, they generate the output for themselves (even
> multiple output files if they wish). By default, the output file name (and
> relative path) is deduced from the template name. There was also a global
> data-model, built in a configuration file (or equally, built via command
> line arguments, or both mixed), from which templates get whatever data they
> are interested in. Take a look at the figures here:
> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
> a bit more, because you could add XML files at the same place where you
> have the templates, and then you could associate transform templates to the
> XML files (based on path pattern and/or the XML document element). Now
> that's like what freemarker-generator had initially (data files drive
> output, and the template is there to transform it).
> 
> So I think the generic mental model would look like this:
> 
>   1. You have files that drive the process, let's call them *generator
>   files* for now. Usually, each generator file yields an output file (but
>   maybe even multiple output files, as you may have seen in the last figure).
>   These generator files can be of many types, like XML, JSON, XLSX (as in the
>   original freemarker-generator), and even templates (as is the norm in
>   FMPP). If the file is not a template, then you have a set of transformer
>   templates (-t CLI option) in a separate directory, which can be associated
>   with the generator files based on name patterns, and even based on content
>   (schema usually). If the generator file is a template (so that's a
>   positional @Parameter CLI argument that happens to be an *.ftl, and is not
>   a template file specified after the "-t" option), then you just
>   Template.process(...) it, and it prints what the output will be.
>   2. You also have a set of variables, the global data-model, that
>   contains commonly useful stuff, like what you now call parameters (CLI
>   -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
>   files aren't "generator files". Templates just use them if they need them.
>   An important thing here is to reuse the same mechanism to read and parse
>   those data files, which was used in templates when transforming generator
>   files. So we need a common format for specifying how to load data files.
>   That's maybe just FTL that #assigns to the variables, or maybe a more
>   declarative format.
> 
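
A minimal sketch of point 1 above, under stated assumptions: `GeneratorFileSketch`, `associate` and `templateFor` are hypothetical names, not part of FMPP or freemarker-generator, and the name patterns are treated as `java.nio.file` globs matched against the file name only. It merely illustrates "a generator file that is a template runs itself, anything else is matched to a transformer template by name pattern".

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not an existing FMPP or freemarker-generator class.
final class GeneratorFileSketch {

    // Glob on the file name -> transformer template, checked in insertion order.
    private final Map<String, String> transformers = new LinkedHashMap<>();

    void associate(String nameGlob, String transformerTemplate) {
        transformers.put(nameGlob, transformerTemplate);
    }

    /**
     * Returns the template to process for a generator file: the file itself
     * if it is an *.ftl template, otherwise the first transformer template
     * whose name pattern matches, or empty if nothing matches.
     */
    Optional<String> templateFor(Path generatorFile) {
        Path name = generatorFile.getFileName();
        if (name.toString().endsWith(".ftl")) {
            return Optional.of(generatorFile.toString());
        }
        return transformers.entrySet().stream()
                .filter(e -> FileSystems.getDefault()
                        .getPathMatcher("glob:" + e.getKey())
                        .matches(name))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    public static void main(String[] args) {
        GeneratorFileSketch sketch = new GeneratorFileSketch();
        sketch.associate("*.csv", "csv-report.ftl");
        System.out.println(sketch.templateFor(Paths.get("data", "users.csv")));
        System.out.println(sketch.templateFor(Paths.get("page.ftl")));
    }
}
```

Matching on content (schema), which the text also mentions, is left out of this sketch.
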
> What I have described in the original post here was a less generic form of
> this, as I tried to be true to the original approach. I thought the
> proposal would be drastic enough as it is... :) There, the "main" document
> is the "generator file" from point 1, the "-t" template is the transform
> template for the "main" document, and the other named documents ("users",
> "groups") are a poor man's shared data-model from point 2 (together with
> -PName=value).
> 
> There's a further, somewhat confusing thing to get right with the
> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
> the model above, as per point 1, if you list multiple data files, each will
> generate a separate output file. So, if you need to take in a list of files to
> transform it to a single output file (or at least with a single transform
> template execution), then you have to be explicit about that, as that's not
> the default behavior anymore. But it's still absolutely possible. Think of
> a "list of XLSX-es" as being a file format in itself. You need some CLI
> (and Maven config, etc.) syntax to express that, but that shouldn't be a
> big deal.
> 
> 
> 
> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> [email protected]> wrote:
> 
>> Hi Daniel,
>> 
>> Good timing - I was looking at a similar problem from a different angle
>> yesterday (see below)
>> 
>> Don't have enough time to answer your email in detail now - will do that
>> tomorrow evening
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>> === START
>> # FreeMarker CLI Improvement
>> ## Support Of Multiple Template Files
>> Currently we support the following combinations
>> 
>> * Single template and no data files
>> * Single template and one or more data files
>> 
>> But we cannot support the following use case which is quite typical in
>> the cloud
>> 
>> __Convert multiple templates with a single data file, e.g. copying a
>> directory of configuration files using a JSON configuration file__
>> 
>> ## Implementation notes
>> * When we copy a directory we can remove the `ftl` extension on the fly
>> * We might need an `exclude` filter for the copy operation
>> * Initially resolve to a list of template files and process one after
>> another
>> * Need to calculate the output file location and extension
>> * We need to rename the existing command line parameters (see below)
>> * Do we need multiple include and exclude filters?
>> * Do we need file versus directory filters?
>> 
>> ### Command Line Options
>> ```
>> --input-encoding : Encoding of the documents
>> --output-encoding : Encoding of the rendered template
>> --template-encoding : Encoding of the template
>> --output : Output file or directory
>> --include-document : Include pattern for documents
>> --exclude-document : Exclude pattern for documents
>> --include-template : Include pattern for templates
>> --exclude-template : Exclude pattern for templates
>> ```
>> 
>> ### Command Line Examples
>> ```text
>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>> using the data from "config.json"
>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config
>> config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config config.json
>> 
>> # Basically the same using a named document "configuration"
>> # It might make sense to expose "conf" directly in the FreeMarker data
>> model
>> # It might make sense to allow URIs for loading documents
>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>> configuration=config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=file:///config.json
>> 
>> # Basically the same using an environment variable as named document
>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
>> configuration=env:///CONFIGURATION
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=env:///CONFIGURATION
>> ```
>> === END
>> 
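
The implementation notes in the draft above (apply include/exclude patterns, strip the `ftl` extension, calculate the output file location) could be sketched roughly as follows. This is an illustration under assumptions, not freemarker-cli code: `TemplateOutputSketch`, `accept` and `outputFor` are made-up names, and the include/exclude patterns are interpreted as `java.nio.file` globs applied to the file name.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of freemarker-cli.
final class TemplateOutputSketch {

    /** True if the file name matches the include glob and, if given, not the exclude glob. */
    static boolean accept(Path template, String includeGlob, String excludeGlob) {
        Path name = template.getFileName();
        PathMatcher include = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        if (!include.matches(name)) {
            return false;
        }
        if (excludeGlob == null) {
            return true;
        }
        PathMatcher exclude = FileSystems.getDefault().getPathMatcher("glob:" + excludeGlob);
        return !exclude.matches(name);
    }

    /** Keeps the path relative to the template directory and drops a trailing ".ftl". */
    static Path outputFor(Path templateDir, Path template, Path outputDir) {
        Path relative = templateDir.relativize(template);
        String fileName = relative.getFileName().toString();
        if (fileName.endsWith(".ftl")) {
            fileName = fileName.substring(0, fileName.length() - ".ftl".length());
        }
        return outputDir.resolve(relative).resolveSibling(fileName);
    }

    public static void main(String[] args) {
        Path templateDir = Paths.get("ext", "config");
        Path template = templateDir.resolve("nginx").resolve("server.conf.ftl");
        if (accept(template, "*.ftl", null)) {
            // Resolves to server.conf inside the nginx subdirectory of "out".
            System.out.println(outputFor(templateDir, template, Paths.get("out")));
        }
    }
}
```

Whether directory-level filters are needed in addition to file-name filters remains an open question in the draft; the sketch only covers file names.
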
>> 
>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>> 
>>> Input documents is a fundamental concept in freemarker-generator, so we
>>> should think about that more, and probably refine/rework how it's done.
>>> 
>>> Currently it works like this, with CLI at least.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>> 
>>> Then in access-report.ftl you have to do something like this:
>>> 
>>>   <#assign doc = Documents.get(0)>
>>>   ... process doc here
>>> 
>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>> funny
>>> chain of coincidences: It returned the string "D", then
>> CSVTool.parse(...)
>>> happily parsed that to a table with the single column "D", and 0 rows,
>> and
>>> as there were 0 rows, the template didn't run into an error about
>>> row.myExpectedColumn referring to a missing column either, so the process
>>> finished with success. (: Pretty unlucky for sure. The root was
>>> unintentionally breaking a FreeMarker idiom though; eventually we will
>> have
>>> to work on those too, but, different topic.)
>>> 
>>> However, actually multiple input documents can be passed in:
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>>       somewhere/bar-access-log.csv
>>> 
>>> The above template will still work, though then you ignore all but the first
>>> document. So if you expect any number of input documents, you probably
>> will
>>> have to do this:
>>> 
>>>   <#list Documents.list as doc>
>>>         ... process doc here
>>>   </#list>
>>> 
>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>> we will work out in a different thread.)
>>> 
>>> 
>>> So, what would be better, in my opinion. I start out from what I think
>> are
>>> the common use cases, in decreasing order of frequency. The goal is to make
>>> those less error-prone for the users, and simpler to express.
>>> 
>>> USE CASE 1
>>> 
>>> You have exactly 1 input document, which is therefore simply "the"
>>> document in the mind of the user. This is probably the typical use case,
>>> or at least the use case users typically start out from when starting
>> the
>>> work.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>> 
>>> Then `Documents.get(0)` is not very fitting. Most importantly it's error
>>> prone, because if the user passed in more than 1 document (can even
>> happen
>>> totally accidentally, like if the user was lazy and used a wildcard that
>>> the shell exploded), the template will silently ignore the rest of the
>>> documents, and the single document processed will be practically picked
>>> randomly. The user might not notice that and submit a bad report or
>> such.
>>> 
>>> I think that in this use case the document should be simply referred to as
>>> `Document` in the template. When you have multiple documents there,
>>> referring to `Document` should be an error, saying that the template was
>>> made to process a single document only.
>>> 
>>> 
>>> USE CASE 2
>>> 
>>> You have multiple input documents, but each has a different role (different
>>> schema, maybe different file type). Like, you pass in users.csv and
>>> groups.csv. Each has a different schema, and so you want to access them
>>> differently, but in the same template.
>>> 
>>>   freemarker-cli
>>>       [...]
>>>       --named-document users somewhere/foo-users.csv
>>>       --named-document groups somewhere/foo-groups.csv
>>> 
>>> Then in the template you could refer to them as: `NamedDocuments.users`,
>>> and `NamedDocuments.groups`.
>>> 
>>> Use Case 1, and 2 can be unified into a coherent concept, where
>> `Document`
>>> is just a shorthand for `NamedDocuments.main`. It's called "main" because
>>> that's "the" document the template is about, but then you had to add
>>> some helper documents, with symbolic names representing their role.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       --document-name=main somewhere/foo-access-log.csv
>>>       --document-name=users somewhere/foo-users.csv
>>>       --document-name=groups somewhere/foo-groups.csv
>>> 
>>> Here, `Document` still works in the template, and it refers to
>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>> above
>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>> Anyway, for now the point is the concept, which is not specific to CLI.)
>>> 
>>> 
>>> USE CASE 3
>>> 
>>> Here you have several of the same kind of documents. That has a more
>>> generic sub-use-case, when you have explicitly named documents (like
>>> "users" above), and for some you expect multiple input files.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       --document-name=main somewhere/foo-access-log.csv
>>> somewhere/bar-access-log.csv
>>>       --document-name=users somewhere/foo-users.csv
>>> somewhere/bar-users.csv
>>>       --document-name=groups somewhere/global-groups.csv
>>> 
>>> The template must be written with this use case in mind, as now it has to
>>> #list some of the documents. (I think in practice you hardly ever want to
>>> get a document by hard coded index. Either you don't know how many
>>> documents you have, so you can't use hard coded indexes, or you do, and
>>> each index has a specific meaning, but then you should name the documents
>>> instead, as using indexes is error prone, and hard to read.)
>>> Accessing that list of documents in the template, maybe could be done
>> like
>>> this:
>>> - For the "main" documents: `DocumentList`
>>> - For explicitly named documents, like "users":
>> `NamedDocumentLists.users`
>>> 
>>> 
>>> SUMMING UP
>>> 
>>> To unify all 3 use cases into a coherent concept:
>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>> achieve everything with it, using it requires your template to handle the
>>> most generic case too. So, I think it would be rarely used.
>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>> used if you only have one kind of documents (single format and schema),
>> but
>>> potentially multiple of them.
>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>> the given name.
>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>> the
>>> most natural/frequent use case.
>>> 
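
The four accessors listed above can be read as two lookups plus two shorthands for the name "main". A minimal sketch under assumptions follows: `DocumentAccessSketch` and its methods are hypothetical, documents are represented as plain path strings, and nothing here is existing freemarker-generator API. It only illustrates that the single-document accessors fail loudly when the count is not exactly 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposal; method names mirror the template variables.
final class DocumentAccessSketch {
    static final String MAIN = "main";

    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    /** Registers one or more documents under a symbolic name. */
    void add(String name, String... files) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).addAll(Arrays.asList(files));
    }

    /** NamedDocumentLists.<name>: any number of documents, possibly none. */
    List<String> namedDocumentList(String name) {
        return Collections.unmodifiableList(
                byName.getOrDefault(name, Collections.emptyList()));
    }

    /** NamedDocuments.<name>: exactly one document, otherwise an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \""
                    + name + "\", but got " + docs.size());
        }
        return docs.get(0);
    }

    /** DocumentList: shorthand for NamedDocumentLists.main. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Document: shorthand for NamedDocuments.main. */
    String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentAccessSketch docs = new DocumentAccessSketch();
        docs.add(MAIN, "somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv");
        System.out.println(docs.documentList().size()); // prints 2
        try {
            docs.document(); // two "main" documents, so this is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```
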
>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>> for the sake of these:
>>> - Catching CLI (or Maven, etc.) input where the template output likely
>> will
>>> be wrong. That's only possible if the users can communicate their intent in
>>> the template.
>>> - Users don't need to deal with concepts that are irrelevant in their
>>> concrete use case. Just start with the trivial, `Document`, and later if
>>> the need arises, generalize to named documents, document lists, or both.
>>> 
>>> 
>>> What do you guys think?
>> 
>>
