Hi folks, still wrapping my head around it, but I assembled some thoughts here - https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
Thanks in advance,

Siegfried Goeschl

> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>
> What you are describing is more like the angle that FMPP took initially,
> where templates drive things: they generate the output for themselves (even
> multiple output files if they wish). By default, the output file's name (and
> relative path) is deduced from the template name. There was also a global
> data-model, built in a configuration file (or equally, built via command
> line arguments, or both mixed), from which templates get whatever data they
> are interested in. Take a look at the figures here:
> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
> a bit more, because you could add XML files at the same place where you
> have the templates, and then you could associate transform templates to the
> XML files (based on path pattern and/or the XML document element). Now
> that's like what freemarker-generator had initially (data files drive
> output, and the template is there to transform it).
>
> So I think the generic mental model would look like this:
>
>    1. You got files that drive the process, let's call them *generator
>    files* for now. Usually, each generator file yields an output file (but
>    maybe even multiple output files, as you might have seen in the last
>    figure). These generator files can be of many types, like XML, JSON,
>    XLSX (as in the original freemarker-generator), and even templates (as
>    is the norm in FMPP). If the file is not a template, then you got a set
>    of transformer templates (-t CLI option) in a separate directory, which
>    can be associated with the generator files based on name patterns, and
>    even based on content (schema usually). If the generator file is a
>    template (so that's a positional @Parameter CLI argument that happens
>    to be an *.ftl, and is not a template file specified after the "-t"
>    option), then you just Template.process(...) it, and it prints what the
>    output will be.
>    2. You also have a set of variables, the global data-model, that
>    contains commonly useful stuff, like what you now call parameters (CLI
>    -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>    data files aren't "generator files". Templates just use them if they
>    need them. An important thing here is to reuse the same mechanism to
>    read and parse those data files that was used in templates when
>    transforming generator files. So we need a common format for specifying
>    how to load data files. That's maybe just FTL that #assigns to the
>    variables, or maybe a more declarative format.
>
> What I have described in the original post here was a less generic form of
> this, as I tried to be true to the original approach. I thought the
> proposal would be drastic enough as it is... :) There, the "main" document
> is the "generator file" from point 1, the "-t" template is the transform
> template for the "main" document, and the other named documents ("users",
> "groups") are a poor man's shared data-model from point 2 (together with
> -PName=value).
>
> There's a further, somewhat confusing thing to get right with the
> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
> the model above, as per point 1, if you list multiple data files, each will
> generate a separate output file. So, if you need to take in a list of files
> to transform it to a single output file (or at least with a single
> transform template execution), then you have to be explicit about that, as
> that's not the default behavior anymore. But it's still absolutely
> possible. Think of a "list of XLSX-es" as itself being like a file format.
> You need some CLI (and Maven config, etc.) syntax to express that, but that
> shouldn't be a big deal.
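A rough Java sketch of point 1 above, only to make the dispatch rule concrete. Everything here is an assumption for illustration: `GeneratorFilePlanner`, `associate` and `templateFor` are made-up names, not freemarker-generator or FMPP API, and only file-name globs are handled (content/schema-based association is left out).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical: decides which template is run for a given generator file. */
final class GeneratorFilePlanner {
    // File-name glob -> transformer template, checked in registration order.
    private final Map<String, String> templatesByGlob = new LinkedHashMap<>();

    /** Associates a transformer template (a "-t" template) with a name pattern. */
    GeneratorFilePlanner associate(String fileNameGlob, String transformerTemplate) {
        templatesByGlob.put(fileNameGlob, transformerTemplate);
        return this;
    }

    /**
     * A generator file that is itself a template (*.ftl) is processed directly;
     * any other file needs a transformer template whose pattern matches its name.
     * An empty result means no transformer template was associated.
     */
    Optional<String> templateFor(String generatorFile) {
        Path fileName = Paths.get(generatorFile).getFileName();
        if (fileName.toString().endsWith(".ftl")) {
            return Optional.of(generatorFile);
        }
        for (Map.Entry<String, String> entry : templatesByGlob.entrySet()) {
            PathMatcher matcher =
                    FileSystems.getDefault().getPathMatcher("glob:" + entry.getKey());
            if (matcher.matches(fileName)) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        GeneratorFilePlanner planner = new GeneratorFilePlanner()
                .associate("*-access-log.csv", "access-report.ftl");
        System.out.println(planner.templateFor("somewhere/foo-access-log.csv"));
        System.out.println(planner.templateFor("pages/index.html.ftl"));
    }
}
```

Each generator file is planned on its own here, which matches the "one generator file, one output file" default; a "list of files as one input" would need an explicit, separate construct.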
>
>
> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <[email protected]> wrote:
>
>> Hi Daniel,
>>
>> Good timing - I was looking at a similar problem from a different angle
>> yesterday (see below)
>>
>> Don't have enough time to answer your email in detail now - will do that
>> tomorrow evening
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>> === START
>> # FreeMarker CLI Improvement
>> ## Support Of Multiple Template Files
>> Currently we support the following combinations
>>
>> * Single template and no data files
>> * Single template and one or more data files
>>
>> But we can not support the following use case which is quite typical in
>> the cloud
>>
>> __Convert multiple templates with a single data file, e.g. copying a
>> directory of configuration files using a JSON configuration file__
>>
>> ## Implementation notes
>> * When we copy a directory we can remove the `ftl` extension on the fly
>> * We might need an `exclude` filter for the copy operation
>> * Initially resolve to a list of template files and process one after
>> another
>> * Need to calculate the output file location and extension
>> * We need to rename the existing command line parameters (see below)
>> * Do we need multiple include and exclude filters?
>> * Do we need file versus directory filters?
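For the "calculate the output file location" and "remove the `ftl` extension" notes above, a minimal Java sketch of one possible rule: keep the template's path relative to the template directory and strip a trailing `.ftl`. `OutputPathSketch` and `outputFor` are hypothetical names, and the sketch assumes the template lies below the template directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical: maps a template inside a template directory to its output file. */
final class OutputPathSketch {
    private static final String FTL_SUFFIX = ".ftl";

    static Path outputFor(Path templateDir, Path template, Path outputDir) {
        // Keep the sub-directory structure below the template directory.
        Path relative = templateDir.relativize(template);
        String name = relative.getFileName().toString();
        // "nginx.conf.ftl" becomes "nginx.conf"; other names are copied as-is.
        if (name.endsWith(FTL_SUFFIX)) {
            name = name.substring(0, name.length() - FTL_SUFFIX.length());
        }
        return outputDir.resolve(relative).resolveSibling(name);
    }

    public static void main(String[] args) {
        Path out = outputFor(
                Paths.get("ext/config"),
                Paths.get("ext/config/nginx/nginx.conf.ftl"),
                Paths.get("out"));
        System.out.println(out);
    }
}
```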
>>
>> ### Command Line Options
>> ```
>> --input-encoding    : Encoding of the documents
>> --output-encoding   : Encoding of the rendered template
>> --template-encoding : Encoding of the template
>> --output            : Output file or directory
>> --include-document  : Include pattern for documents
>> --exclude-document  : Exclude pattern for documents
>> --include-template  : Include pattern for templates
>> --exclude-template  : Exclude pattern for templates
>> ```
>>
>> ### Command Line Examples
>> ```text
>> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
>> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>
>> # Basically the same using a named document "configuration"
>> # It might make sense to expose "conf" directly in the FreeMarker data model
>> # It might make sense to allow URIs for loading documents
>> > freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>
>> # Basically the same using an environment variable as named document
>> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>> ```
>> === END
>>
>>
>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>
>>> Input documents are a fundamental concept in freemarker-generator, so we
>>> should think about that more, and probably refine/rework how it's done.
>>>
>>> Currently it works like this, with CLI at least.
>>>
>>> freemarker-cli
>>>     -t access-report.ftl
>>>     somewhere/foo-access-log.csv
>>>
>>> Then in access-report.ftl you have to do something like this:
>>>
>>> <#assign doc = Documents.get(0)>
>>> ... process doc here
>>>
>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>> funny chain of coincidences: It returned the string "D", then
>>> CSVTool.parse(...) happily parsed that to a table with the single column
>>> "D", and 0 rows, and as there were 0 rows, the template didn't run into
>>> an error about row.myExpectedColumn referring to a missing column either,
>>> so the process finished with success. (: Pretty unlucky for sure. The
>>> root cause was unintentionally breaking a FreeMarker idiom though;
>>> eventually we will have to work on those too, but, different topic.)
>>>
>>> However, actually multiple input documents can be passed in:
>>>
>>> freemarker-cli
>>>     -t access-report.ftl
>>>     somewhere/foo-access-log.csv
>>>     somewhere/bar-access-log.csv
>>>
>>> The above template will still work, though it then ignores all but the
>>> first document. So if you expect any number of input documents, you
>>> probably will have to do this:
>>>
>>> <#list Documents.list as doc>
>>>     ... process doc here
>>> </#list>
>>>
>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>> we will work out in a different thread.)
>>>
>>>
>>> So, here is what would be better, in my opinion. I start out from what I
>>> think are the common use cases, in decreasing order of frequency. The
>>> goal is to make those less error prone for the users, and simpler to
>>> express.
>>>
>>> USE CASE 1
>>>
>>> You have exactly 1 input document, which is therefore simply "the"
>>> document in the mind of the user. This is probably the typical use case,
>>> or at least the use case users typically start out from when starting the
>>> work.
>>>
>>> freemarker-cli
>>>     -t access-report.ftl
>>>     somewhere/foo-access-log.csv
>>>
>>> Then `Documents.get(0)` is not very fitting. Most importantly it's error
>>> prone, because if the user passed in more than 1 document (which can even
>>> happen totally accidentally, like if the user was lazy and used a
>>> wildcard that the shell expanded), the template will silently ignore the
>>> rest of the documents, and the single document processed will be
>>> practically picked randomly. The user might not notice that and submit a
>>> bad report or such.
>>>
>>> I think that in this use case the document should be simply referred to
>>> as `Document` in the template. When you have multiple documents there,
>>> referring to `Document` should be an error, saying that the template was
>>> made to process a single document only.
>>>
>>>
>>> USE CASE 2
>>>
>>> You have multiple input documents, but each has a different role
>>> (different schema, maybe different file type). Like, you pass in
>>> users.csv and groups.csv. Each has a different schema, and so you want to
>>> access them differently, but in the same template.
>>>
>>> freemarker-cli
>>>     [...]
>>>     --named-document users somewhere/foo-users.csv
>>>     --named-document groups somewhere/foo-groups.csv
>>>
>>> Then in the template you could refer to them as: `NamedDocuments.users`,
>>> and `NamedDocuments.groups`.
>>>
>>> Use Cases 1 and 2 can be unified into a coherent concept, where
>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>> "main" because that's "the" document the template is about, but then you
>>> had to add some helper documents, with symbolic names representing their
>>> role.
>>>
>>> freemarker-cli
>>>     -t access-report.ftl
>>>     --document-name=main somewhere/foo-access-log.csv
>>>     --document-name=users somewhere/foo-users.csv
>>>     --document-name=groups somewhere/foo-groups.csv
>>>
>>> Here, `Document` still works in the template, and it refers to
>>> `somewhere/foo-access-log.csv`.
>>> (While omitting --document-name=main above would be cleaner, I couldn't
>>> figure out how to do that with Picocli. Anyway, for now the point is the
>>> concept, which is not specific to CLI.)
>>>
>>>
>>> USE CASE 3
>>>
>>> Here you have several of the same kind of documents. That has a more
>>> generic sub-use-case, when you have explicitly named documents (like
>>> "users" above), and for some you expect multiple input files.
>>>
>>> freemarker-cli
>>>     -t access-report.ftl
>>>     --document-name=main somewhere/foo-access-log.csv
>>>         somewhere/bar-access-log.csv
>>>     --document-name=users somewhere/foo-users.csv
>>>         somewhere/bar-users.csv
>>>     --document-name=groups somewhere/global-groups.csv
>>>
>>> The template must be written with this use case in mind, as now it has to
>>> #list some of the documents. (I think in practice you hardly ever want to
>>> get a document by hard coded index. Either you don't know how many
>>> documents you have, so you can't use hard coded indexes, or you do, and
>>> each index has a specific meaning, but then you should name the documents
>>> instead, as using indexes is error prone, and hard to read.)
>>> Accessing that list of documents in the template could maybe be done like
>>> this:
>>> - For the "main" documents: `DocumentList`
>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>>>
>>>
>>> SUMMING UP
>>>
>>> To unify all 3 use cases into a coherent concept:
>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>> achieve everything with it, using it requires your template to handle the
>>> most generic case too. So, I think it would be rarely used.
>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>> used if you only have one kind of documents (single format and schema),
>>> but potentially multiple of them.
>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>> the given name.
>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>>> the most natural/frequent use case.
>>>
>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>> for the sake of these:
>>> - Catching CLI (or Maven, etc.) input where the template output likely
>>> will be wrong. That's only possible if the user can communicate their
>>> intent in the template.
>>> - Users don't need to deal with concepts that are irrelevant in their
>>> concrete use case. Just start with the trivial, `Document`, and later if
>>> the need arises, generalize to named documents, document lists, or both.
>>>
>>>
>>> What do you guys think?
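To make the four accessors from the summary concrete, here is a minimal Java sketch of how they could relate to each other. It is only an illustration under stated assumptions: `NamedDocumentsSketch` and its method names are hypothetical, documents are represented as plain path strings, and nothing here is existing freemarker-generator API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of Document, DocumentList, NamedDocuments, NamedDocumentLists. */
final class NamedDocumentsSketch {
    static final String MAIN = "main";

    // Document name -> sources, in the order they were given.
    private final Map<String, List<String>> sourcesByName = new LinkedHashMap<>();

    /** Registers one or more sources under a symbolic name such as "main" or "users". */
    NamedDocumentsSketch add(String name, String... sources) {
        List<String> list = sourcesByName.computeIfAbsent(name, key -> new ArrayList<>());
        for (String source : sources) {
            list.add(source);
        }
        return this;
    }

    /** NamedDocumentLists.<name>: most generic form, zero or more documents. */
    List<String> namedDocumentList(String name) {
        return List.copyOf(sourcesByName.getOrDefault(name, List.of()));
    }

    /** DocumentList: shorthand for NamedDocumentLists.main. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** NamedDocuments.<name>: an error unless exactly 1 document has this name. */
    String namedDocument(String name) {
        List<String> documents = namedDocumentList(name);
        if (documents.size() != 1) {
            throw new IllegalStateException("The template expects exactly 1 \"" + name
                    + "\" document, but " + documents.size() + " were given");
        }
        return documents.get(0);
    }

    /** Document: shorthand for NamedDocuments.main. */
    String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        NamedDocumentsSketch documents = new NamedDocumentsSketch()
                .add("main", "somewhere/foo-access-log.csv")
                .add("users", "somewhere/foo-users.csv", "somewhere/bar-users.csv");
        System.out.println(documents.document());
        System.out.println(documents.namedDocumentList("users"));
    }
}
```

The point of the single-document accessors throwing is the one made in use case 1: passing several files to a template written for one document fails loudly instead of silently processing an arbitrary one.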
