Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Sun, 23 Feb 2020 12:43:56 -0800

Hi Daniel,

Good timing: I was looking at a similar problem from a different angle yesterday
(see below).


I don't have enough time to answer your email in detail now; I will do that
tomorrow evening.

Thanks in advance, 

Siegfried Goeschl


=== START
# FreeMarker CLI Improvement
## Support Of Multiple Template Files
Currently we support the following combinations:

* Single template and no data files
* Single template and one or more data files

But we cannot support the following use case, which is quite typical in the
cloud:

__Convert multiple templates with a single data file, e.g. copying a directory
of configuration files using a JSON configuration file__

## Implementation notes
* When we copy a directory we can remove the `ftl` extension on the fly
* We might need an `exclude` filter for the copy operation
* Initially resolve to a list of template files and process them one after another
* Need to calculate the output file location and extension
* We need to rename the existing command line parameters (see below)
* Do we need multiple include and exclude filters?
* Do we need file versus directory filters?
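
A minimal sketch of the "remove the `ftl` extension" and "calculate the output file location" notes above, assuming the directory layout below the template directory is simply mirrored below the output directory. `OutputPathSketch` and `outputPathFor` are hypothetical names, not existing freemarker-generator code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of freemarker-generator.
final class OutputPathSketch {

    // Maps a template found below templateDir to a file below outputDir,
    // keeping the relative directory layout and dropping a trailing ".ftl".
    static Path outputPathFor(Path templateDir, Path template, Path outputDir) {
        Path relative = templateDir.relativize(template);
        String fileName = relative.getFileName().toString();
        if (fileName.endsWith(".ftl")) {
            fileName = fileName.substring(0, fileName.length() - ".ftl".length());
        }
        Path parent = relative.getParent();
        Path target = (parent == null) ? Paths.get(fileName) : parent.resolve(fileName);
        return outputDir.resolve(target);
    }
}
```

For example, on a Unix-like file system `outputPathFor(Paths.get("ext/config"), Paths.get("ext/config/nginx/site.conf.ftl"), Paths.get("/config"))` yields `/config/nginx/site.conf`, while a file without the `ftl` extension keeps its name.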

### Command Line Options
```
--input-encoding : Encoding of the documents
--output-encoding : Encoding of the rendered template
--template-encoding : Encoding of the template
--output : Output file or directory
--include-document : Include pattern for documents
--exclude-document : Exclude pattern for documents
--include-template : Include pattern for templates
--exclude-template : Exclude pattern for templates
```

### Command Line Examples
```text
# Copy all FTL templates found in "ext/config" to the "/config" directory using
# the data from "config.json"
> freemarker-cli -t ./ext/config --include-template "*.ftl" -o /config config.json
> freemarker-cli --template ./ext/config --include-template "*.ftl" \
    --output /config config.json

# Basically the same using a named document "configuration"
# It might make sense to expose "conf" directly in the FreeMarker data model
# It might make sense to allow URIs for loading documents
> freemarker-cli -t "./ext/config/*.ftl" -o /config -d configuration=config.json
> freemarker-cli --template ./ext/config --include-template "*.ftl" \
    --output /config --document configuration=config.json
> freemarker-cli --template ./ext/config --include-template "*.ftl" \
    --output /config --document configuration=file:///config.json

# Basically the same using an environment variable as a named document
> freemarker-cli -t ./ext/config --include-template "*.ftl" -o /config \
    -d configuration=env:///CONFIGURATION
> freemarker-cli --template ./ext/config --include-template "*.ftl" \
    --output /config --document configuration=env:///CONFIGURATION
```
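
The `name=source` values used with `-d`/`--document` above could be split along these lines. This is a hedged sketch that assumes only plain paths, `file://` URIs and `env:///` URIs need to be recognized; `NamedDocumentSpec` is a hypothetical class:

```java
// Hypothetical value object for a "-d name=source" / "--document name=source" argument.
final class NamedDocumentSpec {
    final String name;     // symbolic document name, e.g. "configuration"
    final String scheme;   // "file" or "env" (the only two handled in this sketch)
    final String location; // file path, or the environment variable name for "env"

    private NamedDocumentSpec(String name, String scheme, String location) {
        this.name = name;
        this.scheme = scheme;
        this.location = location;
    }

    // Splits at the first '=' so that the source part may itself contain '='.
    static NamedDocumentSpec parse(String spec) {
        int eq = spec.indexOf('=');
        if (eq <= 0 || eq == spec.length() - 1) {
            throw new IllegalArgumentException("Expected name=source, but got: " + spec);
        }
        String name = spec.substring(0, eq);
        String source = spec.substring(eq + 1);
        if (source.startsWith("env:///")) {
            // "env:///CONFIGURATION" -> environment variable "CONFIGURATION"
            return new NamedDocumentSpec(name, "env", source.substring("env:///".length()));
        }
        if (source.startsWith("file://")) {
            // "file:///config.json" -> path "/config.json"
            return new NamedDocumentSpec(name, "file", source.substring("file://".length()));
        }
        // No recognized scheme: treat the source as a plain file path.
        return new NamedDocumentSpec(name, "file", source);
    }
}
```

Splitting at the first `=` rather than the last keeps sources that contain `=` themselves (such as URLs with query parameters) intact.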
=== END


> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
> 
> Input documents is a fundamental concept in freemarker-generator, so we
> should think about that more, and probably refine/rework how it's done.
> 
> Currently it works like this, at least with the CLI:
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
> 
> Then in access-report.ftl you have to do something like this:
> 
>    <#assign doc = Documents.get(0)>
>    ... process doc here
> 
> (The more idiomatic Documents[0] won't work. Actually, that led to a funny
> chain of coincidences: it returned the string "D", then CSVTool.parse(...)
> happily parsed that to a table with the single column "D" and 0 rows, and
> as there were 0 rows, the template never evaluated row.myExpectedColumn
> (which would have referred to a missing column), so it didn't run into an
> error and the process finished with success. (: Pretty unlucky for sure. The
> root cause, though, was that we unintentionally broke a FreeMarker idiom;
> eventually we will have to work on those too, but that's a different topic.)
> 
> However, multiple input documents can actually be passed in:
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
>        somewhere/bar-access-log.csv
> 
> The above template will still work, though it then ignores all but the first
> document. So if you expect any number of input documents, you will probably
> have to do this:
> 
>    <#list Documents.list as doc>
>          ... process doc here
>    </#list>
> 
> (The more idiomatic <#list Documents as doc> won't work; but again, we will
> work those out in a different thread.)
> 
> 
> So, what would be better, in my opinion? I start out from what I think are
> the common use cases, in decreasing order of frequency. The goal is to make
> those less error-prone for the users, and simpler to express.
> 
> USE CASE 1
> 
> You have exactly 1 input document, which is therefore simply "the"
> document in the mind of the user. This is probably the typical use case,
> or at least the use case users typically start out from when they begin
> the work.
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
> 
> Then `Documents.get(0)` is not very fitting. Most importantly, it's
> error-prone, because if the user passed in more than 1 document (which can
> even happen totally accidentally, like if the user was lazy and used a
> wildcard that the shell expanded), the template will silently ignore the
> rest of the documents, and the single document processed will be
> practically picked at random. The user might not notice that and submit a
> bad report or such.
> 
> I think that in this use case the document should simply be referred to as
> `Document` in the template. When you have multiple documents there,
> referring to `Document` should be an error, saying that the template was
> made to process a single document only.
> 
> 
> USE CASE 2
> 
> You have multiple input documents, but each has a different role (different
> schema, maybe different file type). Like, you pass in users.csv and
> groups.csv. Each has a different schema, and so you want to access them
> differently, but in the same template.
> 
>    freemarker-cli
>        [...]
>        --named-document users somewhere/foo-users.csv
>        --named-document groups somewhere/foo-groups.csv
> 
> Then in the template you could refer to them as `NamedDocuments.users`
> and `NamedDocuments.groups`.
> 
> Use Cases 1 and 2 can be unified into a coherent concept, where `Document`
> is just a shorthand for `NamedDocuments.main`. It's called "main" because
> that's "the" document the template is about, but then you may add some
> helper documents, with symbolic names representing their role.
> 
>    freemarker-cli
>        -t access-report.ftl
>        --document-name=main somewhere/foo-access-log.csv
>        --document-name=users somewhere/foo-users.csv
>        --document-name=groups somewhere/foo-groups.csv
> 
> Here, `Document` still works in the template, and it refers to
> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> would be cleaner, I couldn't figure out how to do that with Picocli.
> Anyway, for now the point is the concept, which is not specific to CLI.)
> 
> 
> USE CASE 3
> 
> Here you have several documents of the same kind. That has a more
> generic sub-use-case, when you have explicitly named documents (like
> "users" above), and for some of them you expect multiple input files.
> 
>    freemarker-cli
>        -t access-report.ftl
>        --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
>        --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
>        --document-name=groups somewhere/global-groups.csv
> 
> The template must be written with this use case in mind, as now it has to
> #list some of the documents. (I think in practice you hardly ever want to
> get a document by a hard-coded index. Either you don't know how many
> documents you have, so you can't use hard-coded indexes, or you do, and
> each index has a specific meaning, but then you should name the documents
> instead, as using indexes is error-prone and hard to read.)
> Accessing that list of documents in the template, maybe could be done like
> this:
> - For the "main" documents: `DocumentList`
> - For explicitly named documents, like "users": `NamedDocumentLists.users`
> 
> 
> SUMMING UP
> 
> To unify all 3 use cases into a coherent concept:
> - `NamedDocumentLists.<name>` is the most generic form, and while you can
> achieve everything with it, using it requires your template to handle the
> most generic case too. So, I think it would be rarely used.
> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> used if you only have one kind of document (single format and schema), but
> potentially multiple of them.
> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> the given name.
> - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
> most natural/frequent use case.
> 
> That's 4 possible ways of accessing your documents, which is a trade-off
> for the sake of these:
> - Catching CLI (or Maven, etc.) input where the template output will likely
> be wrong. That's only possible if the user can communicate their intent in
> the template.
> - Users don't need to deal with concepts that are irrelevant in their
> concrete use case. Just start with the trivial, `Document`, and later if
> the need arises, generalize to named documents, document lists, or both.
> 
> 
> What do you guys think?
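
The four accessors proposed in the summary can all be derived from a single map from document name to document list. The following Java sketch is hypothetical (`DocumentsModel` and its method names are illustrative only, and documents are stand-ins represented by their file names); it shows `Document` and `NamedDocuments.<name>` failing unless exactly one document of that name is present:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data model; documents are represented by their file names.
final class DocumentsModel {
    static final String MAIN = "main";

    private final Map<String, List<String>> namedDocumentLists;

    DocumentsModel(Map<String, List<String>> namedDocumentLists) {
        this.namedDocumentLists = new LinkedHashMap<>(namedDocumentLists);
    }

    // NamedDocumentLists.<name>: the most generic form, zero or more documents.
    List<String> namedDocumentList(String name) {
        return Collections.unmodifiableList(
                namedDocumentLists.getOrDefault(name, Collections.emptyList()));
    }

    // DocumentList: shorthand for NamedDocumentLists.main.
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    // NamedDocuments.<name>: exactly one document is expected, anything else is an error.
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("The template expects exactly 1 \"" + name
                    + "\" document, but " + docs.size() + " were given");
        }
        return docs.get(0);
    }

    // Document: shorthand for NamedDocuments.main.
    String document() {
        return namedDocument(MAIN);
    }
}
```

With `main` mapped to one access log and `users` mapped to two files, `document()` returns the access log, `namedDocumentList("users")` returns both user files, and `namedDocument("users")` fails instead of silently picking one.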
