freemarker-generator: Improving the input documents concept

Daniel Dekany Sun, 23 Feb 2020 07:38:11 -0800

Input documents are a fundamental concept in freemarker-generator, so we
should think about them more, and probably refine/rework how they're handled.


Currently it works like this, with the CLI at least:

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv

Then in access-report.ftl you have to do something like this:

    <#assign doc = Documents.get(0)>
    ... process doc here

(The more idiomatic Documents[0] won't work. Actually, that led to a funny
chain of coincidences: it returned the string "D", then CSVTool.parse(...)
happily parsed that into a table with the single column "D" and 0 rows, and
as there were 0 rows, the template never evaluated row.myExpectedColumn, so
the missing column caused no error either, and the process finished with
success. (: Pretty unlucky for sure. The root cause was unintentionally
breaking a FreeMarker idiom though; eventually we will have to work on those
too, but that's a different topic.)

However, actually multiple input documents can be passed in:

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv
        somewhere/bar-access-log.csv

The above template will still work, though it then ignores all but the first
document. So if you expect any number of input documents, you will probably
have to do this:

    <#list Documents.list as doc>
          ... process doc here
    </#list>

(The more idiomatic <#list Documents as doc> won't work; but again, those
we will work out in a different thread.)


So, here's what I think would be better. I start out from what I think are
the common use cases, in decreasing order of frequency. The goal is to make
those less error-prone for users, and simpler to express.

USE CASE 1

You have exactly 1 input document, which is therefore simply "the"
document in the mind of the user. This is probably the typical use case,
or at least the one users typically start out from when they begin their
work.

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv

Then `Documents.get(0)` is not very fitting. Most importantly it's error
prone, because if the user passes in more than 1 document (which can even
happen totally accidentally, like if the user was lazy and used a wildcard
that the shell expanded), the template will silently ignore the rest of the
documents, and the single document that is processed will be picked
practically at random. The user might not notice that, and submit a bad
report or such.

I think that in this use case the document should be simply referred to as
`Document` in the template. When multiple documents were passed in,
referring to `Document` should be an error, saying that the template was
made to process a single document only.
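
To make the intended behavior concrete, here is a minimal Java sketch of the
check that could sit behind `Document`. It's only an illustration: the class
and method names are made up, documents are represented by their paths as
plain strings, and the error message wording is just a suggestion.

```java
import java.util.List;

public class SingleDocumentAccess {

    /**
     * Hypothetical accessor backing the proposed `Document` template variable.
     * Fails loudly unless exactly one input document was passed in.
     */
    public static String document(List<String> inputDocuments) {
        if (inputDocuments.size() != 1) {
            throw new IllegalStateException(
                    "This template was made to process a single document only, but "
                            + inputDocuments.size() + " were passed in.");
        }
        return inputDocuments.get(0);
    }

    public static void main(String[] args) {
        // Exactly one document: `Document` resolves to it.
        System.out.println(document(List.of("somewhere/foo-access-log.csv")));

        // Two documents (e.g. from an accidental wildcard): rejected instead of
        // silently processing only the first one.
        try {
            document(List.of("somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv"));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point is that the failure happens at the moment the template states its
intent (by referring to `Document`), not when the documents are passed in.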


USE CASE 2

You have multiple input documents, but each has a different role (different
schema, maybe different file type). Like, you pass in users.csv and
groups.csv. Each has a different schema, and so you want to access them
differently, but in the same template.

    freemarker-cli
        [...]
        --named-document users somewhere/foo-users.csv
        --named-document groups somewhere/foo-groups.csv

Then in the template you could refer to them as: `NamedDocuments.users`,
and `NamedDocuments.groups`.

Use Cases 1 and 2 can be unified into a coherent concept, where `Document`
is just a shorthand for `NamedDocuments.main`. It's called "main" because
that's "the" document the template is about, but then you had to add
some helper documents, with symbolic names representing their role.

    freemarker-cli
        -t access-report.ftl
        --document-name=main somewhere/foo-access-log.csv
        --document-name=users somewhere/foo-users.csv
        --document-name=groups somewhere/foo-groups.csv

Here, `Document` still works in the template, and it refers to
`somewhere/foo-access-log.csv`. (While omitting --document-name=main above
would be cleaner, I couldn't figure out how to do that with Picocli.
Anyway, for now the point is the concept, which is not specific to CLI.)


USE CASE 3

Here you have several documents of the same kind. That has a more
generic sub-use-case, where you have explicitly named documents (like
"users" above), and for some of the names you expect multiple input files.

    freemarker-cli
        -t access-report.ftl
        --document-name=main somewhere/foo-access-log.csv
            somewhere/bar-access-log.csv
        --document-name=users somewhere/foo-users.csv
            somewhere/bar-users.csv
        --document-name=groups somewhere/global-groups.csv

The template must be written with this use case in mind, as now it has
to #list some of the documents. (I think in practice you hardly ever want to
get a document by hard coded index. Either you don't know how many
documents you have, so you can't use hard coded indexes, or you do, and
each index has a specific meaning, but then you should name the documents
instead, as using indexes is error prone, and hard to read.)

Accessing those lists of documents in the template could maybe be done like
this:
- For the "main" documents: `DocumentList`
- For explicitly named documents, like "users": `NamedDocumentLists.users`


SUMMING UP

To unify all 3 use cases into a coherent concept:
- `NamedDocumentLists.<name>` is the most generic form, and while you can
achieve everything with it, using it requires your template to handle the
most generic case too. So, I think it would be rarely used.
- `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
used if you only have one kind of document (single format and schema), but
potentially multiple of them.
- `NamedDocuments.<name>` expresses that you expect exactly 1 document of
the given name.
- `Document` is just a shorthand for `NamedDocuments.main`. This is for the
most natural/frequent use case.
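
The four forms above can be sketched as one small Java class, where
everything is derived from the most generic one. Again, this is just a sketch
to show how the pieces relate: the class and method names are invented,
documents are plain path strings, and treating an unknown name as an empty
list is my assumption.

```java
import java.util.List;
import java.util.Map;

public class DocumentsModel {

    public static final String MAIN = "main";

    private final Map<String, List<String>> namedDocumentLists;

    public DocumentsModel(Map<String, List<String>> namedDocumentLists) {
        this.namedDocumentLists = namedDocumentLists;
    }

    /** `NamedDocumentLists.<name>`: the most generic form, zero or more documents. */
    public List<String> namedDocumentList(String name) {
        // Assumption: a name nobody used simply has no documents.
        return namedDocumentLists.getOrDefault(name, List.of());
    }

    /** `DocumentList`: shorthand for `NamedDocumentLists.main`. */
    public List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** `NamedDocuments.<name>`: expects exactly 1 document of the given name. */
    public String namedDocument(String name) {
        List<String> documents = namedDocumentList(name);
        if (documents.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 \"" + name
                    + "\" document, but got " + documents.size());
        }
        return documents.get(0);
    }

    /** `Document`: shorthand for `NamedDocuments.main`. */
    public String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentsModel model = new DocumentsModel(Map.of(
                "main", List.of("somewhere/foo-access-log.csv"),
                "users", List.of("somewhere/foo-users.csv", "somewhere/bar-users.csv")));
        System.out.println(model.document());
        System.out.println(model.namedDocumentList("users"));
    }
}
```

Note that only one place does the "exactly 1" check, so `Document` and
`NamedDocuments.<name>` can't drift apart in how strict they are.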

That's 4 possible ways of accessing your documents, which is a trade-off
for the sake of these:
- Catching CLI (or Maven, etc.) input where the template output likely will
be wrong. That's only possible if the user can communicate their intent in
the template.
- Users don't need to deal with concepts that are irrelevant in their
concrete use case. Just start with the trivial, `Document`, and later if
the need arises, generalize to named documents, document lists, or both.


What do you guys think?
