Re: freemarker-generator: Improving the input documents concept

Daniel Dekany Sun, 23 Feb 2020 07:45:10 -0800

Correction... this is not what I meant:

    freemarker-cli
        [...]
        --named-document users somewhere/foo-users.csv
        --named-document groups somewhere/foo-groups.csv


It should have been this:

    freemarker-cli
        [...]
        --document-name=users somewhere/foo-users.csv
        --document-name=groups somewhere/foo-groups.csv


On Sun, Feb 23, 2020 at 4:37 PM Daniel Dekany <[email protected]> wrote:

> Input documents are a fundamental concept in freemarker-generator, so we
> should think about that more, and probably refine/rework how it's done.
>
> Currently it works like this, at least with the CLI:
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>
> Then in access-report.ftl you have to do something like this:
>
>     <#assign doc = Documents.get(0)>
>     ... process doc here
>
> (The more idiomatic Documents[0] won't work. Actually, that led to a
> funny chain of coincidences: it returned the string "D", then
> CSVTool.parse(...) happily parsed that into a table with the single column
> "D" and 0 rows, and as there were 0 rows, the template never ran into the
> error of row.myExpectedColumn referring to a missing column, so the
> process finished with success. (: Pretty unlucky for sure. The root cause
> was unintentionally breaking a FreeMarker idiom though; eventually we will
> have to work on those too, but that's a different topic.)
>
> However, multiple input documents can actually be passed in:
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>         somewhere/bar-access-log.csv
>
> The above template will still work, but then it silently ignores all but
> the first document. So if you expect any number of input documents, you
> will probably have to do this:
>
>     <#list Documents.list as doc>
>           ... process doc here
>     </#list>
>
> (The more idiomatic <#list Documents as doc> won't work; but again, we
> will work those out in a different thread.)
>
>
> So, what would be better, in my opinion? I start out from what I think are
> the common use cases, in decreasing order of frequency. The goal is to make
> those less error-prone for users, and simpler to express.
>
> USE CASE 1
>
> You have exactly 1 input document, which is therefore simply "the"
> document in the mind of the user. This is probably the typical use case,
> or at least the one users typically start out from.
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>
> Then `Documents.get(0)` is not very fitting. Most importantly, it's
> error-prone: if the user passes in more than 1 document (which can even
> happen totally accidentally, like if the user was lazy and used a wildcard
> that the shell expanded), the template will silently ignore the rest of
> the documents, and the single document processed will practically be
> picked at random. The user might not notice that and submit a bad report
> or such.
>
> I think that in this use case the document should simply be referred to
> as `Document` in the template. When multiple documents are passed in,
> referring to `Document` should be an error, saying that the template was
> made to process a single document only.
>
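A minimal Java sketch of the strict `Document` rule, assuming the input documents arrive as a plain list (file names stand in for parsed documents here; `SingleDocument` and `requireSingle` are hypothetical names, not freemarker-generator API). Unlike `Documents.get(0)`, it refuses to guess when 0 or more than 1 documents were passed in:

```java
import java.util.List;

public class SingleDocument {

    // Hypothetical helper: returns the only document, or fails loudly instead
    // of silently picking the first one like Documents.get(0) does.
    static <T> T requireSingle(List<T> documents) {
        if (documents.size() != 1) {
            throw new IllegalStateException(
                "This template was made to process a single document only, but "
                    + documents.size() + " documents were passed in.");
        }
        return documents.get(0);
    }

    public static void main(String[] args) {
        List<String> one = List.of("somewhere/foo-access-log.csv");
        System.out.println(requireSingle(one)); // prints somewhere/foo-access-log.csv

        // E.g. a shell wildcard that matched two files by accident
        List<String> two = List.of("somewhere/foo-access-log.csv",
                                   "somewhere/bar-access-log.csv");
        try {
            requireSingle(two);
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```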
>
> USE CASE 2
>
> You have multiple input documents, but each has a different role (different
> schema, maybe different file type). Like, you pass in users.csv and
> groups.csv. Each has a different schema, so you want to access them
> differently, but in the same template.
>
>     freemarker-cli
>         [...]
>         --named-document users somewhere/foo-users.csv
>         --named-document groups somewhere/foo-groups.csv
>
> Then in the template you could refer to them as: `NamedDocuments.users`,
> and `NamedDocuments.groups`.
>
> Use cases 1 and 2 can be unified into a coherent concept, where `Document`
> is just a shorthand for `NamedDocuments.main`. It's called "main" because
> that's "the" document the template is about, and then you add some
> helper documents, with symbolic names representing their role.
>
>     freemarker-cli
>         -t access-report.ftl
>         --document-name=main somewhere/foo-access-log.csv
>         --document-name=users somewhere/foo-users.csv
>         --document-name=groups somewhere/foo-groups.csv
>
> Here, `Document` still works in the template, and it refers to
> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> would be cleaner, I couldn't figure out how to do that with Picocli.
> Anyway, for now the point is the concept, which is not specific to CLI.)
>
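For what it's worth, the grouping itself is simple to do by hand. A minimal Java sketch, assuming a hypothetical hand-rolled parser rather than Picocli (`DocumentArgs` and `group` are illustrative names, and only `-t` is recognized besides `--document-name=`): files that precede any `--document-name=...` fall into "main", and every later file belongs to the most recently named group, which also covers the multi-file form of use case 3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocumentArgs {

    static final String OPTION = "--document-name=";
    static final String DEFAULT_NAME = "main";

    // Hypothetical grouping (not Picocli): each input file is assigned to the
    // most recent --document-name=..., or to "main" if none came before it.
    static Map<String, List<String>> group(String[] args) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        String current = DEFAULT_NAME;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-t")) {
                i++; // skip the template file; it is not an input document
            } else if (arg.startsWith(OPTION)) {
                current = arg.substring(OPTION.length());
            } else {
                byName.computeIfAbsent(current, k -> new ArrayList<>()).add(arg);
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        String[] example = {
            "-t", "access-report.ftl",
            "somewhere/foo-access-log.csv", // no name given yet, so "main"
            "--document-name=users", "somewhere/foo-users.csv",
            "--document-name=groups", "somewhere/foo-groups.csv"
        };
        // prints {main=[somewhere/foo-access-log.csv], users=[somewhere/foo-users.csv], groups=[somewhere/foo-groups.csv]}
        System.out.println(group(example));
    }
}
```

One consequence of this design: once a name has been given, a later bare file can't fall back to "main" without an explicit `--document-name=main`, so argument order matters.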
>
> USE CASE 3
>
> Here you have several documents of the same kind. This has a more
> generic sub-use-case, where you have explicitly named documents (like
> "users" above), and for some of them you expect multiple input files.
>
>     freemarker-cli
>         -t access-report.ftl
>         --document-name=main somewhere/foo-access-log.csv
>                              somewhere/bar-access-log.csv
>         --document-name=users somewhere/foo-users.csv
>                               somewhere/bar-users.csv
>         --document-name=groups somewhere/global-groups.csv
>
> The template must be written with this use case in mind, as now it has to
> #list some of the documents. (I think in practice you hardly ever want to
> get a document by a hard-coded index. Either you don't know how many
> documents you have, so you can't use hard-coded indexes, or you do, and
> each index has a specific meaning, but then you should name the documents
> instead, as using indexes is error-prone and hard to read.)
> Accessing those lists of documents in the template could perhaps be done
> like this:
> - For the "main" documents: `DocumentList`
> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>
>
> SUMMING UP
>
> To unify all 3 use cases into a coherent concept:
> - `NamedDocumentLists.<name>` is the most generic form, and while you can
> achieve everything with it, using it requires your template to handle the
> most generic case too. So, I think it would be rarely used.
> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> used if you only have one kind of document (single format and schema), but
> potentially multiple of them.
> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> the given name.
> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
> the most natural/frequent use case.
>
> That's 4 possible ways of accessing your documents, which is a trade-off
> for the sake of these:
> - Catching CLI (or Maven, etc.) input for which the template output would
> likely be wrong. That's only possible if the user can communicate their
> intent in the template.
> - Users don't need to deal with concepts that are irrelevant in their
> concrete use case. Just start with the trivial, `Document`, and later if
> the need arises, generalize to named documents, document lists, or both.
>
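To make the layering concrete, here is a minimal Java sketch, assuming a hypothetical `DocumentModel` class (not an existing freemarker-generator type). All four names are views over a single `Map<String, List<T>>` playing the role of `NamedDocumentLists`. Only the two singular accessors enforce a count, and that is where the template's stated intent gets checked against the actual input:

```java
import java.util.List;
import java.util.Map;

// Hypothetical data model: every accessor is derived from one map of
// name -> documents. Not freemarker-generator's actual API.
public class DocumentModel<T> {

    static final String MAIN = "main";
    private final Map<String, List<T>> namedDocumentLists;

    DocumentModel(Map<String, List<T>> namedDocumentLists) {
        this.namedDocumentLists = namedDocumentLists;
    }

    // NamedDocumentLists.<name>: any number of documents, possibly none
    List<T> namedDocumentList(String name) {
        return namedDocumentLists.getOrDefault(name, List.of());
    }

    // DocumentList: shorthand for NamedDocumentLists.main
    List<T> documentList() {
        return namedDocumentList(MAIN);
    }

    // NamedDocuments.<name>: the template expects exactly one such document
    T namedDocument(String name) {
        List<T> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \""
                + name + "\", but got " + docs.size());
        }
        return docs.get(0);
    }

    // Document: shorthand for NamedDocuments.main
    T document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentModel<String> model = new DocumentModel<>(Map.of(
            MAIN, List.of("somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv"),
            "groups", List.of("somewhere/global-groups.csv")));

        System.out.println(model.documentList().size());   // 2
        System.out.println(model.namedDocument("groups")); // somewhere/global-groups.csv
        try {
            model.document(); // two "main" documents, but the template expects one
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```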
>
> What do you guys think?
>
>
