Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Sat, 29 Feb 2020 08:03:51 -0800

Hi Daniel,

I'm an enterprise developer - bad habits die hard :-)


So I closed the following tickets and merged the branches

1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into 
"freemarker-generator"
2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for 
datasources

Thanks in advance, 

Siegfried Goeschl


> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
> 
> Yeah, and of course, you can merge that branch. You can even work on the
> master directly after all.
> 
> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
> wrote:
> 
>> But, I do recognize the cattle use case (several "faceless" files with
>> common format/schema). Only, my idea is to push that complexity on the data
>> source. The "data source" concept shields the rest of the application from
>> the details of how the data is stored or retrieved. So, a data source might
>> load a bunch of log files from a directory, and present them as a single
>> big table, or as a list of tables, etc. So I want to deal with the cattle
>> use case, but the question is what part of the architecture will deal
>> with this complication, in other words, how do you box things. The reason my
>> initial bet is to stuff that complication into the "data source"
>> implementation(s) is that data sources are inherently varied. Some return
>> a table-like thing, some have multiple named tables (worksheets in Excel),
>> some return a tree of nodes (XML), etc. So then, some might return a
>> list-of-lists of log records, or just a single list of log records (put
>> together from daily log files). That way cattle don't add to conceptual
>> complexity. Now, you might be aware of cases where the cattle concept must
>> be more exposed than this, and then we can't box things like this. But this
>> is what I tried to express.
>> 
>> Regarding "output generators", and how that applies on the command line. I
>> think it's important that the common core between Maven and command-line is
>> as fat as possible. Ideally, they are just two syntaxes to set up the same
>> thing. Mostly at least. So, if you specify a template file to the CLI
>> application, in a way so that it causes it to process that template to
>> generate a single output, then there you have just defined an "output
>> generator" (even if it wasn't explicitly called like that in the command
>> line). If you specify 3 csv files to the CLI application, in a way so that
>> it causes it to generate 3 output files, then you have just defined 3
>> "output generators" there (there's at least one template specified there
>> too, but that wasn't an "output generator" itself, it was just an attribute
>> of the 3 output generators). If you specify 1 template, and 3 csv files, in
>> a way so that it will yield 4 output files (1 for the template, 3 for the
>> csv-s), then you have defined 4 output generators there. If you have a data
>> source that loads a list of 3 entities (say, 3 csv files, so it's a list of
>> tables then), and you have 2 templates, and you tell the CLI to execute
>> each template for each item in said data source, then you have just defined
>> 6 "output generators".
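
A minimal Java sketch of the counting described above, in case it helps the discussion. All names in it (`OutputGeneratorSketch`, `perPair`, `perTemplate`, the file names) are made up for illustration and are not freemarker-generator API. The only claim is the arithmetic: pairing every template with every data source gives 2 x 3 = 6 generators, while letting each template see all data sources at once gives 2.

```java
import java.util.ArrayList;
import java.util.List;

class OutputGeneratorSketch {

    /** Hypothetical model: one template plus the data sources it gets to see. */
    static final class OutputGenerator {
        final String template;
        final List<String> dataSources;

        OutputGenerator(String template, List<String> dataSources) {
            this.template = template;
            this.dataSources = List.copyOf(dataSources);
        }
    }

    /** One generator per (template, data source) pair, e.g. source code generation. */
    static List<OutputGenerator> perPair(List<String> templates, List<String> dataSources) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            for (String dataSource : dataSources) {
                result.add(new OutputGenerator(template, List.of(dataSource)));
            }
        }
        return result;
    }

    /** One generator per template; each template sees all data sources at once. */
    static List<OutputGenerator> perTemplate(List<String> templates, List<String> dataSources) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            result.add(new OutputGenerator(template, dataSources));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> templates = List.of("a.ftl", "b.ftl");
        List<String> dataSources = List.of("1.csv", "2.csv", "3.csv");
        System.out.println(perPair(templates, dataSources).size());     // prints 6
        System.out.println(perTemplate(templates, dataSources).size()); // prints 2
    }
}
```

Which of the two a given command line means is exactly what the CLI syntax would have to make explicit.
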
>> 
>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>> [email protected]> wrote:
>> 
>>> Hi Daniel,
>>> 
>>> That all depends on your mental model and work you do, expectations,
>>> experience :-)
>>> 
>>> 
>>> __Document Handling__
>>> 
>>> *"But I think actually we have no good use case for list of documents
>>> that's passed at once to a single template run, so, we can just ignore
>>> that complication"*
>>> 
>>> In my case that's not a complication but my daily business - I'm
>>> regularly wading through access logs - yesterday probably a couple of
>>> hundred access logs across two staging sites to help track down some
>>> strange API gateway issues :-)
>>> 
>>> My gut feeling is (borrowing from
>>> 
>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>> )
>>> 
>>> 1. You have a few lovely named documents / templates - `pets`
>>> 2. You have tons of anonymous documents / templates to process -
>>> `cattle`
>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>> 
>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>> it is equally important and common.
>>> 
>>> 
>>> __Template And Document Processing Modes__
>>> 
>>> IMHO it is important to answer the following question: "How many
>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>> Three or Six?"
>>> 
>>> Your answer is influenced by your mental model / experience
>>> 
>>> * When wading through tons of CSV files, access logs, etc. the answer is
>>> "2"
>>> * When doing source code generation the obvious answer is "6"
>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>> will encounter one
>>> 
>>> __Template and document mode probably shouldn't exist__
>>> 
>>> That's hard for me to fully understand - I definitely lack your insights
>>> & experience writing such tools :-)
>>> 
>>> Defining the `Output Generator` is the underlying model for the Maven
>>> plugin (and probably FMPP).
>>> 
>>> I'm not sure if this applies to command lines, at least not in the way I
>>> use them (or would like to use them)
>>> 
>>> 
>>> Thanks in advance,
>>> 
>>> Siegfried Goeschl
>>> 
>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>> 
>>> 
>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>> 
>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>> Does anyone have other ideas?
>>>> 
>>>> As for naming data sources and such. One thing I was wondering about back
>>>> then is how to deal with a list of documents given to a template, versus
>>>> exactly 1 document given to a template. But I think actually we have no
>>>> good use case for list of documents that's passed at once to a single
>>>> template run, so, we can just ignore that complication. A document has a
>>>> name, and that's always just a single document, not a collection, as far as
>>>> the template is concerned. (We can have multiple documents per run, but
>>>> those normally yield separate output generators, so it's still only one
>>>> document per template.) However, we can have data source types (document
>>>> types with old terminology) that collect together multiple data files. So
>>>> then that complexity is encapsulated into the data source type, and doesn't
>>>> complicate the overall architecture. That's another case when a data source
>>>> is not just a file. Like maybe there's a data source type that loads all
>>>> the CSV-s from a directory, into a single big table (I had such a case), or
>>>> even into a list of tables. Or, as I mentioned already, a data source is
>>>> maybe an SQL query on a JDBC data source (and we got the first term
>>>> clash... JDBC also calls them data sources).
>>>> 
>>>> Template and document mode probably shouldn't exist from user perspective
>>>> either, at least not as a global option that must apply to everything in a
>>>> run. They could just give the files that define the "output generators",
>>>> and some of them will be templates, some of them are data files, in which
>>>> case a template needs to be associated with them (and there can be a couple
>>>> of ways of doing that). And then again, there are the cases where you want
>>>> to create one output generator per entity from some data source.
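
A rough Java sketch of the "data source type that collects together multiple data files" idea, under stated assumptions. The interface and classes here (`DataSource`, `SingleSource`, `MergedSource`) are hypothetical and not the project's actual classes, and rows are kept in memory as plain strings so the example runs without any files.

```java
import java.util.ArrayList;
import java.util.List;

class DataSourceSketch {

    /** Hypothetical abstraction: the template only ever sees rows, not files. */
    interface DataSource {
        String name();
        List<String> rows();
    }

    /** A data source backed by the rows of one file (held in memory here). */
    static final class SingleSource implements DataSource {
        private final String name;
        private final List<String> rows;

        SingleSource(String name, List<String> rows) {
            this.name = name;
            this.rows = List.copyOf(rows);
        }

        public String name() { return name; }
        public List<String> rows() { return rows; }
    }

    /** A data source that hides several "cattle" files behind one big table. */
    static final class MergedSource implements DataSource {
        private final String name;
        private final List<DataSource> parts;

        MergedSource(String name, List<DataSource> parts) {
            this.name = name;
            this.parts = List.copyOf(parts);
        }

        public String name() { return name; }

        public List<String> rows() {
            // Concatenate the rows of all parts, in the order the parts were given.
            List<String> all = new ArrayList<>();
            for (DataSource part : parts) {
                all.addAll(part.rows());
            }
            return all;
        }
    }

    public static void main(String[] args) {
        DataSource monday = new SingleSource("access-mon.log", List.of("GET /a", "GET /b"));
        DataSource tuesday = new SingleSource("access-tue.log", List.of("GET /c"));
        DataSource logs = new MergedSource("accessLogs", List.of(monday, tuesday));
        System.out.println(logs.name() + ": " + logs.rows().size() + " rows");
    }
}
```

The point of the sketch is only that the caller of `rows()` cannot tell whether one file or many sit behind the name, which is the boxing described above.
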
>>>> 
>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>> [email protected]> wrote:
>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> See my comments below - and thanks for your patience and input :-)
>>>>> 
>>>>> *Renaming Document To DataSource*
>>>>> 
>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation and
>>>>> its DataSource.
>>>>> 
>>>>> *Template And Document Mode*
>>>>> 
>>>>> Agreed - I think it is a valuable abstraction for the user but it is not
>>>>> an implementation concept :-)
>>>>> 
>>>>> *Document Without Symbolic Names*
>>>>> 
>>>>> Also agreed and it is going to change but I have not made up my mind yet
>>>>> about what exactly to implement.
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>> 
>>>>> A few quick thoughts on that:
>>>>> 
>>>>> - We should replace the "document" term with something more telling. It
>>>>> doesn't tell that it's some kind of input. Also, most of these inputs
>>>>> aren't something that people typically call documents. Like a csv file, or
>>>>> a database table, which is not even a file (OK, we don't support such a
>>>>> thing at the moment). I think, maybe "data source" is a safe enough term.
>>>>> (It also rhymes with data model.)
>>>>> - You have separate "template" and "document" "mode", that applies to a
>>>>> whole run. I think such specialization won't be helpful. We could just say,
>>>>> on the conceptual level at least, that we need a set of "output
>>>>> generators". An output generator is an object (in the API) that specifies a
>>>>> template, a data-model (where the data-model is possibly populated with
>>>>> "documents"), and an output "sink" (a file path, or stdout), and can
>>>>> generate the output itself. A practical way of defining the output
>>>>> generators in a CLI application is via a bunch of files, each defining an
>>>>> output generator. Some of those files are maybe templates (that you can
>>>>> even detect from the file extension), or data files that we currently call
>>>>> "documents". They could freely mix inside the same run. I have also met
>>>>> a use case where you have a single table (single "document"), and each
>>>>> record in it yields an output file. That can also be described in some file
>>>>> format, or really in any other way, like directly in a command line
>>>>> argument, via API, etc.
>>>>> - You have multiple documents without an associated symbolic name in some
>>>>> examples. Templates can't identify those then in a well maintainable way.
>>>>> The actual file name is often not a good identifier: it can change over
>>>>> time, and you might not even have good control over it, like you already
>>>>> receive it as a parameter from somewhere else, or someone moves/renames
>>>>> the files that you need to read. An index is also not very good, but I have
>>>>> written about that earlier.
>>>>> 
>>>>> 
>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>> [email protected]> wrote:
>>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> still wrapping my head around it, but I assembled some thoughts here -
>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> 
>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>>>>> 
>>>>> What you are describing is more like the angle that FMPP took initially,
>>>>> where templates drive things, they generate the output for themselves (even
>>>>> multiple output files if they wish). By default the output file's name (and
>>>>> relative path) is deduced from the template name. There was also a global
>>>>> data-model, built in a configuration file (or equally, built via command
>>>>> line arguments, or both mixed), from which templates get whatever data they
>>>>> are interested in. Take a look at the figures here:
>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
>>>>> a bit more, because you could add XML files at the same place where you
>>>>> have the templates, and then you could associate transform templates to the
>>>>> XML files (based on path pattern and/or the XML document element). Now
>>>>> that's like what freemarker-generator had initially (data files drive
>>>>> output, and the template is there to transform it).
>>>>> 
>>>>> So I think the generic mental model would look like this:
>>>>> 
>>>>> 1. You got files that drive the process, let's call them *generator
>>>>> files* for now. Usually, each generator file yields an output file (but
>>>>> maybe even multiple output files, as you might have seen in the last
>>>>> figure). These generator files can be of many types, like XML, JSON, XLSX
>>>>> (as in the original freemarker-generator), and even templates (as is the
>>>>> norm in FMPP). If the file is not a template, then you got a set of
>>>>> transformer templates (-t CLI option) in a separate directory, which can be
>>>>> associated with the generator files based on name patterns, and even based
>>>>> on content (schema usually). If the generator file is a template (so that's
>>>>> a positional @Parameter CLI argument that happens to be an *.ftl, and is
>>>>> not a template file specified after the "-t" option), then you just
>>>>> Template.process(...) it, and it prints what the output will be.
>>>>> 2. You also have a set of variables, the global data-model, that
>>>>> contains commonly useful stuff, like what you now call parameters (CLI
>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
>>>>> files aren't "generator files". Templates just use them if they need them.
>>>>> An important thing here is to reuse the same mechanism to read and parse
>>>>> those data files, which was used in templates when transforming generator
>>>>> files. So we need a common format for specifying how to load data files.
>>>>> That's maybe just FTL that #assigns to the variables, or maybe a more
>>>>> declarative format.
>>>>> 
>>>>> What I have described in the original post here was a less generic form of
>>>>> this, as I tried to be true to the original approach. I thought the
>>>>> proposal would be drastic enough as it is... :) There, the "main" document
>>>>> is the "generator file" from point 1, the "-t" template is the transform
>>>>> template for the "main" document, and the other named documents ("users",
>>>>> "groups") are a poor man's shared data-model from point 2 (together with
>>>>> -PName=value).
>>>>> 
>>>>> There's a further somewhat confusing thing to get right with the
>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
>>>>> the model above, as per point 1, if you list multiple data files, each will
>>>>> generate a separate output file. So, if you need to take in a list of files
>>>>> to transform it to a single output file (or at least with a single
>>>>> transform template execution), then you have to be explicit about that, as
>>>>> that's not the default behavior anymore. But it's still absolutely
>>>>> possible. Imagine that a "list of XLSX-es" is itself like a file format.
>>>>> You need some CLI (and Maven config, etc.) syntax to express that, but that
>>>>> shouldn't be a big deal.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>> [email protected]> wrote:
>>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> Good timing - I was looking at a similar problem from a different angle
>>>>> yesterday (see below)
>>>>> 
>>>>> Don't have enough time to answer your email in detail now - will do that
>>>>> tomorrow evening
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> 
>>>>> === START
>>>>> # FreeMarker CLI Improvement
>>>>> ## Support Of Multiple Template Files
>>>>> Currently we support the following combinations
>>>>> 
>>>>> * Single template and no data files
>>>>> * Single template and one or more data files
>>>>> 
>>>>> But we can not support the following use case which is quite typical in
>>>>> the cloud
>>>>> 
>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>> directory of configuration files using a JSON configuration file__
>>>>> 
>>>>> ## Implementation notes
>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>>> * We might need an `exclude` filter for the copy operation
>>>>> * Initially resolve to a list of template files and process one after
>>>>> another
>>>>> * Need to calculate the output file location and extension
>>>>> * We need to rename the existing command line parameters (see below)
>>>>> * Do we need multiple include and exclude filters?
>>>>> * Do we need file versus directory filters?
>>>>> 
>>>>> ### Command Line Options
>>>>> ```
>>>>> --input-encoding : Encoding of the documents
>>>>> --output-encoding : Encoding of the rendered template
>>>>> --template-encoding : Encoding of the template
>>>>> --output : Output file or directory
>>>>> --include-document : Include pattern for documents
>>>>> --exclude-document : Exclude pattern for documents
>>>>> --include-template : Include pattern for templates
>>>>> --exclude-template : Exclude pattern for templates
>>>>> ```
>>>>> 
>>>>> ### Command Line Examples
>>>>> ```text
>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>> # using the data from "config.json"
>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>> 
>>>>> # Basically the same using a named document "configuration"
>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>> # It might make sense to allow URIs for loading documents
>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>> 
>>>>> # Basically the same using an environment variable as named document
>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>> ```
>>>>> === END
>>>>> 
>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>>> 
>>>>> Input documents is a fundamental concept in freemarker-generator, so we
>>>>> should think about that more, and probably refine/rework how it's done.
>>>>> 
>>>>> Currently it works like this, with CLI at least.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> 
>>>>> Then in access-report.ftl you have to do something like this:
>>>>> 
>>>>> <#assign doc = Documents.get(0)>
>>>>> ... process doc here
>>>>> 
>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a funny
>>>>> chain of coincidences: It returned the string "D", then CSVTool.parse(...)
>>>>> happily parsed that to a table with the single column "D", and 0 rows, and
>>>>> as there were 0 rows, the template didn't run into an error about
>>>>> row.myExpectedColumn referring to a missing column either, so the process
>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>> unintentionally breaking a FreeMarker idiom though; eventually we will have
>>>>> to work on those too, but, different topic.)
>>>>> 
>>>>> However, actually multiple input documents can be passed in:
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> somewhere/bar-access-log.csv
>>>>> 
>>>>> The above template will still work, though then you ignore all but the
>>>>> first document. So if you expect any number of input documents, you
>>>>> probably will have to do this:
>>>>> 
>>>>> <#list Documents.list as doc>
>>>>> ... process doc here
>>>>> </#list>
>>>>> 
>>>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>>>> we will work out in a different thread.)
>>>>> 
>>>>> 
>>>>> So, what would be better, in my opinion. I start out from what I think are
>>>>> the common use cases, in decreasing order of frequency. The goal is to make
>>>>> those less error prone for the users, and simpler to express.
>>>>> 
>>>>> USE CASE 1
>>>>> 
>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>> document in the mind of the user. This is probably the typical use case,
>>>>> or at least the use case users typically start out from when starting the
>>>>> work.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> 
>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's error
>>>>> prone, because if the user passed in more than 1 document (can even happen
>>>>> totally accidentally, like if the user was lazy and used a wildcard that
>>>>> the shell expanded), the template will silently ignore the rest of the
>>>>> documents, and the single document processed will be practically picked
>>>>> randomly. The user might not notice that and submit a bad report or such.
>>>>> 
>>>>> I think that in this use case the document should be simply referred to as
>>>>> `Document` in the template. When you have multiple documents there,
>>>>> referring to `Document` should be an error, saying that the template was
>>>>> made to process a single document only.
>>>>> 
>>>>> 
>>>>> USE CASE 2
>>>>> 
>>>>> You have multiple input documents, but each has a different role (different
>>>>> schema, maybe different file type). Like, you pass in users.csv and
>>>>> groups.csv. Each has a different schema, and so you want to access them
>>>>> differently, but in the same template.
>>>>> 
>>>>> freemarker-cli
>>>>> [...]
>>>>> --named-document users somewhere/foo-users.csv
>>>>> --named-document groups somewhere/foo-groups.csv
>>>>> 
>>>>> Then in the template you could refer to them as: `NamedDocuments.users`,
>>>>> and `NamedDocuments.groups`.
>>>>> 
>>>>> Use Case 1, and 2 can be unified into a coherent concept, where `Document`
>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main" because
>>>>> that's "the" document the template is about, but then you have to add
>>>>> some helper documents, with symbolic names representing their role.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>> --document-name=users somewhere/foo-users.csv
>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>> 
>>>>> Here, `Document` still works in the template, and it refers to
>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>>>> Anyway, for now the point is the concept, which is not specific to CLI.)
>>>>> 
>>>>> USE CASE 3
>>>>> 
>>>>> Here you have several of the same kind of documents. That has a more
>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>> "users" above), and for some you expect multiple input files.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>> somewhere/bar-access-log.csv
>>>>> --document-name=users somewhere/foo-users.csv
>>>>> somewhere/bar-users.csv
>>>>> --document-name=groups somewhere/global-groups.csv
>>>>> 
>>>>> The template must be written with this use case in mind, as now it has to
>>>>> #list some of the documents. (I think in practice you hardly ever want to
>>>>> get a document by hard coded index. Either you don't know how many
>>>>> documents you have, so you can't use hard coded indexes, or you do, and
>>>>> each index has a specific meaning, but then you should name the documents
>>>>> instead, as using indexes is error prone, and hard to read.)
>>>>> Accessing that list of documents in the template maybe could be done like
>>>>> this:
>>>>> - For the "main" documents: `DocumentList`
>>>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>>>>> 
>>>>> SUMMING UP
>>>>> 
>>>>> To unify all 3 use cases into a coherent concept:
>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>>>> achieve everything with it, using it requires your template to handle the
>>>>> most generic case too. So, I think it would be rarely used.
>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>>>> used if you only have one kind of document (single format and schema), but
>>>>> potentially multiple of them.
>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>>>> the given name.
>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
>>>>> most natural/frequent use case.
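
The four accessors summarized above could be sketched in Java roughly as follows. This is an assumption-laden illustration, not the freemarker-generator implementation: the class `DocumentAccessSketch` and its method names are hypothetical, and documents are represented as plain path strings to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentAccessSketch {

    static final String MAIN = "main";

    // Documents grouped by symbolic name, in insertion order.
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    void add(String name, String document) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(document);
    }

    /** Like NamedDocumentLists.<name>: the most generic form, zero or more documents. */
    List<String> namedDocumentList(String name) {
        return List.copyOf(byName.getOrDefault(name, List.of()));
    }

    /** Like DocumentList: shorthand for the list named "main". */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Like NamedDocuments.<name>: exactly one document expected, anything else is an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly 1 document named '" + name + "' but got " + docs.size());
        }
        return docs.get(0);
    }

    /** Like Document: shorthand for the single document named "main". */
    String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentAccessSketch docs = new DocumentAccessSketch();
        docs.add("main", "somewhere/foo-access-log.csv");
        docs.add("users", "somewhere/foo-users.csv");
        System.out.println(docs.document());

        // A second "main" document turns the single-document accessor into an error.
        docs.add("main", "somewhere/bar-access-log.csv");
        try {
            docs.document();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design choice being illustrated is that the single-document accessor fails loudly instead of silently picking one of several documents.
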
>>>>> 
>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>>>> for the sake of these:
>>>>> - Catching CLI (or Maven, etc.) input where the template output likely will
>>>>> be wrong. That's only possible if the user can communicate their intent in
>>>>> the template.
>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>> concrete use case. Just start with the trivial, `Document`, and later if
>>>>> the need arises, generalize to named documents, document lists, or both.
>>>>> 
>>>>> What do you guys think?
>>>>> 
>>>>> 
>>> 
>> 
>> 
>> --
>> Best regards,
>> Daniel Dekany
>> 
> 
> 
> -- 
> Best regards,
> Daniel Dekany
