Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Sat, 29 Feb 2020 09:09:45 -0800
As discussed before the name is widely used :-)

> On 29.02.2020, at 18:05, Siegfried Goeschl <[email protected]> 
> wrote:
> 
> Well, it clashes with "javax.activation.DataSource" - I can do it & have no
> definite opinion about it :)
> 
>> On 29.02.2020, at 18:03, Daniel Dekany <[email protected]> wrote:
>> 
>> I believe that should be DataSource (with capital S), as it's two words.
>> 
>> Also, it's the name of a very widely used and well-known JDBC interface. So
>> if anyone can suggest a similarly descriptive alternative...
>> 
>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>> [email protected]> wrote:
>> 
>>> Hi Daniel,
>>> 
>>> I'm an enterprise developer - bad habits die hard :-)
>>> 
>>> So I closed the following tickets and merged the branches
>>> 
>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>> "freemarker-generator"
>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>> for datasources
>>> 
>>> Thanks in advance,
>>> 
>>> Siegfried Goeschl
>>> 
>>> 
>>>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
>>>> 
>>>> Yeah, and of course, you can merge that branch. You can even work on the
>>>> master directly after all.
>>>> 
>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
>>>> wrote:
>>>> 
>>>>> But I do recognize the cattle use case (several "faceless" files with a
>>>>> common format/schema). Only, my idea is to push that complexity onto the
>>>>> data source. The "data source" concept shields the rest of the application
>>>>> from the details of how the data is stored or retrieved. So a data source
>>>>> might load a bunch of log files from a directory and present them as a
>>>>> single big table, or as a list of tables, etc. So I want to deal with the
>>>>> cattle use case, but the question is which part of the architecture will
>>>>> deal with this complication; in other words, how do you box things. The
>>>>> reason my initial bet is to stuff that complication into the "data source"
>>>>> implementation(s) is that data sources are inherently varied. Some return
>>>>> a table-like thing, some have multiple named tables (worksheets in Excel),
>>>>> some return a tree of nodes (XML), etc. So then some might return a
>>>>> list-of-lists of log records, or just a single list of log records (put
>>>>> together from daily log files). That way cattle don't add to the conceptual
>>>>> complexity. Now, you might be aware of cases where the cattle concept must
>>>>> be more exposed than this, and then we can't box things like this. But
>>>>> this is what I tried to express.
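
The boxing idea above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `RecordSource`, `DailyLog` and `MergedLogs` are hypothetical names, not part of freemarker-generator, and each `DailyLog` stands in for one log file that has already been read into memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; not the freemarker-generator API.
interface RecordSource {
    List<String> records();
}

// Stand-in for one daily log file that has already been read into memory.
final class DailyLog implements RecordSource {
    private final List<String> lines;

    DailyLog(List<String> lines) {
        this.lines = List.copyOf(lines);
    }

    @Override
    public List<String> records() {
        return lines;
    }
}

// The "cattle" live in here: many anonymous parts, one list for the template.
final class MergedLogs implements RecordSource {
    private final List<RecordSource> parts;

    MergedLogs(List<RecordSource> parts) {
        this.parts = List.copyOf(parts);
    }

    @Override
    public List<String> records() {
        List<String> all = new ArrayList<>();
        for (RecordSource part : parts) {
            all.addAll(part.records());
        }
        return all;
    }
}

final class DataSourceSketch {
    public static void main(String[] args) {
        RecordSource source = new MergedLogs(List.of(
                new DailyLog(List.of("GET /a 200", "GET /b 404")),
                new DailyLog(List.of("POST /c 500"))));
        // The consumer sees one list and never learns how many files there were.
        System.out.println(source.records());
    }
}
```

Whether the merged view should be a single list or a list of lists is exactly the choice the paragraph leaves to each data source implementation.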
>>>>> 
>>>>> Regarding "output generators", and how that applies to the command line: I
>>>>> think it's important that the common core between Maven and the command
>>>>> line is as fat as possible. Ideally, they are just two syntaxes for setting
>>>>> up the same thing. Mostly, at least. So, if you specify a template file to
>>>>> the CLI application in a way that causes it to process that template to
>>>>> generate a single output, then you have just defined an "output generator"
>>>>> (even if it wasn't explicitly called that on the command line). If you
>>>>> specify 3 csv files to the CLI application in a way that causes it to
>>>>> generate 3 output files, then you have just defined 3 "output generators"
>>>>> (there's at least one template specified there too, but that wasn't an
>>>>> "output generator" itself, it was just an attribute of the 3 output
>>>>> generators). If you specify 1 template and 3 csv files in a way that
>>>>> yields 4 output files (1 for the template, 3 for the csv-s), then you have
>>>>> defined 4 output generators. If you have a data source that loads a list
>>>>> of 3 entities (say, 3 csv files, so it's a list of tables), and you have 2
>>>>> templates, and you tell the CLI to execute each template for each item in
>>>>> said data source, then you have just defined 6 "output generators".
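
The last case above (each template executed for each item of a data source) is a plain cross product. A minimal Java sketch, assuming hypothetical names (`OutputGeneratorCount`, `Job`, `perItem`) that do not come from the project:

```java
import java.util.ArrayList;
import java.util.List;

final class OutputGeneratorCount {

    // One planned output: which template runs against which data source item.
    static final class Job {
        final String template;
        final String item;

        Job(String template, String item) {
            this.template = template;
            this.item = item;
        }
    }

    // "Execute each template for each item": the cross product of both lists.
    static List<Job> perItem(List<String> templates, List<String> items) {
        List<Job> jobs = new ArrayList<>();
        for (String template : templates) {
            for (String item : items) {
                jobs.add(new Job(template, item));
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<Job> jobs = perItem(
                List.of("a.ftl", "b.ftl"),
                List.of("x.csv", "y.csv", "z.csv"));
        System.out.println(jobs.size()); // 2 templates x 3 items = 6
    }
}
```

The other cases in the paragraph differ only in how the jobs are enumerated, not in what a single job is.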
>>>>> 
>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hi Daniel,
>>>>>> 
>>>>>> That all depends on your mental model and work you do, expectations,
>>>>>> experience :-)
>>>>>> 
>>>>>> 
>>>>>> __Document Handling__
>>>>>> 
>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>>> that complication"*
>>>>>> 
>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>> hundred access logs across two staging sites to help track down some
>>>>>> strange API gateway issues :-)
>>>>>> 
>>>>>> My gut feeling is (borrowing from
>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>>> )
>>>>>> 
>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>> `cattle`
>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>> 
>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>>> it is equally important and common.
>>>>>> 
>>>>>> 
>>>>>> __Template And Document Processing Modes__
>>>>>> 
>>>>>> IMHO it is important to answer the following question: "How many
>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>> Three or Six?"
>>>>>> 
>>>>>> Your answer is influenced by your mental model / experience
>>>>>> 
>>>>>> * When wading through tons of CSV files, access logs, etc. the answer is
>>>>>> "2"
>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>> will encounter one
>>>>>> 
>>>>>> __Template and document mode probably shouldn't exist__
>>>>>> 
>>>>>> That's hard for me to fully understand - I definitely lack your insights
>>>>>> & experience writing such tools :-)
>>>>>> 
>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>> plugin (and probably FMPP).
>>>>>> 
>>>>>> I'm not sure if this applies to command lines, at least not in the way I
>>>>>> use them (or would like to use them)
>>>>>> 
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Siegfried Goeschl
>>>>>> 
>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>> 
>>>>>> 
>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>> 
>>>>>>> Yeah, "data source" is surely too popular a name, but for a reason.
>>>>>>> Does anyone have other ideas?
>>>>>>> 
>>>>>>> As for naming data sources and such: one thing I was wondering about
>>>>>>> back then is how to deal with list of documents given to a template,
>>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>>> have no good use case for list of documents that's passed at once to a
>>>>>>> single template run, so, we can just ignore that complication. A
>>>>>>> document has a name, and that's always just a single document, not a
>>>>>>> collection, as far as the template is concerned. (We can have multiple
>>>>>>> documents per run, but those normally yield separate output generators,
>>>>>>> so it's still only one document per template.) However, we can have data
>>>>>>> source types (document types in the old terminology) that collect
>>>>>>> together multiple data files. So then that complexity is encapsulated in
>>>>>>> the data source type, and doesn't complicate the overall architecture.
>>>>>>> That's another case where a data source is not just a file. Like maybe
>>>>>>> there's a data source type that loads all the CSV-s from a directory
>>>>>>> into a single big table (I had such a case), or even into a list of
>>>>>>> tables. Or, as I mentioned already, a data source is maybe an SQL query
>>>>>>> on a JDBC data source (and we got the first term clash... JDBC also
>>>>>>> calls them data sources).
>>>>>>> 
>>>>>>> Template and document mode probably shouldn't exist from the user's
>>>>>>> perspective either, at least not as a global option that must apply to
>>>>>>> everything in a run. Users could just give the files that define the
>>>>>>> "output generators"; some of them will be templates, and some of them
>>>>>>> data files, in which case a template needs to be associated with them
>>>>>>> (and there can be a couple of ways of doing that). And then again, there
>>>>>>> are the cases where you want to create one output generator per entity
>>>>>>> from some data source.
>>>>>>> 
>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>> 
>>>>>>>> *Renaming Document To DataSource*
>>>>>>>> 
>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
>>>>>>>> and its DataSource.
>>>>>>>> 
>>>>>>>> *Template And Document Mode*
>>>>>>>> 
>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is not
>>>>>>>> an implementation concept :-)
>>>>>>>> 
>>>>>>>> *Document Without Symbolic Names*
>>>>>>>> 
>>>>>>>> Also agreed and it is going to change but I have not made up my mind yet
>>>>>>>> about what exactly to implement.
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>> 
>>>>>>>> A few quick thoughts on that:
>>>>>>>> 
>>>>>>>> - We should replace the "document" term with something more telling. It
>>>>>>>> doesn't say that it's some kind of input. Also, most of these inputs
>>>>>>>> aren't something that people typically call documents. Like a csv file,
>>>>>>>> or a database table, which is not even a file (OK, we don't support such
>>>>>>>> a thing at the moment). I think maybe "data source" is a safe enough
>>>>>>>> term. (It also rhymes with data model.)
>>>>>>>> - You have separate "template" and "document" "modes" that apply to a
>>>>>>>> whole run. I think such specialization won't be helpful. We could just
>>>>>>>> say, on the conceptual level at least, that we need a set of "output
>>>>>>>> generators". An output generator is an object (in the API) that
>>>>>>>> specifies a template, a data-model (where the data-model is possibly
>>>>>>>> populated with "documents"), and an output "sink" (a file path, or
>>>>>>>> stdout), and can generate the output itself. A practical way of defining
>>>>>>>> the output generators in a CLI application is via a bunch of files, each
>>>>>>>> defining an output generator. Some of those files may be templates
>>>>>>>> (which you can even detect from the file extension), others data files
>>>>>>>> that we currently call "documents". They could freely mix inside the
>>>>>>>> same run. I have also met a use case where you have a single table
>>>>>>>> (single "document"), and each record in it yields an output file. That
>>>>>>>> can also be described in some file format, or really in any other way,
>>>>>>>> like directly in a command line argument, via the API, etc.
>>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>>> some examples. Templates can't identify those in a maintainable way.
>>>>>>>> The actual file name is often not a good identifier: it can change over
>>>>>>>> time, and you might not even have good control over it, like when you
>>>>>>>> receive it as a parameter from somewhere else, or someone moves/renames
>>>>>>>> the files that you need to read. An index is also not very good, but I
>>>>>>>> have written about that earlier.
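
The "output generator" object from the second bullet above (a template, a data-model, an output sink, and the ability to generate the output itself) might look roughly like this in Java. It is a sketch under assumptions: the class and field names are hypothetical, and a plain `Function` stands in for a FreeMarker template so the example stays self-contained.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shape; the names are illustrative, not the project's API.
final class OutputGenerator {
    // Stand-in for a FreeMarker template: renders a data-model to text.
    private final Function<Map<String, Object>, String> template;
    private final Map<String, Object> dataModel;
    // The "sink": a file writer or stdout in a real run.
    private final Writer sink;

    OutputGenerator(Function<Map<String, Object>, String> template,
                    Map<String, Object> dataModel,
                    Writer sink) {
        this.template = template;
        this.dataModel = dataModel;
        this.sink = sink;
    }

    // The generator produces its own output; callers only trigger it.
    void generate() {
        try {
            sink.write(template.apply(dataModel));
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        new OutputGenerator(m -> "Hello " + m.get("user"),
                Map.of("user", "alice"), out).generate();
        System.out.println(out);
    }
}
```

With this shape, a CLI invocation or a Maven configuration would only differ in how it builds the list of such objects.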
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi folks,
>>>>>>>> 
>>>>>>>> still wrapping my head around it but assembled some thoughts here -
>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> What you are describing is more like the angle that FMPP took initially,
>>>>>>>> where templates drive things; they generate the output for themselves
>>>>>>>> (even multiple output files if they wish). By default the output file's
>>>>>>>> name (and relative path) is deduced from the template name. There was
>>>>>>>> also a global data-model, built in a configuration file (or equally,
>>>>>>>> built via command line arguments, or both mixed), from which templates
>>>>>>>> get whatever data they are interested in. Take a look at the figures
>>>>>>>> here: http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>>>>>>> generalized a bit more, because you could add XML files at the same
>>>>>>>> place where you have the templates, and then you could associate
>>>>>>>> transform templates with the XML files (based on path pattern and/or the
>>>>>>>> XML document element). Now that's like what freemarker-generator had
>>>>>>>> initially (data files drive output, and the template is there to
>>>>>>>> transform it).
>>>>>>>> 
>>>>>>>> So I think the generic mental model would look like this:
>>>>>>>> 
>>>>>>>> 1. You have files that drive the process; let's call them *generator
>>>>>>>> files* for now. Usually, each generator file yields an output file (but
>>>>>>>> maybe even multiple output files, as you might have seen in the last
>>>>>>>> figure). These generator files can be of many types, like XML, JSON,
>>>>>>>> XLSX (as in the original freemarker-generator), and even templates (as
>>>>>>>> is the norm in FMPP). If the file is not a template, then you have a set
>>>>>>>> of transformer templates (-t CLI option) in a separate directory, which
>>>>>>>> can be associated with the generator files based on name patterns, and
>>>>>>>> even based on content (schema usually). If the generator file is a
>>>>>>>> template (so that's a positional @Parameter CLI argument that happens to
>>>>>>>> be an *.ftl, and is not a template file specified after the "-t"
>>>>>>>> option), then you just Template.process(...) it, and it prints what the
>>>>>>>> output will be.
>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>> contains commonly useful stuff, like what you now call parameters (CLI
>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>>>>>>>> data files aren't "generator files". Templates just use them if they
>>>>>>>> need them. An important thing here is to reuse the same mechanism to
>>>>>>>> read and parse those data files as is used in templates when
>>>>>>>> transforming generator files. So we need a common format for specifying
>>>>>>>> how to load data files. That's maybe just FTL that #assigns to the
>>>>>>>> variables, or maybe a more declarative format.
>>>>>>>> 
>>>>>>>> What I have described in the original post here was a less generic form
>>>>>>>> of this, as I tried to stay true to the original approach. I thought the
>>>>>>>> proposal would be drastic enough as it is... :) There, the "main"
>>>>>>>> document is the "generator file" from point 1, the "-t" template is the
>>>>>>>> transform template for the "main" document, and the other named
>>>>>>>> documents ("users", "groups") are a poor man's shared data-model from
>>>>>>>> point 2 (together with -PName=value).
>>>>>>>> 
>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
>>>>>>>> the model above, as per point 1, if you list multiple data files, each
>>>>>>>> will generate a separate output file. So, if you need to take in a list
>>>>>>>> of files to transform it into a single output file (or at least with a
>>>>>>>> single transform template execution), then you have to be explicit about
>>>>>>>> that, as that's not the default behavior anymore. But it's still
>>>>>>>> absolutely possible. Imagine that a "list of XLSX-es" is itself like a
>>>>>>>> file format. You need some CLI (and Maven config, etc.) syntax to
>>>>>>>> express that, but that shouldn't be a big deal.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> Good timing - I was looking at a similar problem from a different angle
>>>>>>>> yesterday (see below)
>>>>>>>> 
>>>>>>>> Don't have enough time to answer your email in detail now - will do that
>>>>>>>> tomorrow evening
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>> === START
>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>> Currently we support the following combinations
>>>>>>>> 
>>>>>>>> * Single template and no data files
>>>>>>>> * Single template and one or more data files
>>>>>>>> 
>>>>>>>> But we cannot support the following use case, which is quite typical in
>>>>>>>> the cloud
>>>>>>>> 
>>>>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>> 
>>>>>>>> ## Implementation notes
>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>>> another
>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>> * Do we need file versus directory filters?
>>>>>>>> 
>>>>>>>> ### Command Line Options
>>>>>>>> ```
>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>> --output : Output file or directory
>>>>>>>> --include-document : Include pattern for documents
>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>> --include-template : Include pattern for templates
>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> ### Command Line Examples
>>>>>>>> ```text
>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>>>>> # using the data from "config.json"
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>> 
>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>> 
>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>> ```
>>>>>>>> === END
>>>>>>>> 
>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Input documents are a fundamental concept in freemarker-generator, so we
>>>>>>>> should think about that more, and probably refine/rework how it's done.
>>>>>>>> 
>>>>>>>> Currently it works like this, with the CLI at least:
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> 
>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>> 
>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>> ... process doc here
>>>>>>>> 
>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>>>>> funny chain of coincidences: it returned the string "D", then
>>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single column
>>>>>>>> "D" and 0 rows, and as there were 0 rows, the template didn't run into an
>>>>>>>> error about row.myExpectedColumn referring to a missing column either, so
>>>>>>>> the process finished with success. (: Pretty unlucky for sure. The root
>>>>>>>> cause was unintentionally breaking a FreeMarker idiom though; eventually
>>>>>>>> we will have to work on those too, but that's a different topic.)
>>>>>>>> 
>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>> 
>>>>>>>> The above template will still work, though then you ignore all but the
>>>>>>>> first document. So if you expect any number of input documents, you
>>>>>>>> probably will have to do this:
>>>>>>>> 
>>>>>>>> <#list Documents.list as doc>
>>>>>>>> ... process doc here
>>>>>>>> </#list>
>>>>>>>> 
>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>>>>>>> we will work out in a different thread.)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So, what would be better, in my opinion? I start out from what I think
>>>>>>>> are the common use cases, in decreasing order of frequency. The goal is
>>>>>>>> to make those less error-prone for the users, and simpler to express.
>>>>>>>> 
>>>>>>>> USE CASE 1
>>>>>>>> 
>>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>>> document in the mind of the user. This is probably the typical use case,
>>>>>>>> or at least the use case users typically start out from when starting
>>>>>>>> the work.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> 
>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>>> error-prone, because if the user passed in more than 1 document (which
>>>>>>>> can even happen totally accidentally, like if the user was lazy and used
>>>>>>>> a wildcard that the shell expanded), the template will silently ignore
>>>>>>>> the rest of the documents, and the single document processed will be
>>>>>>>> picked practically at random. The user might not notice that and submit
>>>>>>>> a bad report or such.
>>>>>>>> 
>>>>>>>> I think that in this use case the document should simply be referred to
>>>>>>>> as `Document` in the template. When you have multiple documents there,
>>>>>>>> referring to `Document` should be an error, saying that the template was
>>>>>>>> made to process a single document only.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> USE CASE 2
>>>>>>>> 
>>>>>>>> You have multiple input documents, but each has a different role
>>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>>> users.csv and groups.csv. Each has a different schema, and so you want to
>>>>>>>> access them differently, but in the same template.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> [...]
>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>> 
>>>>>>>> Then in the template you could refer to them as `NamedDocuments.users`
>>>>>>>> and `NamedDocuments.groups`.
>>>>>>>> 
>>>>>>>> Use cases 1 and 2 can be unified into a coherent concept, where
>>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>>> "main" because that's "the" document the template is about, but then you
>>>>>>>> have to add some helper documents, with symbolic names representing their
>>>>>>>> role.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>> 
>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
>>>>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>>>>>>> Anyway, for now the point is the concept, which is not specific to the
>>>>>>>> CLI.)
>>>>>>>> 
>>>>>>>> USE CASE 3
>>>>>>>> 
>>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>> somewhere/bar-users.csv
>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>> 
>>>>>>>> The template must be written with this use case in mind, as now it has
>>>>>>>> to #list some of the documents. (I think in practice you hardly ever want
>>>>>>>> to get a document by hard-coded index. Either you don't know how many
>>>>>>>> documents you have, so you can't use hard-coded indexes, or you do, and
>>>>>>>> each index has a specific meaning, but then you should name the documents
>>>>>>>> instead, as using indexes is error-prone and hard to read.)
>>>>>>>> Accessing that list of documents in the template could maybe be done like
>>>>>>>> this:
>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>>>>>>>> 
>>>>>>>> SUMMING UP
>>>>>>>> 
>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>>>>>>> achieve everything with it, using it requires your template to handle the
>>>>>>>> most generic case too. So, I think it would be rarely used.
>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>>>>>>> used if you only have one kind of document (single format and schema),
>>>>>>>> but potentially multiple of them.
>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>>>>>>> the given name.
>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>>>>>>>> the most natural/frequent use case.
>>>>>>>> 
>>>>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>>>>>>> for the sake of these:
>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output likely
>>>>>>>> will be wrong. That's only possible if the user can communicate their
>>>>>>>> intent in the template.
>>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>> concrete use case. Just start with the trivial, `Document`, and later, if
>>>>>>>> the need arises, generalize to named documents, document lists, or both.
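
The four accessors summed up above can all be derived from one map of named document lists. The following Java sketch is only one possible reading of the proposal: `DocumentsSketch` and its method names are hypothetical, and documents are represented by their paths to keep the example small.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposal; not the freemarker-generator API.
final class DocumentsSketch {
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    void add(String name, String path) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(path);
    }

    // NamedDocumentLists.<name>: the most generic form, zero or more documents.
    List<String> namedDocumentList(String name) {
        return List.copyOf(byName.getOrDefault(name, List.of()));
    }

    // NamedDocuments.<name>: exactly one document, otherwise fail loudly.
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named '"
                    + name + "' but got " + docs.size());
        }
        return docs.get(0);
    }

    // DocumentList: shorthand for NamedDocumentLists.main.
    List<String> documentList() {
        return namedDocumentList("main");
    }

    // Document: shorthand for NamedDocuments.main.
    String document() {
        return namedDocument("main");
    }

    public static void main(String[] args) {
        DocumentsSketch docs = new DocumentsSketch();
        docs.add("main", "somewhere/foo-access-log.csv");
        docs.add("users", "somewhere/foo-users.csv");
        System.out.println(docs.document());
        System.out.println(docs.namedDocument("users"));
    }
}
```

Adding a second "main" document makes `document()` throw instead of silently picking one, which is the error the proposal asks for in use case 1.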
>>>>>>>> 
>>>>>>>> What do you guys think?
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Daniel Dekany
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Daniel Dekany
>>> 
>>> 
>> 
>> -- 
>> Best regards,
>> Daniel Dekany
>