Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Sat, 29 Feb 2020 10:34:45 -0800

Hi Daniel,

See my comments below


Thanks in advance, 

Siegfried Goeschl


> On 29.02.2020, at 19:08, Daniel Dekany <[email protected]> wrote:
> 
> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
> datasources
> 
> So, I can do this to have both a name and a group associated with a data
> source:
>  --datasource someName:someGroup=somewhere/something

Correct

> Or if I only want a name, but not a group (or an ""  group actually -
> bug?), then:
>  --datasource someName=somewhere/something

Correct

> 
> Or if only a group but not a name (or a "" name actually) then:
>  --datasource :someGroup=somewhere/something

Mhmm, that would be unintended functionality on my side - the current approach
is that every "Document" / "Datasource" / "DataSource" is named
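
For illustration, the three `--datasource` forms quoted above could be split
roughly like this. This is only a sketch in Python, not the actual
freemarker-generator code (which is Java): the function name
`parse_datasource_spec` and the returned tuple are made up, and falling back
to a "default" group when none is given is an assumption based on the
"default" group rule, not confirmed behaviour.

```python
def parse_datasource_spec(spec, default_group="default"):
    """Split 'name:group=location' into (name, group, location).

    Hypothetical helper for illustration only.
    """
    label, sep, location = spec.partition("=")
    if not sep:
        # No '=' at all: the whole argument is the location, nothing is named.
        return ("", default_group, spec)
    # The part before '=' is 'name', 'name:group' or ':group'.
    name, _, group = label.partition(":")
    return (name, group or default_group, location)


print(parse_datasource_spec("someName:someGroup=somewhere/something"))
print(parse_datasource_spec("someName=somewhere/something"))
print(parse_datasource_spec(":someGroup=somewhere/something"))
```

Note that the third form yields an empty name, which is exactly the case not
intended above. A location that itself contains "=" (for example a URL with a
query string) would need extra care; the sketch ignores that.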

> 
> A name must identify exactly 1 data source, while a group identifies a list
> of data sources.

No, every "Document" / "Datasource" / "DataSource" currently has a name, but
uniqueness is not enforced. Only when you ask for a "Document" / "Datasource" /
"DataSource" by its exact name do I check for exactly one search hit, and throw
an exception otherwise. I try to provide a useful name even when the content is
coming from a URL or STDIN (and I will probably add environment variables as
"Document" / "Datasource" / "DataSource", e.g. configuration in the cloud
passed as JSON content in an environment variable)
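
A minimal sketch of that lookup rule, in Python rather than the project's
Java, and with made-up names (`DataSources`, `find`, `get`): names may repeat
in the collection, and only the exact-name lookup insists on a single hit.

```python
class DataSources:
    """Hypothetical collection of (name, group) entries; duplicates allowed."""

    def __init__(self, entries):
        self._entries = list(entries)

    def find(self, name):
        # Returns every entry with that name, possibly none or several.
        return [entry for entry in self._entries if entry[0] == name]

    def get(self, name):
        # Exact-name lookup: anything other than one hit is an error.
        hits = self.find(name)
        if len(hits) != 1:
            raise LookupError(
                f"expected exactly one data source named {name!r}, "
                f"found {len(hits)}"
            )
        return hits[0]


sources = DataSources([("users", "default"), ("log", "logs"), ("log", "logs")])
print(sources.get("users"))      # unique name, returned as is
print(len(sources.find("log")))  # duplicate names are tolerated here
```

Calling `sources.get("log")` in this sketch raises `LookupError`, since two
entries share that name.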

> 
> Does this idea, that a data source can be part of a group and is then
> also possibly identifiable by a name, come from a use case? I mean,
> it's possibly important somewhere, but if so, then it's strange that you
> can put something into only a single group. If we need this kind of thing,
> then perhaps you should be just allowed to associate the data source with a
> list of names (kind of like tagging), and then when the template wants to
> get something by name, it will tell there if it expects exactly one or a
> list of data sources. Then you don't need to introduce two terms in the
> documentation either (names and groups). Again, if we want this at all,
> instead of just going with a data source that itself gives a list. (And if
> not, how will we handle a data source that loads from a non-file source?)

I actually thought of implementing tagging but considered a "group" sufficient.

* If you don't define anything everything goes into the "default" group
* For individual documents you can define a name and an optional group

I think we have a different understanding of what a "Document" / "Datasource" /
"DataSource" should do

* It is dumb
* It is lazy since data is only loaded on demand
* There is no automagic like "oh, this is a JSON file, so let's go to the JSON
tool and create a map readily accessible in the data model"
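
To make "dumb and lazy" concrete, here is a small Python sketch (again not the
project's Java code). The class name `LazyDataSource`, the zero-argument
`loader` callable and the caching of the loaded text are all assumptions made
for illustration; the point is only that nothing is read until a template asks
for it, and nothing is parsed automatically.

```python
class LazyDataSource:
    """Hypothetical data source that holds a name and a way to load text."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # zero-argument callable returning raw text
        self._text = None
        self.loaded = False

    def text(self):
        # Load on first access only; later calls reuse the cached text.
        if not self.loaded:
            self._text = self._loader()
            self.loaded = True
        return self._text


source = LazyDataSource("config", lambda: '{"port": 8080}')
print(source.loaded)  # nothing has been read yet
print(source.text())  # raw text, not a parsed map
print(source.loaded)
```

Turning the raw text into a map would remain an explicit step in the template
(for example via a JSON tool), in line with the "no automagic" point.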

> 
> Note that the current command line syntax doesn't work well with shell
> wildcard expansion. Like this:
> --datasource :someGroup=logs/*.log
> will try to expand ":someGroup=logs/*.log", and because it finds nothing
> (and because the rules of sh and the like are a mess), you will get the
> parameter value as is, without * expanded.

The joy of programming - I did not intend to use "name:group" together with 
wildcards :-)
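
The effect Daniel describes can be imitated in a few lines of Python. This is
a deliberate simplification of what sh does (real shells treat "/" and shell
options specially), and `shell_like_expand` is a made-up helper: the whole
word is matched against existing paths, and a word that matches nothing is
passed through unchanged.

```python
from fnmatch import fnmatchcase


def shell_like_expand(word, existing_paths):
    """Replace the word by the matching paths, or keep it as is (like sh)."""
    matches = sorted(p for p in existing_paths if fnmatchcase(p, word))
    return matches if matches else [word]


paths = ["logs/a.log", "logs/b.log"]
print(shell_like_expand("logs/*.log", paths))
print(shell_like_expand(":someGroup=logs/*.log", paths))
```

With the ":someGroup=" prefix no path matches the word, so the application
receives the pattern literally, including the unexpanded "*".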

> 
> Also,  I think the syntax with colon should be flipped, because on other
> places foo:bar usually means that foo is the bigger unit (the container),
> and bar is the smaller unit (the child).

I disagree here - I think a name would be used more often. I added the
"group" as an afterthought since some grouping could be useful

> 
> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> [email protected]> wrote:
> 
>> Hi Daniel,
>> 
>> I'm an enterprise developer - bad habits die hard :-)
>> 
>> So I closed the following tickets and merged the branches
>> 
>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>> "freemarker-generator"
>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>> for datasources
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
>>> 
>>> Yeah, and of course, you can merge that branch. You can even work on the
>>> master directly after all.
>>> 
>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
>>> wrote:
>>> 
>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>> common format/schema). Only, my idea is to push that complexity on the
>>>> data source. The "data source" concept shields the rest of the
>>>> application from the details of how the data is stored or retrieved.
>>>> So, a data source might load a bunch of log files from a directory, and
>>>> present them as a single big table, or like a list of tables, etc. So I
>>>> want to deal with the cattle use case, but the question is what part of
>>>> the architecture will deal with this complication, in other words, how
>>>> do you box things. Why my initial bet is to stuff that complication into
>>>> the "data source" implementation(s) is that data sources are inherently
>>>> varied. Some return a table-like thing, some have multiple named tables
>>>> (worksheets in Excel), some return a tree of nodes (XML), etc. So then,
>>>> some might return a list-of-list-of log records, or just a single list
>>>> of log-records (put together from daily log files). That way cattle
>>>> don't add to conceptual complexity. Now, you might be aware of cases
>>>> where the cattle concept must be more exposed than this, and then we
>>>> can't box things like this. But this is what I tried to express.
>>>> 
>>>> Regarding "output generators", and how that applies on the command
>>>> line. I think it's important that the common core between Maven and
>>>> command-line is as fat as possible. Ideally, they are just two syntaxes
>>>> to set up the same thing. Mostly at least. So, if you specify a template
>>>> file to the CLI application, in a way so that it causes it to process
>>>> that template to generate a single output, then there you have just
>>>> defined an "output generator" (even if it wasn't explicitly called like
>>>> that in the command line). If you specify 3 csv files to the CLI
>>>> application, in a way so that it causes it to generate 3 output files,
>>>> then you have just defined 3 "output generators" there (there's at least
>>>> one template specified there too, but that wasn't an "output generator"
>>>> itself, it was just an attribute of the 3 output generators). If you
>>>> specify 1 template, and 3 csv files, in a way so that it will yield 4
>>>> output files (1 for the template, 3 for the csv-s), then you have
>>>> defined 4 output generators there. If you have a data source that loads
>>>> a list of 3 entities (say, 3 csv files, so it's a list of tables then),
>>>> and you have 2 templates, and you tell the CLI to execute each template
>>>> for each item in said data source, then you have just defined 6 "output
>>>> generators".
>>>> 
>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>> [email protected]> wrote:
>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> That all depends on your mental model and work you do, expectations,
>>>>> experience :-)
>>>>> 
>>>>> 
>>>>> __Document Handling__
>>>>> 
>>>>> *"But I think actually we have no good use case for list of documents
>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>> that complication"*
>>>>> 
>>>>> In my case that's not a complication but my daily business - I'm
>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>> hundred access logs across two staging sites to help track down some
>>>>> strange API gateway issues :-)
>>>>> 
>>>>> My gut feeling is (borrowing from
>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>> )
>>>>> 
>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>> `cattle`
>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>> 
>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>> it is equally important and common.
>>>>> 
>>>>> 
>>>>> __Template And Document Processing Modes__
>>>>> 
>>>>> IMHO it is important to answer the following question: "How many
>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>> Three or Six?"
>>>>>
>>>>> Your answer is influenced by your mental model / experience
>>>>>
>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>> is "2"
>>>>> * When doing source code generation the obvious answer is "6"
>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>> will encounter one
>>>>> 
>>>>> __Template and document mode probably shouldn't exist__
>>>>> 
>>>>> That's hard for me to fully understand - I definitely lack your
>>>>> insights & experience writing such tools :-)
>>>>>
>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>> plugin (and probably FMPP).
>>>>>
>>>>> I'm not sure if this applies to command lines, at least not in the way
>>>>> I use them (or would like to use them)
>>>>> 
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>> 
>>>>> 
>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>> 
>>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>>>> Does anyone have other ideas?
>>>>>>
>>>>>> As for naming data sources and such. One thing I was wondering about
>>>>>> back then is how to deal with list of documents given to a template,
>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>> have no good use case for list of documents that's passed at once to a
>>>>>> single template run, so, we can just ignore that complication. A
>>>>>> document has a name, and that's always just a single document, not a
>>>>>> collection, as far as the template is concerned. (We can have multiple
>>>>>> documents per run, but those normally yield separate output
>>>>>> generators, so it's still only one document per template.) However, we
>>>>>> can have data source types (document types with old terminology) that
>>>>>> collect together multiple data files. So then that complexity is
>>>>>> encapsulated into the data source type, and doesn't complicate the
>>>>>> overall architecture. That's another case when a data source is not
>>>>>> just a file. Like maybe there's a data source type that loads all the
>>>>>> CSV-s from a directory, into a single big table (I had such a case), or
>>>>>> even into a list of tables. Or, as I mentioned already, a data source
>>>>>> is maybe an SQL query on a JDBC data source (and we got the first term
>>>>>> clash... JDBC also calls them data sources).
>>>>>>
>>>>>> Template and document mode probably shouldn't exist from user
>>>>>> perspective either, at least not as a global option that must apply to
>>>>>> everything in a run. They could just give the files that define the
>>>>>> "output generators", and some of them will be templates, some of them
>>>>>> are data files, in which case a template needs to be associated with
>>>>>> them (and there can be a couple of ways of doing that). And then again,
>>>>>> there are the cases where you want to create one output generator per
>>>>>> entity from some data source.
>>>>>> 
>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>> 
>>>>>>> *Renaming Document To DataSource*
>>>>>>> 
>>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
>>>>>>> and its DataSource.
>>>>>>>
>>>>>>> *Template And Document Mode*
>>>>>>>
>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>> not an implementation concept :-)
>>>>>>>
>>>>>>> *Document Without Symbolic Names*
>>>>>>>
>>>>>>> Also agreed and it is going to change but I have not settled my mind
>>>>>>> yet on what exactly to implement.
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>> 
>>>>>>> A few quick thoughts on that:
>>>>>>> 
>>>>>>> - We should replace the "document" term with something more
>>>>>>> descriptive. It doesn't tell that it's some kind of input. Also, most
>>>>>>> of these inputs aren't something that people typically call
>>>>>>> documents. Like a csv file, or a database table, which is not even a
>>>>>>> file (OK we don't support such a thing at the moment). I think, maybe
>>>>>>> "data source" is a safe enough term. (It also rhymes with data
>>>>>>> model.)
>>>>>>> - You have separate "template" and "document" "mode", that applies to
>>>>>>> a whole run. I think such specialization won't be helpful. We could
>>>>>>> just say, on the conceptual level at least, that we need a set of
>>>>>>> "output generators". An output generator is an object (in the API)
>>>>>>> that specifies a template, a data-model (where the data-model is
>>>>>>> possibly populated with "documents"), and an output "sink" (a file
>>>>>>> path, or stdout), and can generate the output itself. A practical way
>>>>>>> of defining the output generators in a CLI application is via a bunch
>>>>>>> of files, each defining an output generator. Some of those files are
>>>>>>> maybe templates (that you can even detect from the file extension),
>>>>>>> or data files that we currently call "documents". They could freely
>>>>>>> mix inside the same run. I have also met a use case where you have a
>>>>>>> single table (single "document"), and each record in it yields an
>>>>>>> output file. That can also be described in some file format, or
>>>>>>> really in any other way, like directly in command line arguments, via
>>>>>>> API, etc.
>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>> some examples. Templates can't identify those then in a well
>>>>>>> maintainable way. The actual file name is often not a good
>>>>>>> identifier, can change over time, and you might not even have good
>>>>>>> control over it, like you already receive it as a parameter from
>>>>>>> somewhere else, or someone moves/renames the files that you need to
>>>>>>> read. Index is also not very good, but I have written about that
>>>>>>> earlier.
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> still wrapping my head around it but assembled some thoughts here -
>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>>>>>>> 
>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>> initially, where templates drive things, they generate the output for
>>>>>>> themselves (even multiple output files if they wish). By default the
>>>>>>> output file's name (and relative path) is deduced from the template
>>>>>>> name. There was also a global data-model, built in a configuration
>>>>>>> file (or equally, built via command line arguments, or both mixed),
>>>>>>> from which templates get whatever data they are interested in. Take a
>>>>>>> look at the figures here: http://fmpp.sourceforge.net/qtour.html.
>>>>>>> Later, this concept was generalized a bit more, because you could add
>>>>>>> XML files at the same place where you have the templates, and then
>>>>>>> you could associate transform templates to the XML files (based on
>>>>>>> path pattern and/or the XML document element). Now that's like what
>>>>>>> freemarker-generator had initially (data files drive output, and the
>>>>>>> template is there to transform it).
>>>>>>> 
>>>>>>> So I think the generic mental model would look like this:
>>>>>>>
>>>>>>> 1. You got files that drive the process, let's call them *generator
>>>>>>> files* for now. Usually, each generator file yields an output file
>>>>>>> (but maybe even multiple output files, as you might have seen in the
>>>>>>> last figure). These generator files can be of many types, like XML,
>>>>>>> JSON, XLSX (as in the original freemarker-generator), and even
>>>>>>> templates (as is the norm in FMPP). If the file is not a template,
>>>>>>> then you got a set of transformer templates (-t CLI option) in a
>>>>>>> separate directory, which can be associated with the generator files
>>>>>>> based on name patterns, and even based on content (schema usually).
>>>>>>> If the generator file is a template (so that's a positional
>>>>>>> @Parameter CLI argument that happens to be an *.ftl, and is not a
>>>>>>> template file specified after the "-t" option), then you just
>>>>>>> Template.process(...) it, and it prints what the output will be.
>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>> contains commonly useful stuff, like what you now call parameters
>>>>>>> (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc.
>>>>>>> Those data files aren't "generator files". Templates just use them if
>>>>>>> they need them. An important thing here is to reuse the same
>>>>>>> mechanism to read and parse those data files, which was used in
>>>>>>> templates when transforming generator files. So we need a common
>>>>>>> format for specifying how to load data files. That's maybe just FTL
>>>>>>> that #assigns to the variables, or maybe a more declarative format.
>>>>>>> 
>>>>>>> What I have described in the original post here was a less generic
>>>>>>> form of this, as I tried to be true to the original approach. I
>>>>>>> thought the proposal would be drastic enough as it is... :) There,
>>>>>>> the "main" document is the "generator file" from point 1, the "-t"
>>>>>>> template is the transform template for the "main" document, and the
>>>>>>> other named documents ("users", "groups") are a poor man's shared
>>>>>>> data-model from point 2 (together with -PName=value).
>>>>>>> 
>>>>>>> There's a further somewhat confusing thing to get right with the
>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>>>>>>> though. In the model above, as per point 1, if you list multiple data
>>>>>>> files, each will generate a separate output file. So, if you need to
>>>>>>> take in a list of files to transform it to a single output file (or
>>>>>>> at least with a single transform template execution), then you have
>>>>>>> to be explicit about that, as that's not the default behavior
>>>>>>> anymore. But it's still absolutely possible. Imagine it as if a "list
>>>>>>> of XLSX-es" is itself like a file format. You need some CLI (and
>>>>>>> Maven config, etc.) syntax to express that, but that shouldn't be a
>>>>>>> big deal.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> Good timing - I was looking at a similar problem from a different
>>>>>>> angle yesterday (see below)
>>>>>>>
>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>> that tomorrow evening
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> === START
>>>>>>> # FreeMarker CLI Improvement
>>>>>>> ## Support Of Multiple Template Files
>>>>>>> Currently we support the following combinations
>>>>>>> 
>>>>>>> * Single template and no data files
>>>>>>> * Single template and one or more data files
>>>>>>> 
>>>>>>> But we cannot support the following use case, which is quite typical
>>>>>>> in the cloud
>>>>>>>
>>>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>
>>>>>>> ## Implementation notes
>>>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>>>> fly
>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>> another
>>>>>>> * Need to calculate the output file location and extension
>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>> * Do we need file versus directory filters?
>>>>>>> 
>>>>>>> ### Command Line Options
>>>>>>> ```
>>>>>>> --input-encoding : Encoding of the documents
>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>> --template-encoding : Encoding of the template
>>>>>>> --output : Output file or directory
>>>>>>> --include-document : Include pattern for documents
>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>> --include-template : Include pattern for templates
>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>> ```
>>>>>>> 
>>>>>>> ### Command Line Examples
>>>>>>> ```text
>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>> # directory using the data from "config.json"
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>
>>>>>>> # Basically the same using a named document "configuration"
>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>
>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>> ```
>>>>>>> === END
>>>>>>> 
>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>>>>> 
>>>>>>> Input documents are a fundamental concept in freemarker-generator, so
>>>>>>> we should think about that more, and probably refine/rework how it's
>>>>>>> done.
>>>>>>> 
>>>>>>> Currently it works like this, with CLI at least.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>> 
>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>> ... process doc here
>>>>>>> 
>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>>>> funny chain of coincidences: It returned the string "D", then
>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single
>>>>>>> column "D", and 0 rows, and as there were 0 rows, the template didn't
>>>>>>> run into an error because row.myExpectedColumn refers to a missing
>>>>>>> column either, so the process finished with success. (: Pretty
>>>>>>> unlucky for sure. The root was unintentionally breaking a FreeMarker
>>>>>>> idiom though; eventually we will have to work on those too, but,
>>>>>>> different topic.)
>>>>>>> 
>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> 
>>>>>>> The above template will still work, though then you ignore all but
>>>>>>> the first document. So if you expect any number of input documents,
>>>>>>> you probably will have to do this:
>>>>>>> 
>>>>>>> <#list Documents.list as doc>
>>>>>>> ... process doc here
>>>>>>> </#list>
>>>>>>> 
>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>> those we will work out in a different thread.)
>>>>>>> 
>>>>>>> 
>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>> think are the common use cases, in decreasing order of frequency. The
>>>>>>> goal is to make those less error prone for the users, and simpler to
>>>>>>> express.
>>>>>>> 
>>>>>>> USE CASE 1
>>>>>>> 
>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>> case, or at least the use case users typically start out from when
>>>>>>> starting the work.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>> error prone, because if the user passed in more than 1 document (can
>>>>>>> even happen totally accidentally, like if the user was lazy and used
>>>>>>> a wildcard that the shell expanded), the template will silently
>>>>>>> ignore the rest of the documents, and the single document processed
>>>>>>> will be practically picked randomly. The user might not notice that
>>>>>>> and submit a bad report or such.
>>>>>>>
>>>>>>> I think that in this use case the document should be simply referred
>>>>>>> to as `Document` in the template. When you have multiple documents
>>>>>>> there, referring to `Document` should be an error, saying that the
>>>>>>> template was made to process a single document only.
>>>>>>> 
>>>>>>> 
>>>>>>> USE CASE 2
>>>>>>> 
>>>>>>> You have multiple input documents, but each has a different role
>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>> users.csv and groups.csv. Each has a different schema, and so you
>>>>>>> want to access them differently, but in the same template.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> [...]
>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Then in the template you could refer to them as:
>>>>>>> `NamedDocuments.users`, and `NamedDocuments.groups`.
>>>>>>>
>>>>>>> Use Case 1 and 2 can be unified into a coherent concept, where
>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>> "main" because that's "the" document the template is about, but then
>>>>>>> you have to add some helper documents, with symbolic names
>>>>>>> representing their role.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>>>> specific to CLI.)
>>>>>>> 
>>>>>>> USE CASE 3
>>>>>>> 
>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> somewhere/bar-users.csv
>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>> 
>>>>>>> The template must be written with this use case in mind, as now it
>>>>>>> has to #list some of the documents. (I think in practice you hardly
>>>>>>> ever want to get a document by hard coded index. Either you don't
>>>>>>> know how many documents you have, so you can't use hard coded
>>>>>>> indexes, or you do, and each index has a specific meaning, but then
>>>>>>> you should name the documents instead, as using indexes is error
>>>>>>> prone, and hard to read.)
>>>>>>> Accessing that list of documents in the template maybe could be done
>>>>>>> like this:
>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>> - For explicitly named documents, like "users":
>>>>>>> `NamedDocumentLists.users`
>>>>>>> 
>>>>>>> SUMMING UP
>>>>>>> 
>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>> can achieve everything with it, using it requires your template to
>>>>>>> handle the most generic case too. So, I think it would be rarely
>>>>>>> used.
>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>> It's used if you only have one kind of documents (single format and
>>>>>>> schema), but potentially multiple of them.
>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>> document of the given name.
>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>> for the most natural/frequent use case.
>>>>>>> 
>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>> trade-off for the sake of these:
>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>> likely will be wrong. That's only possible if the user can
>>>>>>> communicate their intent in the template.
>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>> concrete use case. Just start with the trivial, `Document`, and later
>>>>>>> if the need arises, generalize to named documents, document lists, or
>>>>>>> both.
>>>>>>>
>>>>>>> What do you guys think?
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Daniel Dekany
>>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Daniel Dekany
>> 
>> 
> 
> -- 
> Best regards,
> Daniel Dekany
