Re: freemarker-generator: Improving the input documents concept

Daniel Dekany Sat, 29 Feb 2020 09:04:46 -0800

I believe that should be DataSource (with capital S), as it's two words.

Also, it's the name of a very widely used and well-known JDBC interface. So if
anyone can suggest a similarly descriptive alternative...


On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
[email protected]> wrote:

> Hi Daniel,
>
> I'm an enterprise developer - bad habits die hard :-)
>
> So I closed the following tickets and merged the branches
>
> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> "freemarker-generator"
> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> for datasources
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> > On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
> >
> > Yeah, and of course, you can merge that branch. You can even work on the
> > master directly after all.
> >
> > On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
> > wrote:
> >
> >> But, I do recognize the cattle use case (several "faceless" files with a
> >> common format/schema). Only, my idea is to push that complexity onto the
> >> data source. The "data source" concept shields the rest of the application
> >> from the details of how the data is stored or retrieved. So, a data source
> >> might load a bunch of log files from a directory, and present them as a
> >> single big table, or as a list of tables, etc. So I do want to deal with
> >> the cattle use case, but the question is which part of the architecture
> >> will deal with this complication, in other words, how you box things. The
> >> reason my initial bet is to stuff that complication into the "data source"
> >> implementation(s) is that data sources are inherently varied. Some return
> >> a table-like thing, some have multiple named tables (worksheets in Excel),
> >> some return a tree of nodes (XML), etc. So then, some might return a
> >> list-of-lists of log records, or just a single list of log records (put
> >> together from daily log files). That way cattle don't add to the
> >> conceptual complexity. Now, you might be aware of cases where the cattle
> >> concept must be more exposed than this, and then we can't box things like
> >> this. But this is what I tried to express.
> >>
> >> Regarding "output generators", and how that applies to the command line: I
> >> think it's important that the common core between Maven and command line
> >> is as fat as possible. Ideally, they are just two syntaxes to set up the
> >> same thing. Mostly at least. So, if you specify a template file to the CLI
> >> application, in a way that causes it to process that template to
> >> generate a single output, then you have just defined an "output
> >> generator" (even if it wasn't explicitly called that on the command
> >> line). If you specify 3 csv files to the CLI application, in a way
> >> that causes it to generate 3 output files, then you have just defined 3
> >> "output generators" (there's at least one template specified there
> >> too, but that wasn't an "output generator" itself, it was just an
> >> attribute of the 3 output generators). If you specify 1 template and 3 csv
> >> files, in a way that yields 4 output files (1 for the template, 3 for
> >> the csv-s), then you have defined 4 output generators. If you have a
> >> data source that loads a list of 3 entities (say, 3 csv files, so it's a
> >> list of tables then), and you have 2 templates, and you tell the CLI to
> >> execute each template for each item in said data source, then you have
> >> just defined 6 "output generators".
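
The counting in the paragraph above can be sketched in a few lines of Java. This is only an illustration under assumed names: `OutputGeneratorPlan`, `Spec`, `forTemplates` and `forEachItem` are hypothetical and are not part of the freemarker-generator API. Each `Spec` stands for one "output generator", that is, one template run with an optional input item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only, not the freemarker-generator API.
final class OutputGeneratorPlan {

    /** One planned template run: a template plus an optional input item. */
    static final class Spec {
        final String template;
        final String input; // null when the template itself is the generator file

        Spec(String template, String input) {
            this.template = template;
            this.input = input;
        }
    }

    /** Templates passed as generator files: one output generator each. */
    static List<Spec> forTemplates(List<String> templates) {
        List<Spec> specs = new ArrayList<>();
        for (String template : templates) {
            specs.add(new Spec(template, null));
        }
        return specs;
    }

    /** Every template is executed once for every input item. */
    static List<Spec> forEachItem(List<String> templates, List<String> items) {
        List<Spec> specs = new ArrayList<>();
        for (String template : templates) {
            for (String item : items) {
                specs.add(new Spec(template, item));
            }
        }
        return specs;
    }

    public static void main(String[] args) {
        List<String> csvFiles = Arrays.asList("a.csv", "b.csv", "c.csv");

        // 3 csv files transformed by one template: 3 output generators
        System.out.println(forEachItem(Arrays.asList("transform.ftl"), csvFiles).size());

        // 1 stand-alone template plus the 3 csv files: 4 output generators
        List<Spec> mixed = new ArrayList<>(forTemplates(Arrays.asList("index.ftl")));
        mixed.addAll(forEachItem(Arrays.asList("transform.ftl"), csvFiles));
        System.out.println(mixed.size());

        // 2 templates, each executed for each of 3 items: 6 output generators
        System.out.println(forEachItem(Arrays.asList("t1.ftl", "t2.ftl"), csvFiles).size());
    }
}
```

In this sketch the template is just an attribute of a `Spec`, which matches the point that the template named for the 3 csv files is not an output generator by itself.
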
> >>
> >> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >> [email protected]> wrote:
> >>
> >>> Hi Daniel,
> >>>
> >>> That all depends on your mental model and work you do, expectations,
> >>> experience :-)
> >>>
> >>>
> >>> __Document Handling__
> >>>
> >>> *"But I think actually we have no good use case for list of documents
> >>> that's passed at once to a single template run, so, we can just ignore
> >>> that complication"*
> >>>
> >>> In my case that's not a complication but my daily business - I'm
> >>> regularly wading through access logs - yesterday probably a couple of
> >>> hundred access logs across two staging sites to help track down some
> >>> strange API gateway issues :-)
> >>>
> >>> My gut feeling is (borrowing from
> >>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
> >>>
> >>> 1. You have a few lovely named documents / templates - `pets`
> >>> 2. You have tons of anonymous documents / templates to process -
> >>> `cattle`
> >>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>
> >>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
> >>> it is equally important and common.
> >>>
> >>>
> >>> __Template And Document Processing Modes__
> >>>
> >>> IMHO it is important to answer the following question: "How many
> >>> outputs do you get when rendering 2 templates and 3 datasources? Two,
> >>> Three or Six?"
> >>>
> >>> Your answer is influenced by your mental model / experience
> >>>
> >>> * When wading through tons of CSV files, access logs, etc. the answer
> >>> is "2"
> >>> * When doing source code generation the obvious answer is "6"
> >>> * Can't imagine a use case which results in "3" but I'm pretty sure we
> >>> will encounter one
> >>>
> >>> __Template and document mode probably shouldn't exist__
> >>>
> >>> That's hard for me to fully understand - I definitely lack your
> >>> insights & experience writing such tools :-)
> >>>
> >>> Defining the `Output Generator` is the underlying model for the Maven
> >>> plugin (and probably FMPP).
> >>>
> >>> I'm not sure if this applies to command lines, at least not in the way
> >>> I use them (or would like to use them)
> >>>
> >>>
> >>> Thanks in advance,
> >>>
> >>> Siegfried Goeschl
> >>>
> >>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>
> >>>
> >>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>
> >>>> Yeah, "data source" is surely a too popular name, but for a reason.
> >>>> Does anyone have other ideas?
> >>>>
> >>>> As for naming data sources and such: one thing I was wondering about
> >>>> back then is how to deal with a list of documents given to a template,
> >>>> versus exactly 1 document given to a template. But I think actually we
> >>>> have no good use case for a list of documents that's passed at once to
> >>>> a single template run, so we can just ignore that complication. A
> >>>> document has a name, and that's always just a single document, not a
> >>>> collection, as far as the template is concerned. (We can have multiple
> >>>> documents per run, but those normally yield separate output generators,
> >>>> so it's still only one document per template.) However, we can have
> >>>> data source types (document types in the old terminology) that collect
> >>>> together multiple data files. So then that complexity is encapsulated
> >>>> in the data source type, and doesn't complicate the overall
> >>>> architecture. That's another case where a data source is not just a
> >>>> file. Like maybe there's a data source type that loads all the CSV-s
> >>>> from a directory into a single big table (I had such a case), or even
> >>>> into a list of tables. Or, as I mentioned already, a data source is
> >>>> maybe an SQL query on a JDBC data source (and we got the first term
> >>>> clash... JDBC also calls them data sources).
> >>>>
> >>>> Template and document mode probably shouldn't exist from the user's
> >>>> perspective either, at least not as a global option that must apply to
> >>>> everything in a run. Users could just give the files that define the
> >>>> "output generators", and some of them will be templates, some of them
> >>>> data files, in which case a template needs to be associated with them
> >>>> (and there can be a couple of ways of doing that). And then again,
> >>>> there are the cases where you want to create one output generator per
> >>>> entity from some data source.
> >>>>
> >>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>
> >>>>> *Renaming Document To DataSource*
> >>>>>
> >>>>> Yes, makes sense. I tried to avoid it since I'm using
> >>>>> javax.activation and its DataSource.
> >>>>>
> >>>>> *Template And Document Mode*
> >>>>>
> >>>>> Agreed - I think it is a valuable abstraction for the user but it is
> >>>>> not an implementation concept :-)
> >>>>>
> >>>>> *Document Without Symbolic Names*
> >>>>>
> >>>>> Also agreed and it is going to change but I have not made up my mind
> >>>>> yet about what exactly to implement.
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>
> >>>>> A few quick thoughts on that:
> >>>>>
> >>>>> - We should replace the "document" term with something more
> >>>>> descriptive. It doesn't tell that it's some kind of input. Also, most
> >>>>> of these inputs aren't something that people typically call documents.
> >>>>> Like a csv file, or a database table, which is not even a file (OK, we
> >>>>> don't support such a thing at the moment). I think maybe "data source"
> >>>>> is a safe enough term. (It also rhymes with data model.)
> >>>>> - You have separate "template" and "document" "modes", which apply to
> >>>>> a whole run. I think such specialization won't be helpful. We could
> >>>>> just say, on the conceptual level at least, that we need a set of
> >>>>> "output generators". An output generator is an object (in the API)
> >>>>> that specifies a template, a data-model (where the data-model is
> >>>>> possibly populated with "documents"), and an output "sink" (a file
> >>>>> path, or stdout), and can generate the output itself. A practical way
> >>>>> of defining the output generators in a CLI application is via a bunch
> >>>>> of files, each defining an output generator. Some of those files may
> >>>>> be templates (which you can even detect from the file extension), or
> >>>>> data files that we currently call "documents". They could freely mix
> >>>>> inside the same run. I have also met a use case where you have a
> >>>>> single table (single "document"), and each record in it yields an
> >>>>> output file. That can also be described in some file format, or really
> >>>>> in any other way, like directly in a command line argument, via API,
> >>>>> etc.
> >>>>> - You have multiple documents without an associated symbolic name in
> >>>>> some examples. Templates can't identify those in a well maintainable
> >>>>> way then. The actual file name is often not a good identifier: it can
> >>>>> change over time, and you might not even have good control over it,
> >>>>> like when you receive it as a parameter from somewhere else, or
> >>>>> someone moves/renames the files that you need to read. Index is also
> >>>>> not very good, but I have written about that earlier.
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>> Hi folks,
> >>>>>
> >>>>> still wrapping my head around it but assembled some thoughts here -
> >>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
> >>>>>
> >>>>> What you are describing is more like the angle that FMPP took
> >>>>> initially, where templates drive things, and they generate the output
> >>>>> for themselves (even multiple output files if they wish). By default
> >>>>> the output file name (and relative path) is deduced from the template
> >>>>> name. There was also a global data-model, built in a configuration
> >>>>> file (or equally, built via command line arguments, or both mixed),
> >>>>> from which templates get whatever data they are interested in. Take a
> >>>>> look at the figures here: http://fmpp.sourceforge.net/qtour.html.
> >>>>> Later, this concept was generalized a bit more, because you could add
> >>>>> XML files at the same place where you have the templates, and then you
> >>>>> could associate transform templates with the XML files (based on path
> >>>>> pattern and/or the XML document element). Now that's like what
> >>>>> freemarker-generator had initially (data files drive output, and the
> >>>>> template is there to transform it).
> >>>>>
> >>>>> So I think the generic mental model would look like this:
> >>>>>
> >>>>> 1. You have files that drive the process, let's call them *generator
> >>>>> files* for now. Usually, each generator file yields an output file
> >>>>> (but maybe even multiple output files, as you might have seen in the
> >>>>> last figure). These generator files can be of many types, like XML,
> >>>>> JSON, XLSX (as in the original freemarker-generator), and even
> >>>>> templates (as is the norm in FMPP). If the file is not a template,
> >>>>> then you have a set of transformer templates (-t CLI option) in a
> >>>>> separate directory, which can be associated with the generator files
> >>>>> based on name patterns, and even based on content (schema usually). If
> >>>>> the generator file is a template (so that's a positional @Parameter
> >>>>> CLI argument that happens to be an *.ftl, and is not a template file
> >>>>> specified after the "-t" option), then you just Template.process(...)
> >>>>> it, and what it prints will be the output.
> >>>>> 2. You also have a set of variables, the global data-model, that
> >>>>> contains commonly useful stuff, like what you now call parameters (CLI
> >>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
> >>>>> data files aren't "generator files". Templates just use them if they
> >>>>> need them. An important thing here is to reuse the same mechanism to
> >>>>> read and parse those data files that is used in templates when
> >>>>> transforming generator files. So we need a common format for
> >>>>> specifying how to load data files. That's maybe just FTL that #assigns
> >>>>> to the variables, or maybe a more declarative format.
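
A rough Java sketch of point 1, assuming that association by name pattern is done with glob patterns. The class `GeneratorFileRouting` and its methods are hypothetical names for illustration; only the `java.nio.file` calls are real. A generator file ending in `.ftl` is treated as its own template, anything else needs an associated transform template.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not the freemarker-generator API.
final class GeneratorFileRouting {

    // Glob pattern on the file name -> transform template; first match wins.
    private final Map<PathMatcher, String> transformers = new LinkedHashMap<>();

    GeneratorFileRouting associate(String glob, String transformTemplate) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        transformers.put(matcher, transformTemplate);
        return this;
    }

    /** Returns the template that would be processed for the given generator file. */
    String templateFor(String generatorFile) {
        Path fileName = Paths.get(generatorFile).getFileName();
        if (fileName.toString().endsWith(".ftl")) {
            return generatorFile; // the generator file is itself a template
        }
        for (Map.Entry<PathMatcher, String> entry : transformers.entrySet()) {
            if (entry.getKey().matches(fileName)) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException(
                "No transform template associated with " + generatorFile);
    }

    public static void main(String[] args) {
        GeneratorFileRouting routing = new GeneratorFileRouting()
                .associate("*-access-log.csv", "access-report.ftl");
        System.out.println(routing.templateFor("somewhere/foo-access-log.csv"));
        System.out.println(routing.templateFor("index.html.ftl"));
    }
}
```

Association based on content (schema) is left out here, since it would need the file to be parsed first.
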
> >>>>>
> >>>>> What I have described in the original post here was a less generic
> >>>>> form of this, as I tried to stay true to the original approach. I
> >>>>> thought the proposal would be drastic enough as it is... :) There, the
> >>>>> "main" document is the "generator file" from point 1, the "-t"
> >>>>> template is the transform template for the "main" document, and the
> >>>>> other named documents ("users", "groups") are a poor man's shared
> >>>>> data-model from point 2 (together with -PName=value).
> >>>>>
> >>>>> There's a further, somewhat confusing thing to get right with the
> >>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
> >>>>> In the model above, as per point 1, if you list multiple data files,
> >>>>> each will generate a separate output file. So, if you need to take in
> >>>>> a list of files to transform it to a single output file (or at least
> >>>>> with a single transform template execution), then you have to be
> >>>>> explicit about that, as that's not the default behavior anymore. But
> >>>>> it's still absolutely possible. Imagine that a "list of XLSX-es" is
> >>>>> itself like a file format. You need some CLI (and Maven config, etc.)
> >>>>> syntax to express that, but that shouldn't be a big deal.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> Good timing - I was looking at a similar problem from a different
> >>>>> angle yesterday (see below)
> >>>>>
> >>>>> Don't have enough time to answer your email in detail now - will do
> >>>>> that
> >>>>> tomorrow evening
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> === START
> >>>>> # FreeMarker CLI Improvement
> >>>>> ## Support Of Multiple Template Files
> >>>>> Currently we support the following combinations
> >>>>>
> >>>>> * Single template and no data files
> >>>>> * Single template and one or more data files
> >>>>>
> >>>>> But we cannot support the following use case which is quite typical
> >>>>> in the cloud
> >>>>>
> >>>>> __Convert multiple templates with a single data file, e.g copying a
> >>>>> directory of configuration files using a JSON configuration file__
> >>>>>
> >>>>> ## Implementation notes
> >>>>> * When we copy a directory we can remove the `ftl` extension on the
> >>>>> fly
> >>>>> * We might need an `exclude` filter for the copy operation
> >>>>> * Initially resolve to a list of template files and process one after
> >>>>> another
> >>>>> * Need to calculate the output file location and extension
> >>>>> * We need to rename the existing command line parameters (see below)
> >>>>> * Do we need multiple include and exclude filters?
> >>>>> * Do we need file versus directory filters?
> >>>>>
> >>>>> ### Command Line Options
> >>>>> ```
> >>>>> --input-encoding : Encoding of the documents
> >>>>> --output-encoding : Encoding of the rendered template
> >>>>> --template-encoding : Encoding of the template
> >>>>> --output : Output file or directory
> >>>>> --include-document : Include pattern for documents
> >>>>> --exclude-document : Exclude pattern for documents
> >>>>> --include-template : Include pattern for templates
> >>>>> --exclude-template : Exclude pattern for templates
> >>>>> ```
> >>>>>
> >>>>> ### Command Line Examples
> >>>>> ```text
> >>>>> # Copy all FTL templates found in "ext/config" to the "/config"
> >>>>> # directory using the data from "config.json"
> >>>>>
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >>>>>
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>
> >>>>> # Basically the same using a named document "configuration"
> >>>>> # It might make sense to expose "conf" directly in the FreeMarker
> >>>>> # data model
> >>>>> # It might make sense to allow URIs for loading documents
> >>>>>
> >>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>
> >>>>> # Basically the same using an environment variable as named document
> >>>>>
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>> ```
> >>>>> === END
> >>>>>
> >>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
> >>>>>
> >>>>> Input documents are a fundamental concept in freemarker-generator, so
> >>>>> we should think about that more, and probably refine/rework how it's
> >>>>> done.
> >>>>>
> >>>>> Currently it works like this, with CLI at least.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then in access-report.ftl you have to do something like this:
> >>>>>
> >>>>> <#assign doc = Documents.get(0)>
> >>>>> ... process doc here
> >>>>>
> >>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
> >>>>> funny chain of coincidences: It returned the string "D", then
> >>>>> CSVTool.parse(...) happily parsed that to a table with the single
> >>>>> column "D", and 0 rows, and as there were 0 rows, the template didn't
> >>>>> run into an error about row.myExpectedColumn referring to a missing
> >>>>> column either, so the process finished with success. (: Pretty unlucky
> >>>>> for sure. The root cause was unintentionally breaking a FreeMarker
> >>>>> idiom though; eventually we will have to work on those too, but,
> >>>>> different topic.)
> >>>>>
> >>>>> However, actually multiple input documents can be passed in:
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>> somewhere/bar-access-log.csv
> >>>>>
> >>>>> The above template will still work, though then it ignores all but
> >>>>> the first document. So if you expect any number of input documents,
> >>>>> you will probably have to do this:
> >>>>>
> >>>>> <#list Documents.list as doc>
> >>>>> ... process doc here
> >>>>> </#list>
> >>>>>
> >>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
> >>>>> those we will work out in a different thread.)
> >>>>>
> >>>>>
> >>>>> So, what would be better, in my opinion? I start out from what I
> >>>>> think are the common use cases, in decreasing order of frequency. The
> >>>>> goal is to make those less error prone for the users, and simpler to
> >>>>> express.
> >>>>>
> >>>>> USE CASE 1
> >>>>>
> >>>>> You have exactly 1 input document, which is therefore simply "the"
> >>>>> document in the mind of the user. This is probably the typical use
> >>>>> case, or at least the use case users typically start out from when
> >>>>> starting the work.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> >>>>> error prone, because if the user passed in more than 1 document (that
> >>>>> can even happen totally accidentally, like if the user was lazy and
> >>>>> used a wildcard that the shell expanded), the template will silently
> >>>>> ignore the rest of the documents, and the single document processed
> >>>>> will be picked practically at random. The user might not notice that
> >>>>> and submit a bad report or such.
> >>>>>
> >>>>> I think that in this use case the document should be simply referred
> >>>>> to as `Document` in the template. When you have multiple documents
> >>>>> there, referring to `Document` should be an error, saying that the
> >>>>> template was made to process a single document only.
> >>>>>
> >>>>>
> >>>>> USE CASE 2
> >>>>>
> >>>>> You have multiple input documents, but each has a different role
> >>>>> (different schema, maybe different file type). Like, you pass in
> >>>>> users.csv and groups.csv. Each has a different schema, and so you want
> >>>>> to access them differently, but in the same template.
> >>>>>
> >>>>> freemarker-cli
> >>>>> [...]
> >>>>> --named-document users somewhere/foo-users.csv
> >>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Then in the template you could refer to them as
> >>>>> `NamedDocuments.users` and `NamedDocuments.groups`.
> >>>>>
> >>>>> Use Case 1 and 2 can be unified into a coherent concept, where
> >>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
> >>>>> "main" because that's "the" document the template is about, but then
> >>>>> you have added some helper documents, with symbolic names representing
> >>>>> their role.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>> --document-name=users somewhere/foo-users.csv
> >>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Here, `Document` still works in the template, and it refers to
> >>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
> >>>>> above would be cleaner, I couldn't figure out how to do that with
> >>>>> Picocli. Anyway, for now the point is the concept, which is not
> >>>>> specific to CLI.)
> >>>>>
> >>>>> USE CASE 3
> >>>>>
> >>>>> Here you have several of the same kind of documents. That has a more
> >>>>> generic sub-use-case, when you have explicitly named documents (like
> >>>>> "users" above), and for some you expect multiple input files.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
> >>>>> --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
> >>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>
> >>>>> The template must be written with this use case in mind, as now it
> >>>>> has to #list some of the documents. (I think in practice you hardly
> >>>>> ever want to get a document by hard coded index. Either you don't know
> >>>>> how many documents you have, so you can't use hard coded indexes, or
> >>>>> you do, and each index has a specific meaning, but then you should
> >>>>> name the documents instead, as using indexes is error prone, and hard
> >>>>> to read.)
> >>>>> Accessing that list of documents in the template could maybe be done
> >>>>> like this:
> >>>>> - For the "main" documents: `DocumentList`
> >>>>> - For explicitly named documents, like "users":
> >>>>> `NamedDocumentLists.users`
> >>>>>
> >>>>> SUMMING UP
> >>>>>
> >>>>> To unify all 3 use cases into a coherent concept:
> >>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
> >>>>> can achieve everything with it, using it requires your template to
> >>>>> handle the most generic case too. So, I think it would be rarely used.
> >>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> >>>>> It's used if you only have one kind of documents (single format and
> >>>>> schema), but potentially multiple of them.
> >>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document
> >>>>> of the given name.
> >>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
> >>>>> for the most natural/frequent use case.
> >>>>>
> >>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>> trade-off for the sake of these:
> >>>>> - Catching CLI (or Maven, etc.) input where the template output will
> >>>>> likely be wrong. That's only possible if the user can communicate
> >>>>> their intent in the template.
> >>>>> - Users don't need to deal with concepts that are irrelevant in their
> >>>>> concrete use case. Just start with the trivial, `Document`, and later,
> >>>>> if the need arises, generalize to named documents, document lists, or
> >>>>> both.
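
The four accessors and the intended error behaviour can be sketched in Java as follows. This is an assumption-laden illustration, not existing freemarker-generator code: the class `DocumentAccess`, its method names, and the use of plain strings for documents are all hypothetical. The point shown is that the single-document accessors fail loudly when the input does not match the template's expectation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration only; documents are plain strings here.
final class DocumentAccess {
    static final String MAIN = "main";

    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    DocumentAccess add(String name, String location) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(location);
        return this;
    }

    /** Like NamedDocumentLists.<name>: the most generic form, any number of documents. */
    List<String> namedDocumentList(String name) {
        List<String> docs = byName.get(name);
        return docs == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(docs);
    }

    /** Like DocumentList: shorthand for the list named "main". */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Like NamedDocuments.<name>: exactly 1 document expected, otherwise an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \""
                    + name + "\", but got " + docs.size());
        }
        return docs.get(0);
    }

    /** Like Document: shorthand for the single document named "main". */
    String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentAccess docs = new DocumentAccess()
                .add("main", "somewhere/foo-access-log.csv")
                .add("users", "somewhere/foo-users.csv");
        System.out.println(docs.document());
        System.out.println(docs.namedDocument("users"));
    }
}
```

With a second "main" document added, `document()` would throw instead of silently picking one of them, which is the failure mode described under use case 1.
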
> >>>>>
> >>>>> What do you guys think?
> >>>>>
> >>>>>
> >>>
> >>
> >>
> >> --
> >> Best regards,
> >> Daniel Dekany
> >>
> >
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany
