Re: freemarker-generator: Improving the input documents concept

Daniel Dekany Sat, 29 Feb 2020 09:37:45 -0800

But, "datasource" is just not an existing word, right? Of course if we put
spelling mistakes into class names, that will decrease the chance of name
clashes big time, but... :)


On Sat, Feb 29, 2020 at 6:06 PM Siegfried Goeschl <
[email protected]> wrote:

> Well, it clashes with "javax.activation.DataSource" - can do, and I have no
> definite opinion about it :)
>
> > On 29.02.2020, at 18:03, Daniel Dekany <[email protected]> wrote:
> >
> > I believe that should be DataSource (with capital S), as it's two words.
> >
> > Also, it's the name of a too widely used and known JDBC interface. So if
> > anyone can tell a similarly descriptive alternative...
> >
> > On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> > [email protected]> wrote:
> >
> >> Hi Daniel,
> >>
> >> I'm an enterprise developer - bad habits die hard :-)
> >>
> >> So I closed the following tickets and merged the branches
> >>
> >> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> >> "freemarker-generator"
> >> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
> >> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> >> for datasources
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
> >>>
> >>> Yeah, and of course, you can merge that branch. You can even work on the
> >>> master directly after all.
> >>>
> >>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
> [email protected]>
> >>> wrote:
> >>>
> >>>> But, I do recognize the cattle use case (several "faceless" files with a
> >>>> common format/schema). Only, my idea is to push that complexity onto the
> >>>> data source. The "data source" concept shields the rest of the application
> >>>> from the details of how the data is stored or retrieved. So, a data source
> >>>> might load a bunch of log files from a directory, and present them as a
> >>>> single big table, or as a list of tables, etc. So I want to deal with the
> >>>> cattle use case, but the question is what part of the architecture will
> >>>> deal with this complication; in other words, how do you box things. The
> >>>> reason my initial bet is to stuff that complication into the "data source"
> >>>> implementation(s) is that data sources are inherently varied. Some return
> >>>> a table-like thing, some have multiple named tables (worksheets in Excel),
> >>>> some return a tree of nodes (XML), etc. So then, some might return a
> >>>> list of lists of log records, or just a single list of log records (put
> >>>> together from daily log files). That way cattle don't add to the conceptual
> >>>> complexity. Now, you might be aware of cases where the cattle concept must
> >>>> be more exposed than this, and then we can't box things like this. But
> >>>> this is what I tried to express.
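A minimal sketch of the "box the cattle inside the data source" idea described above. All names here (`RecordSource`, `SingleLogSource`, `MergedLogSource`) are hypothetical and not part of freemarker-generator, and the log lines are passed in directly instead of being read from a directory, so this only illustrates the shape of the abstraction:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction: a template asks for records and never sees files. */
interface RecordSource {
    List<String> records();
}

/** One log file; the lines are passed in directly to keep the sketch free of I/O. */
final class SingleLogSource implements RecordSource {
    private final List<String> lines;

    SingleLogSource(List<String> lines) {
        this.lines = Collections.unmodifiableList(new ArrayList<>(lines));
    }

    @Override
    public List<String> records() {
        return lines;
    }
}

/** Many log files ("cattle") presented as one list; callers cannot tell the difference. */
final class MergedLogSource implements RecordSource {
    private final List<RecordSource> parts;

    MergedLogSource(List<RecordSource> parts) {
        this.parts = new ArrayList<>(parts);
    }

    @Override
    public List<String> records() {
        List<String> all = new ArrayList<>();
        for (RecordSource part : parts) {
            all.addAll(part.records()); // keep the order in which the parts were given
        }
        return all;
    }
}
```

Code that consumes a `RecordSource` stays the same whether it is handed one daily log or a merged view over many, which is the point of pushing the complication into the data source implementation.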
> >>>>
> >>>> Regarding "output generators", and how that applies on the command line: I
> >>>> think it's important that the common core between Maven and command-line is
> >>>> as fat as possible. Ideally, they are just two syntaxes to set up the same
> >>>> thing. Mostly at least. So, if you specify a template file to the CLI
> >>>> application, in a way so that it causes it to process that template to
> >>>> generate a single output, then you have just defined an "output
> >>>> generator" (even if it wasn't explicitly called that on the command
> >>>> line). If you specify 3 csv files to the CLI application, in a way so that
> >>>> it causes it to generate 3 output files, then you have just defined 3
> >>>> "output generators" there (there's at least one template specified there
> >>>> too, but that wasn't an "output generator" itself, it was just an attribute
> >>>> of the 3 output generators). If you specify 1 template, and 3 csv files, in
> >>>> a way so that it will yield 4 output files (1 for the template, 3 for the
> >>>> csv-s), then you have defined 4 output generators there. If you have a data
> >>>> source that loads a list of 3 entities (say, 3 csv files, so it's a list of
> >>>> tables then), and you have 2 templates, and you tell the CLI to execute
> >>>> each template for each item in said data source, then you have just defined
> >>>> 6 "output generators".
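The counting in the paragraph above can be sketched as follows. This is only an illustration under assumptions: `OutputGenerator` and `OutputGenerators` are hypothetical names, templates and inputs are represented by plain file-name strings, and the output "sink" is left out entirely:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: one template plus (optionally) one input yields one output. */
final class OutputGenerator {
    final String template;
    final String input; // null when a template is processed on its own

    OutputGenerator(String template, String input) {
        this.template = template;
        this.input = input;
    }
}

final class OutputGenerators {
    private OutputGenerators() {
    }

    /** A template file given directly defines exactly one output generator. */
    static OutputGenerator standalone(String template) {
        return new OutputGenerator(template, null);
    }

    /** "Execute each template for each item": one generator per (template, item) pair. */
    static List<OutputGenerator> forEachItem(List<String> templates, List<String> items) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            for (String item : items) {
                result.add(new OutputGenerator(template, item));
            }
        }
        return result;
    }
}
```

With 2 templates and 3 items, `forEachItem` yields 6 generators. With 1 transform template and 3 csv files it yields 3, and adding one `standalone` template on top gives the 4 outputs mentioned above.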
> >>>>
> >>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> That all depends on your mental model and work you do, expectations,
> >>>>> experience :-)
> >>>>>
> >>>>>
> >>>>> __Document Handling__
> >>>>>
> >>>>> *"But I think actually we have no good use case for list of documents
> >>>>> that's passed at once to a single template run, so, we can just ignore
> >>>>> that complication"*
> >>>>>
> >>>>> In my case that's not a complication but my daily business - I'm
> >>>>> regularly wading through access logs - yesterday probably a couple of
> >>>>> hundred access logs across two staging sites to help track down some
> >>>>> strange API gateway issues :-)
> >>>>>
> >>>>> My gut feeling is (borrowing from
> >>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>> )
> >>>>>
> >>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>> 2. You have tons of anonymous documents / templates to process -
> >>>>> `cattle`
> >>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>
> >>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
> >>>>> it is equally important and common.
> >>>>>
> >>>>>
> >>>>> __Template And Document Processing Modes__
> >>>>>
> >>>>> IMHO it is important to answer the following question: "How many
> >>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
> >>>>> Three or Six?"
> >>>>>
> >>>>> Your answer is influenced by your mental model / experience
> >>>>>
> >>>>> * When wading through tons of CSV files, access logs, etc. the answer is
> >>>>> "2"
> >>>>> * When doing source code generation the obvious answer is "6"
> >>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
> >>>>> will encounter one
> >>>>>
> >>>>> __Template and document mode probably shouldn't exist__
> >>>>>
> >>>>> That's hard for me to fully understand - I definitely lack your insights
> >>>>> & experience writing such tools :-)
> >>>>>
> >>>>> Defining the `Output Generator` is the underlying model for the Maven
> >>>>> plugin (and probably FMPP).
> >>>>>
> >>>>> I'm not sure if this applies to command lines, at least not in the way I
> >>>>> use them (or would like to use them)
> >>>>>
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>
> >>>>>
> >>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>
> >>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
> >>>>>> Anyone has other ideas?
> >>>>>>
> >>>>>> As for naming data sources and such: one thing I was wondering about back
> >>>>>> then is how to deal with a list of documents given to a template, versus
> >>>>>> exactly 1 document given to a template. But I think actually we have no
> >>>>>> good use case for list of documents that's passed at once to a single
> >>>>>> template run, so, we can just ignore that complication. A document has a
> >>>>>> name, and that's always just a single document, not a collection, as far
> >>>>>> as the template is concerned. (We can have multiple documents per run, but
> >>>>>> those normally yield separate output generators, so it's still only one
> >>>>>> document per template.) However, we can have data source types (document
> >>>>>> types in the old terminology) that collect together multiple data files.
> >>>>>> So then that complexity is encapsulated in the data source type, and
> >>>>>> doesn't complicate the overall architecture. That's another case where a
> >>>>>> data source is not just a file. Like maybe there's a data source type that
> >>>>>> loads all the CSV-s from a directory into a single big table (I had such a
> >>>>>> case), or even into a list of tables. Or, as I mentioned already, a data
> >>>>>> source is maybe an SQL query on a JDBC data source (and we got the first
> >>>>>> term clash... JDBC also calls them data sources).
> >>>>>>
> >>>>>> Template and document mode probably shouldn't exist from the user's
> >>>>>> perspective either, at least not as a global option that must apply to
> >>>>>> everything in a run. They could just give the files that define the
> >>>>>> "output generators", and some of them will be templates, some of them are
> >>>>>> data files, in which case a template needs to be associated with them (and
> >>>>>> there can be a couple of ways of doing that). And then again, there are
> >>>>>> the cases where you want to create one output generator per entity from
> >>>>>> some data source.
> >>>>>>
> >>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>>>
> >>>>>>> *Renaming Document To DataSource*
> >>>>>>>
> >>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
> >>>>>>> and its DataSource.
> >>>>>>>
> >>>>>>> *Template And Document Mode*
> >>>>>>>
> >>>>>>> Agreed - I think it is a valuable abstraction for the user but it is not
> >>>>>>> an implementation concept :-)
> >>>>>>>
> >>>>>>> *Document Without Symbolic Names*
> >>>>>>>
> >>>>>>> Also agreed and it is going to change but I have not made up my mind yet
> >>>>>>> about what exactly to implement.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>
> >>>>>>> A few quick thoughts on that:
> >>>>>>>
> >>>>>>> - We should replace the "document" term with something more descriptive.
> >>>>>>> It doesn't tell that it's some kind of input. Also, most of these inputs
> >>>>>>> aren't something that people typically call documents. Like a csv file,
> >>>>>>> or a database table, which is not even a file (OK, we don't support such
> >>>>>>> a thing at the moment). I think maybe "data source" is a safe enough term.
> >>>>>>> (It also rhymes with data model.)
> >>>>>>> - You have separate "template" and "document" "modes" that apply to a
> >>>>>>> whole run. I think such specialization won't be helpful. We could just
> >>>>>>> say, on the conceptual level at least, that we need a set of "output
> >>>>>>> generators". An output generator is an object (in the API) that specifies
> >>>>>>> a template, a data-model (where the data-model is possibly populated with
> >>>>>>> "documents"), and an output "sink" (a file path, or stdout), and can
> >>>>>>> generate the output itself. A practical way of defining the output
> >>>>>>> generators in a CLI application is via a bunch of files, each defining an
> >>>>>>> output generator. Some of those files may be templates (which you can
> >>>>>>> even detect from the file extension), others data files of the kind we
> >>>>>>> currently call a "document". They could freely mix inside the same run. I
> >>>>>>> have also met a use case where you have a single table (single
> >>>>>>> "document"), and each record in it yields an output file. That can also
> >>>>>>> be described in some file format, or really in any other way, like
> >>>>>>> directly in command line arguments, via API, etc.
> >>>>>>> - You have multiple documents without an associated symbolic name in some
> >>>>>>> examples. Templates can't identify those in a maintainable way then.
> >>>>>>> The actual file name is often not a good identifier: it can change over
> >>>>>>> time, and you might not even have good control over it, like when you
> >>>>>>> receive it as a parameter from somewhere else, or someone moves/renames
> >>>>>>> the files that you need to read. An index is also not very good, but I
> >>>>>>> have written about that earlier.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> still wrapping my side around but assembled some thoughts here -
> >>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]>
> wrote:
> >>>>>>>
> >>>>>>> What you are describing is more like the angle that FMPP took initially,
> >>>>>>> where templates drive things; they generate the output for themselves
> >>>>>>> (even multiple output files if they wish). By default the output file name
> >>>>>>> (and relative path) is deduced from the template name. There was also a
> >>>>>>> global data-model, built in a configuration file (or equally, built via
> >>>>>>> command line arguments, or both mixed), from which templates get whatever
> >>>>>>> data they are interested in. Take a look at the figures here:
> >>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
> >>>>>>> generalized a bit more, because you could add XML files at the same place
> >>>>>>> where you have the templates, and then you could associate transform
> >>>>>>> templates to the XML files (based on path pattern and/or the XML document
> >>>>>>> element). Now that's like what freemarker-generator had initially (data
> >>>>>>> files drive output, and the template is there to transform it).
> >>>>>>>
> >>>>>>> So I think the generic mental model would be like this:
> >>>>>>>
> >>>>>>> 1. You have files that drive the process, let's call them *generator
> >>>>>>> files* for now. Usually, each generator file yields an output file (but
> >>>>>>> maybe even multiple output files, as you might have seen in the last
> >>>>>>> figure). These generator files can be of many types, like XML, JSON, XLSX
> >>>>>>> (as in the original freemarker-generator), and even templates (as is the
> >>>>>>> norm in FMPP). If the file is not a template, then you have a set of
> >>>>>>> transformer templates (-t CLI option) in a separate directory, which can
> >>>>>>> be associated with the generator files based on name patterns, and even
> >>>>>>> based on content (schema usually). If the generator file is a template (so
> >>>>>>> that's a positional @Parameter CLI argument that happens to be an *.ftl,
> >>>>>>> and is not a template file specified after the "-t" option), then you
> >>>>>>> just Template.process(...) it, and it prints what the output will be.
> >>>>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>>> contains commonly useful stuff, like what you now call parameters (CLI
> >>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
> >>>>>>> files aren't "generator files". Templates just use them if they need
> >>>>>>> them. An important thing here is to reuse the same mechanism to read and
> >>>>>>> parse those data files as is used in templates when transforming
> >>>>>>> generator files. So we need a common format for specifying how to load
> >>>>>>> data files. That's maybe just FTL that #assigns to the variables, or
> >>>>>>> maybe a more declarative format.
> >>>>>>>
> >>>>>>> What I have described in the original post here was a less generic form
> >>>>>>> of this, as I tried to stay true to the original approach. I thought the
> >>>>>>> proposal would be drastic enough as it is... :) There, the "main" document
> >>>>>>> is the "generator file" from point 1, the "-t" template is the transform
> >>>>>>> template for the "main" document, and the other named documents ("users",
> >>>>>>> "groups") are a poor man's shared data-model from point 2 (together with
> >>>>>>> -PName=value).
> >>>>>>>
> >>>>>>> There's a further somewhat confusing thing to get right with the
> >>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
> >>>>>>> the model above, as per point 1, if you list multiple data files, each
> >>>>>>> will generate a separate output file. So, if you need to take in a list of
> >>>>>>> files to transform it to a single output file (or at least with a single
> >>>>>>> transform template execution), then you have to be explicit about that, as
> >>>>>>> that's not the default behavior anymore. But it's still absolutely
> >>>>>>> possible. Imagine that a "list of XLSX-es" is itself like a file format.
> >>>>>>> You need some CLI (and Maven config, etc.) syntax to express that, but
> >>>>>>> that shouldn't be a big deal.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> Good timing - I was looking at a similar problem from different
> angle
> >>>>>>> yesterday (see below)
> >>>>>>>
> >>>>>>> Don't have enough time to answer your email in detail now - will do
> >>>>>>> that
> >>>>>>> tomorrow evening
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> ===. START
> >>>>>>> # FreeMarker CLI Improvement
> >>>>>>> ## Support Of Multiple Template Files
> >>>>>>> Currently we support the following combinations
> >>>>>>>
> >>>>>>> * Single template and no data files
> >>>>>>> * Single template and one or more data files
> >>>>>>>
> >>>>>>> But we cannot support the following use case, which is quite typical in
> >>>>>>> the cloud
> >>>>>>>
> >>>>>>> __Convert multiple templates with a single data file, e.g. copying a
> >>>>>>> directory of configuration files using a JSON configuration file__
> >>>>>>>
> >>>>>>> ## Implementation notes
> >>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
> >>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>> * Initially resolve to a list of template files and process one after
> >>>>>>> another
> >>>>>>> * Need to calculate the output file location and extension
> >>>>>>> * We need to rename the existing command line parameters (see below)
> >>>>>>> * Do we need multiple include and exclude filters?
> >>>>>>> * Do we need file versus directory filters?
> >>>>>>>
> >>>>>>> ### Command Line Options
> >>>>>>> ```
> >>>>>>> --input-encoding : Encoding of the documents
> >>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>> --template-encoding : Encoding of the template
> >>>>>>> --output : Output file or directory
> >>>>>>> --include-document : Include pattern for documents
> >>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>> --include-template : Include pattern for templates
> >>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>> ```
> >>>>>>>
> >>>>>>> ### Command Line Examples
> >>>>>>> ```text
> >>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
> >>>>>>> # using the data from "config.json"
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>>>
> >>>>>>> # Basically the same using a named document "configuration"
> >>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>>>> # It might make sense to allow URIs for loading documents
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>>>
> >>>>>>> # Basically the same using an environment variable as named document
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>>>> ```
> >>>>>>> === END
> >>>>>>>
> >>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Input documents are a fundamental concept in freemarker-generator, so we
> >>>>>>> should think about that more, and probably refine/rework how it's done.
> >>>>>>>
> >>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then in access-report.ftl you have to do something like this:
> >>>>>>>
> >>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>> ... process doc here
> >>>>>>>
> >>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
> >>>>>>> funny chain of coincidences: It returned the string "D", then
> >>>>>>> CSVTool.parse(...) happily parsed that to a table with the single column
> >>>>>>> "D", and 0 rows, and as there were 0 rows, the template didn't run into
> >>>>>>> an error because row.myExpectedColumn refers to a missing column either,
> >>>>>>> so the process finished with success. (: Pretty unlucky for sure. The root
> >>>>>>> cause was unintentionally breaking a FreeMarker idiom though; eventually
> >>>>>>> we will have to work on those too, but, different topic.)
> >>>>>>>
> >>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>>
> >>>>>>> The above template will still work, though then you ignore all but the
> >>>>>>> first document. So if you expect any number of input documents, you
> >>>>>>> probably will have to do this:
> >>>>>>>
> >>>>>>> <#list Documents.list as doc>
> >>>>>>> ... process doc here
> >>>>>>> </#list>
> >>>>>>>
> >>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
> >>>>>>> we will work out in a different thread.)
> >>>>>>>
> >>>>>>>
> >>>>>>> So, what would be better, in my opinion. I start out from what I think
> >>>>>>> are the common use cases, in decreasing order of frequency. The goal is to
> >>>>>>> make those less error prone for the users, and simpler to express.
> >>>>>>>
> >>>>>>> USE CASE 1
> >>>>>>>
> >>>>>>> You have exactly 1 input document, which is therefore simply "the"
> >>>>>>> document in the mind of the user. This is probably the typical use case,
> >>>>>>> or at least the use case users typically start out from when starting
> >>>>>>> the work.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's error
> >>>>>>> prone, because if the user passed in more than 1 document (can even happen
> >>>>>>> totally accidentally, like if the user was lazy and used a wildcard that
> >>>>>>> the shell expanded), the template will silently ignore the rest of the
> >>>>>>> documents, and the single document processed will be practically picked
> >>>>>>> randomly. The user might not notice that and submit a bad report or
> >>>>>>> such.
> >>>>>>>
> >>>>>>> I think that in this use case the document should be simply referred to as
> >>>>>>> `Document` in the template. When you have multiple documents there,
> >>>>>>> referring to `Document` should be an error, saying that the template was
> >>>>>>> made to process a single document only.
> >>>>>>>
> >>>>>>>
> >>>>>>> USE CASE 2
> >>>>>>>
> >>>>>>> You have multiple input documents, but each has a different role
> >>>>>>> (different schema, maybe different file type). Like, you pass in users.csv
> >>>>>>> and groups.csv. Each has a different schema, and so you want to access
> >>>>>>> them differently, but in the same template.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> [...]
> >>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Then in the template you could refer to them as: `NamedDocuments.users`,
> >>>>>>> and `NamedDocuments.groups`.
> >>>>>>>
> >>>>>>> Use Case 1 and 2 can be unified into a coherent concept, where `Document`
> >>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main" because
> >>>>>>> that's "the" document the template is about, but then you have to add
> >>>>>>> some helper documents, with symbolic names representing their role.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Here, `Document` still works in the template, and it refers to
> >>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> >>>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
> >>>>>>> Anyway, for now the point is the concept, which is not specific to CLI.)
> >>>>>>>
> >>>>>>> USE CASE 3
> >>>>>>>
> >>>>>>> Here you have several of the same kind of documents. That has a more
> >>>>>>> generic sub-use-case, when you have explicitly named documents (like
> >>>>>>> "users" above), and for some you expect multiple input files.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> somewhere/bar-users.csv
> >>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>
> >>>>>>> The template must be written with this use case in mind, as now it has to
> >>>>>>> #list some of the documents. (I think in practice you hardly ever want to
> >>>>>>> get a document by hard coded index. Either you don't know how many
> >>>>>>> documents you have, so you can't use hard coded indexes, or you do, and
> >>>>>>> each index has a specific meaning, but then you should name the documents
> >>>>>>> instead, as using indexes is error prone, and hard to read.)
> >>>>>>> Accessing that list of documents in the template maybe could be done like
> >>>>>>> this:
> >>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
> >>>>>>>
> >>>>>>> SUMMING UP
> >>>>>>>
> >>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
> >>>>>>> achieve everything with it, using it requires your template to handle the
> >>>>>>> most generic case too. So, I think it would be rarely used.
> >>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> >>>>>>> used if you only have one kind of document (single format and schema),
> >>>>>>> but potentially multiple of them.
> >>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> >>>>>>> the given name.
> >>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
> >>>>>>> the most natural/frequent use case.
> >>>>>>>
> >>>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
> >>>>>>> for the sake of these:
> >>>>>>> - Catching CLI (or Maven, etc.) input where the template output likely
> >>>>>>> will be wrong. That's only possible if the user can communicate their
> >>>>>>> intent in the template.
> >>>>>>> - Users don't need to deal with concepts that are irrelevant in their
> >>>>>>> concrete use case. Just start with the trivial, `Document`, and later, if
> >>>>>>> the need arises, generalize to named documents, document lists, or both.
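The four accessors proposed above, and the "fail when the template expected a single document" rule, can be sketched roughly like this. `DocumentModel` and its method names are hypothetical stand-ins for `NamedDocumentLists.<name>`, `NamedDocuments.<name>`, `DocumentList` and `Document`; documents are plain strings here, and nothing below is the actual freemarker-generator API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder behind the four accessors; documents are plain strings here. */
final class DocumentModel {
    private static final String MAIN = "main";
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    void add(String name, String document) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(document);
    }

    /** Most generic form: any number of documents under a name. */
    List<String> namedDocumentList(String name) {
        List<String> docs = byName.get(name);
        if (docs == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(docs);
    }

    /** Expects exactly one document under the name; anything else is an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly 1 document named '" + name + "' but found " + docs.size());
        }
        return docs.get(0);
    }

    /** Shorthand for the list of "main" documents. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Shorthand for the single "main" document; fails if there are several. */
    String document() {
        return namedDocument(MAIN);
    }
}
```

The two shorthands delegate to the named forms, so a template written against `document()` fails loudly when a wildcard accidentally supplies several "main" files, instead of silently processing only one of them.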
> >>>>>>>
> >>>>>>> What do you guys think?
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Daniel Dekany
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Daniel Dekany
> >>
> >>
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany
