Hi Daniel,

The introduction of named `Datasource`s allows us to simplify / streamline a few things:

* I have a meaningful user-supplied name
* I can pass additional configuration information, as already implemented with `charset` and `contenttype`. This would also allow configuring a `CSV Datasource`, e.g. `users=./data/users.csv#format=default&header=true&delimiter=TAB`, which can be readily parsed
* Currently the names of datasources are taken from their relative file names - it might make sense to drop that, but I need to contemplate :-)
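
To illustrate why such an argument is easy to parse, here is a minimal Java sketch. The class `DatasourceSpec` and its fields are hypothetical names made up for this example and are not taken from the freemarker-generator code base; the sketch also assumes that the location itself contains no `=` when no name is given.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for a parsed "name=location#key=value&..." argument. */
final class DatasourceSpec {
    final String name;                 // user-supplied name, "" if none was given
    final String location;             // file path or URI, without the fragment
    final Map<String, String> options; // fragment options, in the order given

    private DatasourceSpec(String name, String location, Map<String, String> options) {
        this.name = name;
        this.location = location;
        this.options = options;
    }

    static DatasourceSpec parse(String arg) {
        // Split off the fragment first, so that '=' inside the options is
        // not mistaken for the separator between name and location.
        int hash = arg.indexOf('#');
        String head = hash < 0 ? arg : arg.substring(0, hash);
        String fragment = hash < 0 ? "" : arg.substring(hash + 1);

        int eq = head.indexOf('=');
        String name = eq < 0 ? "" : head.substring(0, eq);
        String location = eq < 0 ? head : head.substring(eq + 1);

        Map<String, String> options = new LinkedHashMap<>();
        for (String pair : fragment.split("&")) {
            if (pair.isEmpty()) {
                continue; // no fragment at all, or a stray '&'
            }
            int sep = pair.indexOf('=');
            if (sep < 0) {
                options.put(pair, ""); // option given without a value
            } else {
                options.put(pair.substring(0, sep), pair.substring(sep + 1));
            }
        }
        return new DatasourceSpec(name, location, options);
    }
}
```

With this sketch, `DatasourceSpec.parse("users=./data/users.csv#format=default&header=true")` gives the name `users`, the location `./data/users.csv` and two options.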

Regarding the "global mode" and "output generators files" - I'm sorry, but I'm not getting it

* I refined https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 to make my points more clearly
* Are you thinking of defining an explicit "output generator file" containing `datasources`, `templates` and `outputs`? Yes, that could be done, but it does not feel like an interactive command line tool any longer


Regarding "more idiomatic FTL usage"

* Yes, I need to dive into custom template models or whatever it is called :-)


Something we need to iron out is a release policy

* Currently we have little agreement on how the CLI should look or behave
* I think you are leaning towards a 1.0 release, while I favour 0.x.y to have room to make mistakes / experiments
* I personally see the possibility that we don't get a release out - "perfect is the enemy of good"

How would you like to handle the problem - can we agree on a minimal feature set worthy of a release?

Thanks in advance,

Siegfried Goeschl


On 1 Mar 2020, at 11:33, Daniel Dekany wrote:


Actually not recommended but we have named data sources for less than 24
hours


Sorry, not sure what that means. Anyway, my "vote" is: let's not give automatic names if utilizing them is not recommended. I mean, in case we happen to agree on that, why leave it there? Especially if automatically chosen names can clash with explicitly given ones, that would be trouble. (I'm not sure right now if they can... can the path we use as the name be relative? Then it realistically can.)

This is a command line tool where we have little idea what the user will do
or abuse


No matter how much/little we know, we firmly put our bets by releasing
something. So if some feature is certainly not right, that's enough to not
have it, I think.

How does a "data loader" know that it is responsible for loading a file?

What should a "CSV data loader" do - parse it into a list of records or stream one by one?


I think I was misunderstood here. It's not about some kind of auto-magic. It's about where you specify what to load and how, and in what format you specify that. Of course, you must specify the data source (basically a URI for now, as I saw), the rough format (CSV), the format options (separator character, etc.), and other freemarker-generator loading options (like which CSV columns are numbers, which are dates, with what format, what counts as null, etc.).

What was confusing in what I said much earlier is probably that you don't need a global "--mode". That just means that you can have multiple "modes" in the same run, not that you need some big auto-magic. And that they aren't really "modes" then... I think it's just natural that you can have different kinds of "output generator" files in the same run. Why force the assumption that you don't, especially considering that they might want to access common data (which you don't want to load again and again, for each run of the different --mode-s you need)? Of course, as you might select files with wildcards (or by specifying a whole directory, or with some Maven matcher), you just can't directly associate the data loader options with the individual data sources. Instead you can say elsewhere that *.csv inside this explicit "group", or with this file name pattern, is to be loaded like this. That's what you might have perceived as auto-magic. It's just mass-producing data loaders for "cattle" files.
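
A rough Java sketch of that idea, under stated assumptions: the class `LoaderRules` and the option strings are invented for illustration only, and the rule "first matching pattern wins, matched against the file name only" is an assumption, not something we have agreed on. It relies only on the standard `java.nio.file` glob support.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical registry that maps file name patterns to loader options. */
final class LoaderRules {
    private static final class Rule {
        final PathMatcher matcher;
        final String loaderOptions;

        Rule(PathMatcher matcher, String loaderOptions) {
            this.matcher = matcher;
            this.loaderOptions = loaderOptions;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers loader options for all files matching the glob, e.g. "*.csv". */
    LoaderRules add(String glob, String loaderOptions) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        rules.add(new Rule(matcher, loaderOptions));
        return this;
    }

    /** Returns the options of the first rule whose pattern matches the file name. */
    Optional<String> optionsFor(String file) {
        Path fileName = Paths.get(file).getFileName();
        if (fileName == null) {
            return Optional.empty(); // e.g. a file system root has no file name
        }
        for (Rule rule : rules) {
            if (rule.matcher.matches(fileName)) {
                return Optional.of(rule.loaderOptions);
            }
        }
        return Optional.empty();
    }
}
```

The point of the sketch is that the loader options are declared once per pattern, so any number of "cattle" files picked up by a wildcard get the same treatment without being listed individually.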

How to handle the case if you have multiple potential data loaders for a
single file?


As per above, that's just two data loaders referring to the same data
source, so, nothing special.

As of the current state of things, this is how I'm supposed to load a CSV, in the template itself (if I'm not outdated/mistaken):

<#assign csvFormat = CSVTool.formats.DEFAULT.withHeader()>
<#assign foos = CSVTool.parse(Datasources.get("foos"), csvFormat).records>
<#assign bars = CSVTool.parse(Datasources.get("bars"), csvFormat).records>

It will be worth exploring how to make these look more like "idiomatic" FTL (given this is an "official" FM product now, I think we should show how it's done), and nicer in general. The point for now is that these are basically two data loaders interwoven with the template. Because they are interwoven like that, you can't reuse what they loaded for another template execution.

That comes down to personal preferences, e.g. chown uses "owner[:group]"


Yeah, but XML namespaces, Java, C, etc. all use <parent><operator><child>, so, I think, that clicks for more of our potential users. So let's bet on what clicks for more users.

Besides, I challenged the very idea that we need both groups and names. :) I was saying that it's simpler and less opinionated (more flexible) to have just multiple names (like tags). What's the conclusion on that?

On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

Please see my comments below

Thanks in advance,

Siegfried Goeschl


On 29.02.2020, at 21:02, Daniel Dekany <[email protected]> wrote:


I try to provide a useful name even when the content is coming from a URL


When is it recommended to rely on that though? Because utilizing that means that renaming a data source file can break the process, even if you call freemarker-cli with the up-to-date file name. And whether that happens depends on what you (or some other random colleague!) have buried inside the templates. So I guess we had better just not support this. Less code and fewer things to document too.


Actually not recommended but we have named data sources for less than 24
hours


I think we have a different understanding what a "Document" / "Datasource / DataSource" should do


Thing is, eventually (most certainly pre-1.0, as it influences architecture), certain needs will have to be addressed, somehow. Then we will see what "things" we really need. For now I thought we need "things" that are much more than paths, and encapsulate the "how to load the data" aspect. I called them data sources, but maybe we should call them "data loaders" to free up "data sources" for the more primitive thing. Some needs/doubts to address, *later*: Is it really the best approach for users to load/parse data sources programmatically (with that code written in FTL, inside the templates)? Also, is the template the right place for doing that? Because when multiple templates (or just multiple template *runs* of the same template, each generating a different output file) need common data, they shouldn't load it again and again. Also, different topic, can we handle the case "transparently" enough when the data is not coming from a file?

This is a command line tool where we have little idea what the user will do or abuse

* How does a "data loader" know that it is responsible for loading a file?
* What should a "CSV data loader" do - parse it into a list of records or stream one by one?
* How to handle the case if you have multiple potential data loaders for a single file?

I'm leaning towards building blocks where the user controls the work to be done, even if it requires one or two extra lines of FTL code



The joy of programming - I did not intend to use "name:group" together with wildcards :-)


For a CLI tool, I guess we agree that it should work. So maybe, like this (here logs and foos are meant to be "groups"):
--data-source logs file1.log file2.log fileN.log --data-source foos foo1.csv foo2.csv fooN.csv --data-source bar bar.xlsx

It so happens that here you don't really have good control over the number of files associated with the name, so that's maybe yet another reason not to differentiate names and groups.

I disagree here - I think a name would be used more often. I added the "group" as an afterthought since some grouping could be useful


We do agree on that. What I said is that the *syntax* should be such that the group comes first. It's still optional. Like this:
--data-source group:name /somewhere
--data-source name /somewhere

That comes down to personal preferences, e.g. chown uses "owner[:group]"


On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

See my comments below

Thanks in advance,

Siegfried Goeschl


On 29.02.2020, at 19:08, Daniel Dekany <[email protected]>
wrote:

FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for datasources

So, I can do this to have both a name and a group associated with a data source:
--datasource someName:someGroup=somewhere/something

Correct

Or if I only want a name, but not a group (or an "" group actually -
bug?), then:
--datasource someName=somewhere/something

Correct


Or if only a group but not a name (or a "" name actually) then:
--datasource :someGroup=somewhere/something

Mhmm, that would be unintended functionality from my side - current
approach is that every "Document" / "Datasource / DataSource" is named


A name must identify exactly 1 data source, while a group identifies a list of data sources.

No, every "Document" / "Datasource / DataSource" currently has a name, but uniqueness is not enforced. Only if you want to get a "Document" / "Datasource / DataSource" by its exact name do I check for exactly one search hit, and throw an exception otherwise. I try to provide a useful name even when the content is coming from a URL or STDIN (and I will probably add environment variables as "Document" / "Datasource / DataSource", e.g. configuration in the cloud as JSON content passed as an environment variable)


Does this idea, that a data source can be part of a group and then is also possibly identifiable by a name, come from a use case? I mean, it's possibly important somewhere, but if so, then it's strange that you can put something into only a single group. If we need this kind of thing, then perhaps you should just be allowed to associate the data source with a list of names (kind of like tagging), and then when the template wants to get something by name, it will tell there whether it expects exactly one or a list of data sources. Then you don't need to introduce two terms in the documentation either (names and groups). Again, that's if we want this at all, instead of just going with a data source that itself gives a list. (And if not, how will we handle a data source that loads from a non-file source?)

I actually thought of implementing tagging but considered a "group"
sufficient.

* If you don't define anything, everything goes into the "default" group
* For individual documents you can define a name and an optional group

I think we have a different understanding what a "Document" / "Datasource / DataSource" should do

* It is dumb
* It is lazy since data is only loaded on demand
* There is no automagic like "oh, this is a JSON file, so let's go to the JSON tool and create a map readily accessible in the data model"


Note that the current command line syntax doesn't work well with shell wildcard expansion. Like this:
--datasource :someGroup=logs/*.log
will try to expand ":someGroup=logs/*.log", and because it finds nothing (and because the rules of sh and the like are a mess), you will get the parameter value as is, without * expanded.

The joy of programming - I did not intend to use "name:group" together
with wildcards :-)


Also, I think the syntax with the colon should be flipped, because in other places foo:bar usually means that foo is the bigger unit (the container), and bar is the smaller unit (the child).

I disagree here - I think a name would be used more often. I added the "group" as an afterthought since some grouping could be useful


On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

I'm an enterprise developer - bad habits die hard :-)

So I closed the following tickets and merged the branches

1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into "freemarker-generator"
2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for datasources

Thanks in advance,

Siegfried Goeschl


On 29.02.2020, at 12:19, Daniel Dekany <[email protected]>
wrote:

Yeah, and of course, you can merge that branch. You can even work on the master directly after all.

On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
[email protected]>
wrote:

But, I do recognize the cattle use case (several "faceless" files with a common format/schema). Only, my idea is to push that complexity onto the data source. The "data source" concept shields the rest of the application from the details of how the data is stored or retrieved. So, a data source might load a bunch of log files from a directory, and present them as a single big table, or as a list of tables, etc. So I want to deal with the cattle use case, but the question is what part of the architecture will deal with this complication, in other words, how do you box things. The reason my initial bet is to stuff that complication into the "data source" implementation(s) is that data sources are inherently varied. Some return a table-like thing, some have multiple named tables (worksheets in Excel), some return a tree of nodes (XML), etc. So then, some might return a list-of-lists of log records, or just a single list of log records (put together from daily log files). That way cattle don't add to conceptual complexity. Now, you might be aware of cases where the cattle concept must be more exposed than this, and then we can't box things like this. But this is what I tried to express.

Regarding "output generators", and how that applies to the command line. I think it's important that the common core between Maven and command-line is as fat as possible. Ideally, they are just two syntaxes to set up the same thing. Mostly at least. So, if you specify a template file to the CLI application, in a way that causes it to process that template to generate a single output, then you have just defined an "output generator" there (even if it wasn't explicitly called that on the command line). If you specify 3 csv files to the CLI application, in a way that causes it to generate 3 output files, then you have just defined 3 "output generators" there (there's at least one template specified there too, but that wasn't an "output generator" itself, it was just an attribute of the 3 output generators). If you specify 1 template and 3 csv files, in a way that will yield 4 output files (1 for the template, 3 for the csv-s), then you have defined 4 output generators there. If you have a data source that loads a list of 3 entities (say, 3 csv files, so it's a list of tables then), and you have 2 templates, and you tell the CLI to execute each template for each item in said data source, then you have just defined 6 "output generators".
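
The counting in the paragraph above can be sketched in a few lines of Java. This is only an illustration under assumptions: `OutputGenerator`, `forTemplate` and `forEach` are hypothetical names, and a real output generator would also carry a data-model and an output sink, which are left out here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: one instance describes one output to generate. */
final class OutputGenerator {
    final String template;
    final String dataSource; // null when the template itself drives the output

    OutputGenerator(String template, String dataSource) {
        this.template = template;
        this.dataSource = dataSource;
    }

    /** A template that is processed on its own yields exactly one generator. */
    static List<OutputGenerator> forTemplate(String template) {
        List<OutputGenerator> result = new ArrayList<>();
        result.add(new OutputGenerator(template, null));
        return result;
    }

    /** Every (template, data source) pair yields its own generator. */
    static List<OutputGenerator> forEach(List<String> templates, List<String> dataSources) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            for (String dataSource : dataSources) {
                result.add(new OutputGenerator(template, dataSource));
            }
        }
        return result;
    }
}
```

One transform template applied to 3 csv files gives 3 generators, adding a stand-alone template gives 4, and 2 templates executed for each of 3 items give 6, matching the cases described above.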

On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

That all depends on your mental model and the work you do, expectations, experience :-)


__Document Handling__

*"But I think actually we have no good use case for list of documents that's passed at once to a single template run, so, we can just ignore that complication"*

In my case that's not a complication but my daily business - I'm regularly wading through access logs - yesterday probably a couple of hundred access logs across two staging sites to help track down some strange API gateway issues :-)

My gut feeling is (borrowing from https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)

1. You have a few lovely named documents / templates - `pets`
2. You have tons of anonymous documents / templates to process - `cattle`
3. The "grey area" comes into play when mixing `pets & cattle`

`freemarker-cli` was built with 2) in mind and I want to cover 1) since it is equally important and common.


__Template And Document Processing Modes__

IMHO it is important to answer the following question: "How many outputs do you get when rendering 2 templates and 3 datasources? Two, Three or Six?"

Your answer is influenced by your mental model / experience

* When wading through tons of CSV files, access logs, etc. the answer is "2"
* When doing source code generation the obvious answer is "6"
* I can't imagine a use case which results in "3", but I'm pretty sure we will encounter one

__Template and document mode probably shouldn't exist__

That's hard for me to fully understand - I definitely lack your insights & experience writing such tools :-)

Defining the `Output Generator` is the underlying model for the Maven plugin (and probably FMPP).

I'm not sure if this applies to command lines, at least not in the way I use them (or would like to use them)


Thanks in advance,

Siegfried Goeschl

PS: Can/shall I merge the PR to bring in `freemarker-cli`?


On 28 Feb 2020, at 9:14, Daniel Dekany wrote:

Yeah, "data source" is surely a too popular name, but for a reason. Does anyone have other ideas?

As for naming data sources and such. One thing I was wondering about back then is how to deal with a list of documents given to a template, versus exactly 1 document given to a template. But I think actually we have no good use case for a list of documents that's passed at once to a single template run, so, we can just ignore that complication. A document has a name, and that's always just a single document, not a collection, as far as the template is concerned. (We can have multiple documents per run, but those normally yield separate output generators, so it's still only one document per template.) However, we can have data source types (document types with the old terminology) that collect together multiple data files. So then that complexity is encapsulated in the data source type, and doesn't complicate the overall architecture. That's another case where a data source is not just a file. Like maybe there's a data source type that loads all the CSV-s from a directory into a single big table (I had such a case), or even into a list of tables. Or, as I mentioned already, a data source is maybe an SQL query on a JDBC data source (and we got the first term clash... JDBC also calls them data sources).

Template and document mode probably shouldn't exist from the user perspective either, at least not as a global option that must apply to everything in a run. Users could just give the files that define the "output generators", and some of them will be templates, some of them data files, in which case a template needs to be associated with them (and there can be a couple of ways of doing that). And then again, there are the cases where you want to create one output generator per entity from some data source.

On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

See my comments below - and thanks for your patience and input :-)

*Renaming Document To DataSource*

Yes, makes sense. I tried to avoid it since I'm using javax.activation and its DataSource.

*Template And Document Mode*

Agreed - I think it is a valuable abstraction for the user but it is not an implementation concept :-)

*Document Without Symbolic Names*

Also agreed, and it is going to change, but I have not made up my mind yet about what exactly to implement.

Thanks in advance,

Siegfried Goeschl

On 28 Feb 2020, at 1:05, Daniel Dekany wrote:

A few quick thoughts on that:

- We should replace the "document" term with something more telling. It doesn't tell that it's some kind of input. Also, most of these inputs aren't something that people typically call documents. Like a csv file, or a database table, which is not even a file (OK, we don't support such a thing at the moment). I think maybe "data source" is a safe enough term. (It also rhymes with data model.)
- You have separate "template" and "document" "modes", which apply to a whole run. I think such specialization won't be helpful. We could just say, on the conceptual level at least, that we need a set of "output generators". An output generator is an object (in the API) that specifies a template, a data-model (where the data-model is possibly populated with "documents"), and an output "sink" (a file path, or stdout), and can generate the output itself. A practical way of defining the output generators in a CLI application is via a bunch of files, each defining an output generator. Some of those files are maybe templates (which you can even detect from the file extension), or data files that we currently call "documents". They could freely mix inside the same run. I have also met a use case where you have a single table (single "document"), and each record in it yields an output file. That can also be described in some file format, or really in any other way, like directly in command line arguments, via API, etc.
- You have multiple documents without an associated symbolic name in some examples. Templates can't identify those in a maintainable way then. The actual file name is often not a good identifier: it can change over time, and you might not even have good control over it, like when you receive it as a parameter from somewhere else, or someone moves/renames the files that you need to read. An index is also not very good, but I have written about that earlier.


On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
[email protected]> wrote:

Hi folks,

still wrapping my head around it, but I assembled some thoughts here -

https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449

Thanks in advance,

Siegfried Goeschl


On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]>
wrote:

What you are describing is more like the angle that FMPP took initially, where templates drive things, and they generate the output for themselves (even multiple output files if they wish). By default the output file name (and relative path) is deduced from the template name. There was also a global data-model, built in a configuration file (or equally, built via command line arguments, or both mixed), from which templates get whatever data they are interested in. Take a look at the figures here: http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized a bit more, because you could add XML files at the same place where you have the templates, and then you could associate transform templates with the XML files (based on path pattern and/or the XML document element). Now that's like what freemarker-generator had initially (data files drive output, and the template is there to transform it).

So I think the generic mental model would look like this:

1. You have files that drive the process, let's call them *generator files* for now. Usually, each generator file yields an output file (but maybe even multiple output files, as you might have seen in the last figure). These generator files can be of many types, like XML, JSON, XLSX (as in the original freemarker-generator), and even templates (as is the norm in FMPP). If the file is not a template, then you have a set of transformer templates (-t CLI option) in a separate directory, which can be associated with the generator files based on name patterns, and even based on content (schema usually). If the generator file is a template (so that's a positional @Parameter CLI argument that happens to be an *.ftl, and is not a template file specified after the "-t" option), then you just Template.process(...) it, and it prints what the output will be.
2. You also have a set of variables, the global data-model, that contains commonly useful stuff, like what you now call parameters (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data files aren't "generator files". Templates just use them if they need them. An important thing here is to reuse the same mechanism to read and parse those data files that is used in templates when transforming generator files. So we need a common format for specifying how to load data files. That's maybe just FTL that #assigns to the variables, or maybe a more declarative format.

What I have described in the original post here was a less generic form of this, as I tried to stay true to the original approach. I thought the proposal would be drastic enough as it is... :) There, the "main" document is the "generator file" from point 1, the "-t" template is the transform template for the "main" document, and the other named documents ("users", "groups") are a poor man's shared data-model from point 2 (together with -PName=value).

There's a further, somewhat confusing thing to get right with the list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In the model above, as per point 1, if you list multiple data files, each will generate a separate output file. So, if you need to take in a list of files to transform it into a single output file (or at least with a single transform template execution), then you have to be explicit about that, as that's not the default behavior anymore. But it's still absolutely possible. Imagine that a "list of XLSX-es" is itself like a file format. You need some CLI (and Maven config, etc.) syntax to express that, but that shouldn't be a big deal.



On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
[email protected]> wrote:

Hi Daniel,

Good timing - I was looking at a similar problem from a different angle yesterday (see below)

Don't have enough time to answer your email in detail now - will do that tomorrow evening

Thanks in advance,

Siegfried Goeschl


=== START
# FreeMarker CLI Improvement
## Support Of Multiple Template Files
Currently we support the following combinations

* Single template and no data files
* Single template and one or more data files

But we cannot support the following use case, which is quite typical in the cloud

__Convert multiple templates with a single data file, e.g. copying a directory of configuration files using a JSON configuration file__

## Implementation notes
* When we copy a directory we can remove the `ftl` extension on the fly
* We might need an `exclude` filter for the copy operation
* Initially resolve to a list of template files and process one after another
* Need to calculate the output file location and extension
* We need to rename the existing command line parameters (see below)
* Do we need multiple include and exclude filters?
* Do we need file versus directory filters?

### Command Line Options
```
--input-encoding : Encoding of the documents
--output-encoding : Encoding of the rendered template
--template-encoding : Encoding of the template
--output : Output file or directory
--include-document : Include pattern for documents
--exclude-document : Exclude pattern for documents
--include-template : Include pattern for templates
--exclude-template : Exclude pattern for templates
```

### Command Line Examples
```text
# Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json

# Basically the same using a named document "configuration"
# It might make sense to expose "conf" directly in the FreeMarker data model
# It might make sense to allow URIs for loading documents
freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json

# Basically the same using an environment variable as named document
freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
```
=== END

On 23.02.2020, at 16:37, Daniel Dekany <[email protected]>
wrote:

Input documents are a fundamental concept in freemarker-generator, so we should think about that more, and probably refine/rework how it's done.

Currently it works like this, with CLI at least.

freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv

Then in access-report.ftl you have to do something like this:

<#assign doc = Documents.get(0)>
... process doc here

(The more idiomatic Documents[0] won't work. Actually, that led to a funny chain of coincidences: It returned the string "D", then CSVTool.parse(...) happily parsed that into a table with the single column "D" and 0 rows, and as there were 0 rows, the template didn't run into an error for row.myExpectedColumn referring to a missing column either, so the process finished with success. (: Pretty unlucky for sure. The root cause was unintentionally breaking a FreeMarker idiom though; eventually we will have to work on those too, but, different topic.)

However, actually multiple input documents can be passed in:

freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv
somewhere/bar-access-log.csv

The above template will still work, though then you ignore all but the first document. So if you expect any number of input documents, you probably will have to do this:

<#list Documents.list as doc>
... process doc here
</#list>

(The more idiomatic <#list Documents as doc> won't work; but again, those we will work out in a different thread.)


So, what would be better, in my opinion. I start out from what I think are the common use cases, in decreasing order of frequency. The goal is to make those less error prone for the users, and simpler to express.

USE CASE 1

You have exactly 1 input document, which is therefore simply "the" document in the mind of the user. This is probably the typical use case, or at least the use case users typically start out from when starting the work.

freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv

Then `Documents.get(0)` is not very fitting. Most importantly it's error prone, because if the user passed in more than 1 document (can even happen totally accidentally, like if the user was lazy and used a wildcard that the shell expanded), the template will silently ignore the rest of the documents, and the single document processed will be practically picked randomly. The user might not notice that and submit a bad report or such.

I think that in this use case the document should be simply referred to as `Document` in the template. When you have multiple documents there, referring to `Document` should be an error, saying that the template was made to process a single document only.


USE CASE 2

You have multiple input documents, but each has a different role (different schema, maybe different file type). Like, you pass in users.csv and groups.csv. Each has a different schema, and so you want to access them differently, but in the same template.

freemarker-cli
[...]
--named-document users somewhere/foo-users.csv
--named-document groups somewhere/foo-groups.csv

Then in the template you could refer to them as `NamedDocuments.users` and `NamedDocuments.groups`.

Use Cases 1 and 2 can be unified into a coherent concept, where `Document` is just a shorthand for `NamedDocuments.main`. It's called "main" because that's "the" document the template is about, but then you had to add some helper documents, with symbolic names representing their role.

freemarker-cli
-t access-report.ftl
--document-name=main somewhere/foo-access-log.csv
--document-name=users somewhere/foo-users.csv
--document-name=groups somewhere/foo-groups.csv

Here, `Document` still works in the template, and it refers to `somewhere/foo-access-log.csv`. (While omitting --document-name=main above would be cleaner, I couldn't figure out how to do that with Picocli. Anyway, for now the point is the concept, which is not specific to CLI.)

USE CASE 3

Here you have several documents of the same kind. That has a more generic sub-use-case, when you have explicitly named documents (like "users" above), and for some you expect multiple input files.

freemarker-cli
-t access-report.ftl
--document-name=main somewhere/foo-access-log.csv
somewhere/bar-access-log.csv
--document-name=users somewhere/foo-users.csv
somewhere/bar-users.csv
--document-name=groups somewhere/global-groups.csv

The template must be written with this use case in mind, as now it has to #list some of the documents. (I think in practice you hardly ever want to get a document by hard coded index. Either you don't know how many documents you have, so you can't use hard coded indexes, or you do, and each index has a specific meaning, but then you should name the documents instead, as using indexes is error prone, and hard to read.)
Accessing that list of documents in the template could maybe be done like this:
- For the "main" documents: `DocumentList`
- For explicitly named documents, like "users": `NamedDocumentLists.users`

SUMMING UP

To unify all 3 use cases into a coherent concept:
- `NamedDocumentLists.<name>` is the most generic form, and while you can achieve everything with it, using it requires your template to handle the most generic case too. So, I think it would be rarely used.
- `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's used if you only have one kind of document (single format and schema), but potentially multiple of them.
- `NamedDocuments.<name>` expresses that you expect exactly 1 document of the given name.
- `Document` is just a shorthand for `NamedDocuments.main`. This is for the most natural/frequent use case.
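
To make the relation between these four forms concrete, here is a small Java sketch. It is only an assumption about how such a model could be backed: the class `Documents`, its method names, and the use of plain file names as stand-ins for documents are all invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical backing model: every name maps to a list of documents. */
final class Documents {
    static final String MAIN = "main";

    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    Documents add(String name, String file) {
        byName.computeIfAbsent(name, key -> new ArrayList<>()).add(file);
        return this;
    }

    /** NamedDocumentLists.<name>: zero or more documents. */
    List<String> namedDocumentList(String name) {
        return byName.getOrDefault(name, Collections.<String>emptyList());
    }

    /** NamedDocuments.<name>: exactly one document, anything else is an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named '"
                    + name + "' but got " + docs.size());
        }
        return docs.get(0);
    }

    /** DocumentList: shorthand for NamedDocumentLists.main. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Document: shorthand for NamedDocuments.main. */
    String document() {
        return namedDocument(MAIN);
    }
}
```

The design point is that only the most generic form stores anything; the other three are views that add either the "main" default or the "exactly one" check, which is what lets a template state its expectation and fail loudly when the input does not match.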

That's 4 possible ways of accessing your documents, which is a trade-off for the sake of these:
- Catching CLI (or Maven, etc.) input where the template output will likely be wrong. That's only possible if the user can communicate their intent in the template.
- Users don't need to deal with concepts that are irrelevant in their concrete use case. Just start with the trivial, `Document`, and later if the need arises, generalize to named documents, document lists, or both.

What do you guys think?





--
Best regards,
Daniel Dekany


