Actually not recommended but we have named data sources for less than
24
hours
Sorry, not sure what that means. Anyway, my "vote" is let's not give
automatic names if that's not recommended to utilize. I mean, in case
we
happen to agree on that, why leave it there. Especially if
automatically
chosen names can clash with explicitly given ones, that would be
trouble. (I'm not sure right now if they can... the path we use as
the
name can be relative? Then it realistically can.)
This is a command line tool where we have little idea what the user
will do
or abuse
No matter how much/little we know, we firmly put our bets by releasing
something. So if some feature is certainly not right, that's enough to
not
have it, I think.
How does a "data loader" know that it is responsible for loading a file?
What should a "CSV data loader" do - parse it into a list of
records or stream one by one?
I think I was misunderstood here. It's not about some kind of
auto-magic.
It's about where do you specify what to load and how, and in what
format do
you specify that. Of course, you must specify the data source
(basically an
URI for now as I saw), the rough format (CSV), and the format options
(separator character, etc.), and other freemarker-generator loading
options
(like which CSV columns are numbers, which are dates, with what
format,
what counts as null, etc.).
What was confusing in what I said much earlier is probably that you
don't
need a global "--mode". That just means that you can have multiple
"modes"
in the same run, not that you need some big auto-magic. And that they
aren't really "modes" then... I think it's just natural that you can
have
different kind of "output generator" files in the same run. Why force
the
assumption that you don't, especially considering that they might
want
to access common data (which you don't want to load again and again,
for
each run of the different --mode-s you need). Of course, as you might
select files with wildcards (or by specifying a whole directory, or
with
some Maven matcher), you just can't directly associate the data loader
options to the individual data sources. Instead you can say elsewhere
that
*.csv inside this explicit "group", or with this file name pattern, is
to
be loaded like this. That's what you might have perceived as auto-magic.
It's
just mass-producing data loaders for "cattle" files.
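To make the "mass-producing" idea concrete, here is a minimal Python sketch (not freemarker-generator code; the rule table, the option names, and `loader_options_for` are made up for illustration). It assumes the user declares, per file name pattern, how matching files are to be loaded, and the tool then derives one data loader per selected file:

```python
from fnmatch import fnmatch

# Hypothetical rules: the first file-name pattern that matches wins.
LOADER_RULES = [
    ("*.csv", {"format": "csv", "separator": ";", "null_value": "N/A"}),
    ("*.json", {"format": "json"}),
]


def loader_options_for(file_name, rules=LOADER_RULES):
    """Return the loading options of the first rule whose pattern matches."""
    for pattern, options in rules:
        if fnmatch(file_name, pattern):
            return options
    raise LookupError(f"No data loader rule matches {file_name!r}")


# Files selected via a wildcard or directory; each gets its loader options.
selected = ["foo1.csv", "foo2.csv", "settings.json"]
loaders = {name: loader_options_for(name) for name in selected}
```

Nothing is guessed from the file content here; the user still says explicitly how `*.csv` is to be loaded, only once per pattern instead of once per file.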
How to handle the case if you have multiple potential data loaders for
a
single file?
As per above, that's just two data loaders referring to the same data
source, so, nothing special.
As of the current state of things, this is how I'm supposed to load a
CSV,
in the template itself (if I'm not outdated/mistaken):
<#assign csvFormat = CSVTool.formats.DEFAULT.withHeader()>
<#assign foos = CSVTool.parse(Datasources.get("foos"), csvFormat).records>
<#assign bars = CSVTool.parse(Datasources.get("bars"), csvFormat).records>
It will be worth exploring how to make these look more "idiomatic" FTL
(given
this is an "official" FM product now, I think, we should show how it's
done), and nicer in general. Point for now is, that's basically two
data-loaders interwoven with the template there. Because they are
interwoven like that, you can't reuse what they loaded for another
template
execution.
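A rough Python sketch of the alternative, where loading is separated from the template so that several template executions share what was loaded. Everything here is hypothetical (`CachingDataLoader`, and the two `render_*` functions standing in for template runs); it only illustrates the load-once idea, not how the tool does or should implement it:

```python
import csv
import io


class CachingDataLoader:
    """Parses a CSV source at most once and reuses the records afterwards."""

    def __init__(self, open_source):
        self._open_source = open_source  # callable returning a text stream
        self._records = None
        self.load_count = 0

    def records(self):
        # Load lazily on first access, then serve the cached list.
        if self._records is None:
            with self._open_source() as stream:
                self._records = list(csv.DictReader(stream))
            self.load_count += 1
        return self._records


foos = CachingDataLoader(lambda: io.StringIO("id,name\n1,alpha\n2,beta\n"))


def render_summary(loader):
    return f"{len(loader.records())} records"


def render_names(loader):
    return ", ".join(row["name"] for row in loader.records())


# Two "template executions" share the same loaded data.
outputs = [render_summary(foos), render_names(foos)]
```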
That comes down to personal preferences, e.g. chown uses
"owner[:group] "
Yeah, but XML namespaces, Java, C, etc. all use
<parent><operator><child>,
so, I think, that clicks for more of our potential users. So let's bet
on
what clicks for more users.
Besides, I challenged the very idea that we need both groups and
names. :)
Saying that it's simpler and less opinionated (more flexible) to have
just
multiple names (like tags). What's the end of that?
On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
Please see my comments below
Thanks in advance,
Siegfried Goeschl
On 29.02.2020, at 21:02, Daniel Dekany <[email protected]>
wrote:
I try to provide a useful name even when the content is coming from
an
URL
When is it recommended to rely on that though? Because utilizing
that
means
that renaming a data source file can break the process, even if you
call
freemarker-cli with the up to date file name. And if that happens
depends
on what you (or another random colleague!) have dug inside the
templates.
So I guess we'd better just not support this. Less code and fewer
things
to
document too.
Actually not recommended but we have named data sources for less than
24
hours
I think we have a different understanding what a "Document" /
"Datasource
/ DataSource" should do
Thing is, eventually (most certainly pre-1.0, as it influences
architecture), certain needs will have to be addressed, somehow. Then
we
will
see what "things" we really need. For now I thought we need "things"
that
are much more than paths, and encapsulate the "how to load the data"
aspect. I called them data sources, but maybe we should call them
"data
loaders" to free up data sources for the more primitive thing. Some
needs/doubts to address, *later*: Is it really the best approach for
users
to load/parse data sources programmatically (that code is written
in
FTL,
inside the templates)? Also, is the template the right place for
doing
that, because, when multiple templates (or just multiple template
*runs*
of
the same template, each generating a different output file) need
common
data, they shouldn't load it again and again. Also, different topic,
can
we
handle the case "transparently" enough when the data is not coming
from a
file?
This is a command line tool where we have little idea what the user
will
do or abuse
* How does a "data loader" know that it is responsible for loading a file?
* What should a "CSV data loader" do - parse it into a list of
records or stream one by one?
* How to handle the case if you have multiple potential data loaders
for a
single file?
I'm leaning towards building blocks where the user controls the work
to be
done, even if it requires one or two extra lines of FTL code
The joy of programming - I did not intend to use "name:group"
together
with
wildcards :-)
For a CLI tool, I guess we agree that it should work. So maybe, like
this
(here logs and foos meant to be "groups"):
--data-source logs file1.log file2.log fileN.log
--data-source foos foo1.csv foo2.csv fooN.csv
--data-source bar bar.xlsx
It so happens that here you don't really have a good control about
the
number of files associated to the name, so, maybe yet another reason
to
not
differentiate names and groups.
I disagree here - I think using a name would be used more often. I
added
the "group" as an afterthought since some grouping could be useful
We do agree in that. What I said is that the *syntax* should be so
that
the
group comes first. It's still optional. Like this:
--data-source group:name /somewhere
--data-source name /somewhere
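Parsing such an argument could look roughly like this. A Python sketch only; `parse_data_source_spec` is a hypothetical helper, and it assumes the group-first `group:name` order proposed above, with the group being optional:

```python
def parse_data_source_spec(spec):
    """Split 'group:name' or plain 'name' into (group, name).

    The group is None when it is absent or empty.
    """
    group, _separator, name = spec.rpartition(":")
    if not name:
        raise ValueError(f"Missing data source name in {spec!r}")
    return (group or None, name)


with_group = parse_data_source_spec("logs:access")
name_only = parse_data_source_spec("access")
```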
That comes down to personal preferences, e.g. chown uses
"owner[:group] "
On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
See my comments below
Thanks in advance,
Siegfried Goeschl
On 29.02.2020, at 19:08, Daniel Dekany <[email protected]>
wrote:
FREEMARKER-135 freemarker-generator-cli: Support user-supplied
names
for
datasources
So, I can do this to have both a name and a group associated with a
data
source:
--datasource someName:someGroup=somewhere/something
Correct
Or if I only want a name, but not a group (or an "" group
actually -
bug?), then:
--datasource someName=somewhere/something
Correct
Or if only a group but not a name (or a "" name actually) then:
--datasource :someGroup=somewhere/something
Mhmm, that would be unintended functionality from my side - current
approach is that every "Document" / "Datasource / DataSource" is
named
A name must identify exactly 1 data source, while a group
identifies a
list
of data sources.
No, every "Document" / "Datasource / DataSource" has a name
currently
but
uniqueness is not enforced. Only if you want to get a "Document" /
"Datasource / DataSource" by its exact name do I check for exactly one
search hit, and throw an exception otherwise. I try to provide a useful name
even
when
the content is coming from an URL or STDIN (and I will probably add
environment variables as "Document" / "Datasource / DataSource",
e.g
configuration in the cloud as JSON content passed as environment
variable)
Does this idea, that a data source can be part of a group, and then
is also possibly identifiable with a name, come from a use case?
I
mean,
it's possibly important somewhere, but if so, then it's strange
that
you
can put something into only a single group. If we need this kind
of
thing,
then perhaps you should be just allowed to associate the data
source
with a
list of names (kind of like tagging), and then when the template
wants
to
get something by name, it will tell there if it expects exactly
one or
a
list of data sources. Then you don't need to introduce two terms
in the
documentation either (names and groups). Again, if we want this at
all,
instead of just going with a data source that itself gives a list.
(And
if
not, how will we handle a data source that loads from a non-file
source?)
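To illustrate the tagging idea, a small Python sketch. `DataSourceRegistry`, `get_one`, and `get_list` are hypothetical names, not existing API. It assumes a data source can be registered under any number of names, and that the caller states whether it expects exactly one match or a list:

```python
class DataSourceRegistry:
    """Associates each data source with any number of names (tags)."""

    def __init__(self):
        self._by_name = {}

    def add(self, source, *names):
        for name in names:
            self._by_name.setdefault(name, []).append(source)

    def get_list(self, name):
        # Any number of matches is acceptable, including none.
        return list(self._by_name.get(name, []))

    def get_one(self, name):
        # The caller expects exactly one data source under this name.
        matches = self.get_list(name)
        if len(matches) != 1:
            raise LookupError(
                f"Expected exactly 1 data source named {name!r}, "
                f"found {len(matches)}"
            )
        return matches[0]


registry = DataSourceRegistry()
registry.add("logs/monday.log", "logs")
registry.add("logs/tuesday.log", "logs", "latest")

all_logs = registry.get_list("logs")
latest = registry.get_one("latest")
```

With this, "name" and "group" are the same mechanism; only the lookup differs.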
I actually thought of implementing tagging but considered a "group"
sufficient.
* If you don't define anything everything goes into the "default"
group
* For individual documents you can define a name and an optional
group
I think we have a different understanding what a "Document" /
"Datasource
/ DataSource" should do
* It is a dumb
* It is lazy since data is only loaded on demand
* There is no automagic like "oh, this is a JSON file, so let's go
to
the
JSON tool and create a map readily accessible in the data model"
Note that the current command line syntax doesn't work well with
shell
wildcard expansion. Like this:
--datasource :someGroup=logs/*.log
will try to expand ":someGroup=logs/*.log", and because it finds
nothing
(and because the rules of sh and the like are a mess), you will get
the
parameter value as is, without * expanded.
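The effect can be reproduced without a shell. The Python sketch below uses the standard `glob` module as a stand-in for shell pathname expansion (an approximation: sh or bash with default settings passes the unexpanded word on when nothing matches, while `glob` returns an empty list). The directory layout is made up for the demonstration:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as work_dir:
    # Create logs/a.log and logs/b.log in a scratch directory.
    os.mkdir(os.path.join(work_dir, "logs"))
    for file_name in ("a.log", "b.log"):
        open(os.path.join(work_dir, "logs", file_name), "w").close()

    previous_dir = os.getcwd()
    os.chdir(work_dir)
    try:
        # The whole word is the pattern, including the ":someGroup=" prefix,
        # so it looks for a directory literally called ":someGroup=logs".
        prefixed_matches = sorted(glob.glob(":someGroup=logs/*.log"))
        # Without the prefix, the same wildcard finds the log files.
        plain_matches = sorted(glob.glob("logs/*.log"))
    finally:
        os.chdir(previous_dir)
```

Here `prefixed_matches` comes out empty, while `plain_matches` contains both log files.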
The joy of programming - I did not intend to use "name:group"
together
with wildcards :-)
Also, I think the syntax with colon should be flipped, because on
other
places foo:bar usually means that foo is the bigger unit (the
container),
and bar is the smaller unit (the child).
I disagree here - I think using a name would be used more often. I
added
the "group" as an afterthought since some grouping could be useful
On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
I'm an enterprise developer - bad habits die hard :-)
So I closed the following tickets and merged the branches
1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli"
into
"freemarker-generator"
2) FREEMARKER-134 freemarker-generator: Rename "Document" to
"Datasource"
3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
names
for datasources
Thanks in advance,
Siegfried Goeschl
On 29.02.2020, at 12:19, Daniel Dekany <[email protected]>
wrote:
Yeah, and of course, you can merge that branch. You can even
work on
the
master directly after all.
On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
[email protected]>
wrote:
But, I do recognize the cattle use case (several "faceless"
files
with
common format/schema). Only, my idea is to push that complexity
on
the
data
source. The "data source" concept shields the rest of the
application
from
the details of how the data is stored or retrieved. So, a data
source
might
load a bunch of log files from a directory, and present them
as a
single
big table, or like a list of tables, etc. So I want to deal
with the
cattle
use case, but the question is what part of the architecture
will
deal
with this complication, in other words, how do you box
things. Why
my
initial bet is to stuff that complication into the "data
source"
implementation(s) is that data sources are inherently varied.
Some return a table-like thing, some have multiple named tables
(worksheets in Excel), some return a tree of nodes (XML), etc.
So then, some might return a
list-of-list-of log records, or just a single list of
log-records
(put
together from daily log files). That way cattle don't add to
conceptual
complexity. Now, you might be aware of cases where the cattle
concept
must
be more exposed than this, and then we can't box things like
this.
But
this
is what I tried to express.
Regarding "output generators", and how that applies on the
command
line. I
think it's important that the common core between Maven and
command-line is
as fat as possible. Ideally, they are just two syntaxes to set up
the
same
thing. Mostly at least. So, if you specify a template file to
the
CLI
application, in a way so that it causes it to process that
template
to
generate a single output, then there you have just defined an
"output
generator" (even if it wasn't explicitly called like that in
the
command
line). If you specify 3 csv files to the CLI application, in a
way
so
that
it causes it to generate 3 output files, then you have just
defined
3
"output generators" there (there's at least one template
specified
there
too, but that wasn't an "output generator" itself, it was just
an
attribute
of the 3 output generators). If you specify 1 template, and 3
csv
files, in
a way so that it will yield 4 output files (1 for the template,
3
for
the
csv-s), then you have defined 4 output generators there. If you
have a
data
source that loads a list of 3 entities (say, 3 csv files, so
it's a
list of
tables then), and you have 2 templates, and you tell the CLI to
execute
each template for each item in said data source, then you have
just
defined
6 "output generators".
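The counting above can be sketched in Python as follows. `OutputGenerator` is a hypothetical stand-in for the concept, and the file names are invented; the point is only that each pairing of a template with a data item (or a template on its own) is one output generator:

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class OutputGenerator:
    template: str
    data_source: Optional[str] = None  # None: the template runs on its own


csv_files = ["a.csv", "b.csv", "c.csv"]

# 1 template processed on its own, plus 3 csv files that are each
# run through a transform template: 4 output generators.
mixed = [OutputGenerator("report.ftl")] + [
    OutputGenerator("transform.ftl", csv_file) for csv_file in csv_files
]

# 2 templates, each executed once for each item of a 3-item data source:
# 6 output generators.
per_item = [
    OutputGenerator(template, item)
    for template, item in product(["summary.ftl", "detail.ftl"], csv_files)
]
```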
On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
That all depends on your mental model and work you do,
expectations,
experience :-)
__Document Handling__
*"But I think actually we have no good use case for list of
documents
that's passed at once to a single template run, so, we can
just
ignore
that complication"*
In my case that's not a complication but my daily business -
I'm
regularly wading through access logs - yesterday probably a
couple
of
hundreds access logs across two staging sites to help tracking
some
strange API gateway issues :-)
My gut feeling is (borrowing from
https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
)
1. You have a few lovely named documents / templates - `pets`
2. You have tons of anonymous documents / templates to process
-
`cattle`
3. The "grey area" comes into play when mixing `pets & cattle`
`freemarker-cli` was built with 2) in mind and I want to cover
1)
since
it is equally important and common.
__Template And Document Processing Modes__
IMHO it is important to answer the following question : "How
many
outputs do you get when rendering 2 templates and 3
datasources?
Two,
Three or Six?"
Your answer is influenced by your mental model / experience
* When wading through tons of CSV files, access logs, etc. the
answer
is
"2"
* When doing source code generation the obvious answer is "6"
* Can't imagine a use case which results in "3" but I'm pretty
sure
we
will encounter one
__Template and document mode probably shouldn't exist__
That's hard for me to fully understand - I definitely lack
your
insights
& experience writing such tools :-)
Defining the `Output Generator` is the underlying model for
the
Maven
plugin (and probably FMPP).
I'm not sure if this applies for command lines at least not in
the
way
I
use them (or would like to use them)
Thanks in advance,
Siegfried Goeschl
PS: Can/shall I merge the PR to bring in `freemarker-cli`?
On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
Yeah, "data source" is surely too popular a name, but for a
reason.
Anyone
has other ideas?
As of naming data sources and such. One thing I was wondering
about
back
then is how to deal with list of documents given to a
template,
versus
exactly 1 document given to a template. But I think actually
we
have
no
good use case for list of documents that's passed at once to
a
single
template run, so, we can just ignore that complication. A
document
has
a
name, and that's always just a single document, not a
collection,
as
far as
the template is concerned. (We can have multiple documents
per
run,
but
those normally yield separate output generators, so it's
still
only
one
document per template.) However, we can have data source
types
(document
types with old terminology) that collect together multiple
data
files.
So
then that complexity is encapsulated into the data source
type,
and
doesn't
complicate the overall architecture. That's another case when
a
data
source
is not just a file. Like maybe there's a data source type
that
loads
all
the CSV-s from a directory, into a single big table (I had
such
case),
or
even into a list of tables. Or, as I mentioned already, a
data
source
is
maybe an SQL query on a JDBC data source (and we got the
first
term
clash... JDBC also call them data sources).
Template and document mode probably shouldn't exist from user
perspective
either, at least not as a global option that must apply to
everything
in a
run. They could just give the files that define the "output
generators",
and some of them will be templates, some of them are data
files,
in
which
case a template needs to be associated with them (and there
can be
a
couple
of ways of doing that). And then again, there are the cases
where
you
want
to create one output generator per entity from some data
source.
On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
See my comments below - and thanks for your patience and
input
:-)
*Renaming Document To DataSource*
Yes, makes sense. I tried to avoid since I'm using
javax.activation
and
its DataSource.
*Template And Document Mode*
Agreed - I think it is a valuable abstraction for the user
but it
is
not
an implementation concept :-)
*Document Without Symbolic Names*
Also agreed and it is going to change but I have not settled
my
mind
yet
what exactly to implement.
Thanks in advance,
Siegfried Goeschl
On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
A few quick thoughts on that:
- We should replace the "document" term with something more
speaking.
It
doesn't tell that it's some kind of input. Also, most of
these
inputs
aren't something that people typically call documents. Like
a csv
file, or
a database table, which is not even a file (OK we don't
support
such
thing
at the moment). I think, maybe "data source" is a safe
enough
term.
(It
also rhymes with data model.)
- You have separate "template" and "document" "mode", that
applies
to
a
whole run. I think such specialization won't be helpful. We
could
just say,
on the conceptual level at least, that we need a set of
"output
generators". An output generator is an object (in the API)
that
specifies a
template, a data-model (where the data-model is possibly
populated
with
"documents"), and an output "sink" (a file path, or stdout),
and
can
generate the output itself. A practical way of defining the
output
generators in a CLI application is via a bunch of files,
each
defining an
output generator. Some of those files may be a template
(that
you
can
even detect from the file extension), or a data file that we
currently call
a "document". They could freely mix inside the same run. I
have
also
met
use case when you have a single table (single "document"),
and
each
record
in it yields an output file. That can also be described in
some
file
format, or really in any other way, like directly in command
line
argument,
via API, etc.
- You have multiple documents without associated symbolical
name
in
some
examples. Templates can't identify those then in a well
maintainable
way.
The actual file name is often not a good identifier, can
change
over
time,
and you might not even have good control over it, like you
already
receive it as a parameter from somewhere else, or someone
moves/renames
the files that you need to read. An index is also not very
good,
but
I
have
written about that earlier.
On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
[email protected]> wrote:
Hi folks,
still wrapping my head around it but assembled some thoughts
here -
https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
Thanks in advance,
Siegfried Goeschl
On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]>
wrote:
What you are describing is more like the angle that FMPP
took
initially,
where templates drive things, they generate the output for
themselves
(even
multiple output files if they wish). By default output files
name
(and
relative path) is deduced from template name. There was also
a
global
data-model, built in a configuration file (or equally, built
via
command
line arguments, or both mixed), from which templates get
whatever
data
they
are interested in. Take a look at the figures here:
http://fmpp.sourceforge.net/qtour.html. Later, this concept
was
generalized
a bit more, because you could add XML files at the same
place
where
you
have the templates, and then you could associate transform
templates
to
the
XML files (based on path pattern and/or the XML document
element).
Now
that's like what freemarker-generator had initially (data
files
drive
output, and the template is there to transform it).
So I think the generic mental model would be like this:
1. You got files that drive the process, let's call them
*generator
files* for now. Usually, each generator file yields an
output
file
(but
maybe even multiple output files, as you might have seen in the
last
figure).
These generator files can be of many types, like XML, JSON,
XLSX
(as
in the
original freemarker-generator), and even templates (as is
the
norm
in
FMPP). If the file is not a template, then you got a set of
transformer
templates (-t CLI option) in a separate directory, which can
be
associated
with the generator files based on name patterns, and even
based on
content
(schema usually). If the generator file is a template (so
that's
a
positional @Parameter CLI argument that happens to be an
*.ftl,
and
is
not
a template file specified after the "-t" option), then you
just
Template.process(...) it, and it prints what the output will
be.
2. You also have a set of variables, the global data-model,
that
contains commonly useful stuff, like what you now call
parameters
(CLI
-Pname=value), but also maybe data loaded from JSON, XML,
etc..
Those
data
files aren't "generator files". Templates just use them if
they
need
them.
An important thing here is to reuse the same mechanism to
read
and
parse
those data files, which was used in templates when
transforming
generator
files. So we need a common format for specifying how to load
data
files.
That's maybe just FTL that #assigns to the variables, or
maybe
more
declarative format.
What I have described in the original post here was a less
generic
form
of
this, as I tried to be true to the original approach. I
thought
the
proposal will be drastic enough as it is... :) There, the
"main"
document
is the "generator file" from point 1, the "-t" template is
the
transform
template for the "main" document, and the other named
documents
("users",
"groups") is a poor man's shared data-model from point 2
(together
with
-PName=value).
There's a further somewhat confusing thing to get right with the
list-of-documents (`DocumentList`, `NamedDocumentLists`)
thing
though.
In
the model above, as per point 1, if you list multiple data
files,
each
will
generate a separate output file. So, if you need to take in a
list
of
files
to
transform it to a single output file (or at least with a
single
transform
template execution), then you have to be explicit about
that, as
that's
not
the default behavior anymore. But it's still absolutely
possible.
Imagine
it as a "list of XLSX-es" is itself like a file format. You
need
some
CLI
(and Maven config, etc.) syntax to express that, but that
shouldn't
be a
big deal.
On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
[email protected]> wrote:
Hi Daniel,
Good timing - I was looking at a similar problem from
different
angle
yesterday (see below)
Don't have enough time to answer your email in detail now -
will
do
that
tomorrow evening
Thanks in advance,
Siegfried Goeschl
===. START
# FreeMarker CLI Improvement
## Support Of Multiple Template Files
Currently we support the following combinations
* Single template and no data files
* Single template and one or more data files
But we can not support the following use case which is quite
typical
in
the cloud
__Convert multiple templates with a single data file, e.g
copying a
directory of configuration files using a JSON configuration
file__
## Implementation notes
* When we copy a directory we can remove the `ftl` extension
on
the
fly
* We might need an `exclude` filter for the copy operation
* Initially resolve to a list of template files and process
one
after
another
* Need to calculate the output file location and extension
* We need to rename the existing command line parameters
(see
below)
* Do we need multiple include and exclude filter?
* Do we need file versus directory filters?
### Command Line Options
```
--input-encoding : Encoding of the documents
--output-encoding : Encoding of the rendered template
--template-encoding : Encoding of the template
--output : Output file or directory
--include-document : Include pattern for documents
--exclude-document : Exclude pattern for documents
--include-template : Include pattern for templates
--exclude-template : Exclude pattern for templates
```
### Command Line Examples
```text
# Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
# Basically the same using a named document "configuration"
# It might make sense to expose "conf" directly in the FreeMarker data model
# It might make sense to allow URIs for loading documents
freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
# Basically the same using an environment variable as named document
freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
```
=== END
On 23.02.2020, at 16:37, Daniel Dekany <[email protected]>
wrote:
Input documents is a fundamental concept in
freemarker-generator,
so
we
should think about that more, and probably refine/rework how
it's
done.
Currently it works like this, with CLI at least.
freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv
Then in access-report.ftl you have to do something like
this:
<#assign doc = Documents.get(0)>
... process doc here
(The more idiomatic Documents[0] won't work. Actually, that
led
to a
funny
chain of coincidences: It returned the string "D", then
CSVTool.parse(...)
happily parsed that to a table with the single column "D",
and 0
rows,
and
as there were 0 rows, the template didn't run into an error
because
row.myExpectedColumn refers to a missing column either, so
the
process
finished with success. (: Pretty unlucky for sure. The root
was
unintentionally breaking a FreeMarker idiom though;
eventually we
will
have
to work on those too, but, different topic.)
However, actually multiple input documents can be passed in:
freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv
somewhere/bar-access-log.csv
Above template will still work, though then you ignored all
but
the
first
document. So if you expect any number of input documents,
you
probably
will
have to do this:
<#list Documents.list as doc>
... process doc here
</#list>
(The more idiomatic <#list Documents as doc> won't work; but
again,
those
we will work out in a different thread.)
So, what would be better, in my opinion. I start out from
what I
think
are
the common use cases, in decreasing order of frequency.
Goal is
to
make
those less error prone for the users, and simpler to
express.
USE CASE 1
You have exactly 1 input document, which is therefore
simply
"the"
document in the mind of the user. This is probably the
typical
use
case,
but at least the use case users typically start out from
when
starting
the
work.
freemarker-cli
-t access-report.ftl
somewhere/foo-access-log.csv
Then `Documents.get(0)` is not very fitting. Most
importantly
it's
error
prone, because if the user passed in more than 1 document
(can
even
happen
totally accidentally, like if the user was lazy and used a
wildcard
that
the shell exploded), the template will silently ignore the
rest
of
the
documents, and the single document processed will be
practically
picked
randomly. The user might not notice that and submit a bad
report
or
such.
I think that in this use case the document should be simply
referred
as
`Document` in the template. When you have multiple documents
there,
referring to `Document` should be an error, saying that the
template
was
made to process a single document only.
USE CASE 2
You have multiple input documents, but each has different
role
(different
schema, maybe different file type). Like, you pass in
users.csv
and
groups.csv. Each has a different schema, and so you want to
access
them
differently, but in the same template.
freemarker-cli
[...]
--named-document users somewhere/foo-users.csv
--named-document groups somewhere/foo-groups.csv
Then in the template you could refer to them as:
`NamedDocuments.users`,
and `NamedDocuments.groups`.
Use Case 1, and 2 can be unified into a coherent concept,
where
`Document`
is just a shorthand for `NamedDocuments.main`. It's called
"main"
because
that's "the" document the template is about, but then you
have added
some helper documents, with symbolic names representing
their
role.
freemarker-cli
-t access-report.ftl
--document-name=main somewhere/foo-access-log.csv
--document-name=users somewhere/foo-users.csv
--document-name=groups somewhere/foo-groups.csv
Here, `Document` still works in the template, and it refers
to
`somewhere/foo-access-log.csv`. (While omitting
--document-name=main
above
would be cleaner, I couldn't figure out how to do that with
Picocli.
Anyway, for now the point is the concept, which is not
specific
to
CLI.)
USE CASE 3
Here you have several of the same kind of documents. That
has a
more
generic sub-use-case, when you have explicitly named
documents
(like
"users" above), and for some you expect multiple input
files.
freemarker-cli
-t access-report.ftl
--document-name=main somewhere/foo-access-log.csv
somewhere/bar-access-log.csv
--document-name=users somewhere/foo-users.csv
somewhere/bar-users.csv
--document-name=groups somewhere/global-groups.csv
The template must be written with this use case in mind, as now it has to
#list some of the documents. (I think in practice you hardly
ever
want
to
get a document by hard coded index. Either you don't know
how
many
documents you have, so you can't use hard coded indexes, or
you
do,
and
each index has a specific meaning, but then you should name
the
documents
instead, as using indexes is error prone, and hard to read.)
Accessing that list of documents in the template, maybe
could be
done
like
this:
- For the "main" documents: `DocumentList`
- For explicitly named documents, like "users":
`NamedDocumentLists.users`
SUMMING UP
To unify all 3 use cases into a coherent concept:
- `NamedDocumentLists.<name>` is the most generic form, and
while
you
can
achieve everything with it, using it requires your template
to
handle
the
most generic case too. So, I think it would be rarely used.
- `DocumentList` is just a shorthand for
`NamedDocumentLists.main`.
It's
used if you only have one kind of documents (single format
and
schema),
but
potentially multiple of them.
- `NamedDocuments.<name>` expresses that you expect exactly
1
document
of
the given name.
- `Document` is just a shorthand for `NamedDocuments.main`.
This
is
for
the
most natural/frequent use case.
That's 4 possible ways of accessing your documents, which is
a
trade-off
for the sake of these:
- Catching CLI (or Maven, etc.) input where the template
output
likely
will
be wrong. That's only possible if the user can communicate
its
intent
in
the template.
- Users don't need to deal with concepts that are irrelevant
in
their
concrete use case. Just start with the trivial, `Document`,
and
later
if
the need arises, generalize to named documents, document
lists,
or
both.
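A minimal Python sketch of how the four accessors could relate to each other. `DocumentModel` and its method names are hypothetical and only mirror `Document`, `DocumentList`, `NamedDocuments.<name>`, and `NamedDocumentLists.<name>` from above; it assumes that asking for a single document when there isn't exactly one is an error:

```python
class DocumentModel:
    """Hypothetical holder exposing the four proposed accessors."""

    def __init__(self, named_lists):
        # Maps a symbolic name to the documents passed under that name.
        self._named_lists = named_lists

    def named_document_list(self, name):
        # Most generic form: any number of documents, possibly zero.
        return list(self._named_lists.get(name, []))

    def named_document(self, name):
        # States the template's intent: exactly one document of that name.
        matches = self.named_document_list(name)
        if len(matches) != 1:
            raise ValueError(
                f"Template expects exactly 1 document named {name!r}, "
                f"got {len(matches)}"
            )
        return matches[0]

    @property
    def document_list(self):
        # Shorthand for the "main" list.
        return self.named_document_list("main")

    @property
    def document(self):
        # Shorthand for the single "main" document.
        return self.named_document("main")


model = DocumentModel({
    "main": ["foo-access-log.csv", "bar-access-log.csv"],
    "groups": ["global-groups.csv"],
})
main_documents = model.document_list
groups = model.named_document("groups")
```

With two "main" documents passed in, `model.document` raises instead of silently picking one of them, which is the failure mode the proposal is meant to catch.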
What do you guys think?
--
Best regards,
Daniel Dekany