Re: freemarker-generator: Improving the input documents concept

Daniel Dekany Thu, 05 Mar 2020 13:37:21 -0800

>
> Regarding the "global mode" and "output generators files" - I'm sorry, but
> I'm not getting it



I'm not sure which part doesn't get through. Can you explain? The CLI suggested
that you got "global mode" (a single --mode switch per run).

> Do you think of defining explicit "output generator file" containing
> `datasources`, `templates` and `outputs` - yes that could be done but
> does not feel like an interactive command line tool any longer


I think what the CLI exposes, and how, should be a secondary detail at this
phase, as the CLI is (or should be) just a front end that wraps the common
core (generator.base). The CLI, the Maven task, Gradle task, etc. should
probably just be thin wrappers around the common core. Do we agree on that?
So, these concepts are "core" concepts, and probably govern the API of
generator.base. That was my intent here, to hammer out these core
concepts.

Also, the "output generator file" is usually just a data file, or just a
template. It's just the file that causes some output to be generated. So, usually
it doesn't *explicitly* contain all that information (though you might as
well introduce a file type that does). But it still defines an output
generator, because you will have a template, a data-model, and an output
file name.
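
To make the idea a bit more tangible, here is a rough sketch of what I mean by an output generator. Everything in it is an assumption made for illustration: the class names `OutputGenerator` and `OutputGenerators`, the method `forEachPair`, and the rule for deriving output file names are hypothetical and do not exist in generator.base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these types are NOT part of generator.base.
// An output generator bundles a template, a data-model and an output file name.
final class OutputGenerator {
    private final String template;
    private final Map<String, Object> dataModel;
    private final String output;

    OutputGenerator(String template, Map<String, Object> dataModel, String output) {
        this.template = template;
        this.dataModel = dataModel;
        this.output = output;
    }

    String template() { return template; }
    Map<String, Object> dataModel() { return dataModel; }
    String output() { return output; }
}

final class OutputGenerators {
    private OutputGenerators() {}

    // Defines one output generator per (template, data file) pair.
    static List<OutputGenerator> forEachPair(List<String> templates, List<String> dataFiles) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            for (String dataFile : dataFiles) {
                Map<String, Object> dataModel = new HashMap<>();
                dataModel.put("dataFile", dataFile);
                // Assumed naming rule: <data file base name>.<template base name>.out
                String output = baseName(dataFile) + "." + baseName(template) + ".out";
                result.add(new OutputGenerator(template, dataModel, output));
            }
        }
        return result;
    }

    // Strips the directory and the extension, e.g. "data/foo1.csv" becomes "foo1".
    static String baseName(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        List<OutputGenerator> generators = forEachPair(
                Arrays.asList("a.ftl", "b.ftl"),
                Arrays.asList("foo1.csv", "foo2.csv", "foo3.csv"));
        for (OutputGenerator generator : generators) {
            System.out.println(generator.template() + " -> " + generator.output());
        }
    }
}
```

With 2 templates and 3 data files this defines 6 output generators, which is the same count I used earlier in this thread. A front end (CLI, Maven, Gradle) would only differ in how it collects the templates and data files before handing them over.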

> I think you are leaning towards a 1.0 release while I favour 0.x.y to
> have room to make mistakes / experiments


The version number doesn't tell me much, so what's your intent/strategy
with these 0.x.y releases you plan to do? Like, if you release 0.1.0, then
will you feel uncomfortable changing things *radically* after that? That
can be a problem if the goal is iterating without bounds. On the other
hand, if you don't feel uncomfortable about that at all, I don't really see
why a user would use it. But, if it's clearly indicated that everything can
change, and you think it's useful to release that way, I don't want to be
in your way.

> perfect is the enemy of good


I just think the overall concept/architecture should be iterated out first.
Polish, adding all kinds of bells and whistles, even fixing bugs, is a different matter.

On Thu, Mar 5, 2020 at 9:36 PM Siegfried Goeschl <
[email protected]> wrote:

> Hi Daniel,
>
> The introduction of named `Datasource` allows to simplify / streamline a
> few things
>
> * I have a meaningful user-supplied name
> * I can pass additional configuration information as already implemented
> with `charset` and `contenttype` and this would also allow configure a
> `CSV Datasource`, e.g.
> `users=./data/users.csv#format=default&header=true&delimeter=TAB` which
> can be readily parsed
> * Currently the names of datasources are taken from their relative
> file name - might make sense to drop that but I need to contemplate :-)
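
A named data source spec such as the one quoted above could be split roughly as follows. This is only a sketch under stated assumptions: the class `NamedDataSourceSpec` is hypothetical, and the parsing rules (name before the first `=`, options after `#` as `&`-separated `key=value` pairs) are guessed from the example, not taken from the actual freemarker-generator code. Option keys are treated as opaque strings, and the `name:group` variant is not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits "name=location#key=value&key=value".
final class NamedDataSourceSpec {
    final String name;                  // empty if the spec has no name
    final String location;              // file path or URI without the fragment
    final Map<String, String> options;  // fragment options in declaration order

    private NamedDataSourceSpec(String name, String location, Map<String, String> options) {
        this.name = name;
        this.location = location;
        this.options = options;
    }

    static NamedDataSourceSpec parse(String spec) {
        int eq = spec.indexOf('=');
        int hash = spec.indexOf('#');
        // An '=' that appears after '#' belongs to the options, not to a name.
        boolean named = eq > 0 && (hash < 0 || eq < hash);
        String name = named ? spec.substring(0, eq) : "";
        String rest = named ? spec.substring(eq + 1) : spec;

        int restHash = rest.indexOf('#');
        String location = restHash < 0 ? rest : rest.substring(0, restHash);

        Map<String, String> options = new LinkedHashMap<>();
        if (restHash >= 0) {
            for (String pair : rest.substring(restHash + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int pairEq = pair.indexOf('=');
                if (pairEq < 0) {
                    options.put(pair, "");  // flag without a value
                } else {
                    options.put(pair.substring(0, pairEq), pair.substring(pairEq + 1));
                }
            }
        }
        return new NamedDataSourceSpec(name, location, options);
    }

    public static void main(String[] args) {
        NamedDataSourceSpec spec = parse(
                "users=./data/users.csv#format=default&header=true&delimeter=TAB");
        System.out.println(spec.name + " @ " + spec.location + " " + spec.options);
    }
}
```

Keeping the options as a plain string map leaves their interpretation to whatever loads the data (for example a CSV tool), so the spec parser itself stays format-agnostic.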
>
> Regarding the "global mode" and "output generators files" - I'm sorry,
> but I'm not getting it
>
> * I refined the
> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 to
> make my points clearer
> * Do you think of defining explicit "output generator file" containing
> `datasources`, `templates` and `outputs` - yes that could be done but
> does not feel like an interactive command line tool any longer
>
>
> Regarding "more idiomatic FTL usage"
>
> * Yes, I need to dive into custom template models or whatever it is
> called :-)
>
>
> Something we need to iron out is a release policy
>
> * Currently we have little agreement on how the CLI should look or
> behave
> * I think you are leaning towards a 1.0 release while I favour 0.x.y to
> have room to make mistakes / experiments
> * I personally see the possibility that we don't get a release out -
> "perfect is the enemy of good"
>
> How would you like to handle the problem - can we agree on a minimal
> feature set worthy of a release?
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> On 1 Mar 2020, at 11:33, Daniel Dekany wrote:
>
> >>
> >> Actually not recommended but we have named data sources for less than
> >> 24
> >> hours
> >
> >
> > Sorry, not sure what that means. Anyway, my "vote" is let's not give
> > automatic names if that's not recommended to utilize. I mean, in case
> > we
> > happen to agree on that, why leave it there. Especially if
> > automatically
> > chosen names can clash with explicitly given ones, that would be a
> > trouble.  (I'm not sure right now if they can... the path we use as
> > the
> > name can be relative? Then it realistically can.)
> >
> > This is a command line tool where we have little idea what the user
> > will do
> >> or abuse
> >
> >
> > No matter how much/little we know, we firmly put our bets by releasing
> > something. So if some feature is certainly not right, that's enough to
> > not
> > have it, I think.
> >
> > How does a "data loader" know that it is responsible to load a file
> >
> > What should a "CSV data loader" do - parse it into a list of
> >> records or stream one by one?
> >
> >
> > I think I was misunderstood here. It's not about some kind of
> > auto-magic.
> > It's about where do you specify what to load and how, and in what
> > format do
> > you specify that. Of course, you must specify the data source
> > (basically an
> > URI for now as I saw), the rough format (CSV), and the format options
> > (separator character, etc.), and other freemarker-generator loading
> > options
> > (like which CSV columns are numbers, which are dates, with what
> > format,
> > what counts as null, etc.).
> >
> > What was confusing in what I said much earlier is probably that you
> > don't
> > need a global "--mode". That just means that you can have multiple
> > "modes"
> > in the same run, not that you need some big auto-magic. And that they
> > aren't really "modes" then... I think it's just natural that you can
> > have
> > different kind of "output generator" files in the same run. Why force
> > the
> > assumption that you don't, especially considering that they might
> > want
> > to access common data (which you don't want to load again and again,
> > for
> > each run of the different --mode-s you need). Of course, as you might
> > select files with wildcards (or by specifying a whole directory, or
> > with
> > some Maven matcher), you just can't directly associate the data loader
> > options to the individual data sources. Instead you can say elsewhere
> > that
> > *.csv inside this explicit "group", or with this file name pattern, is
> > to
> > be loaded like this. That's what you might have perceived as auto-magic.
> > It's
> > just mass-producing data loaders for "cattle" files.
> >
> > How to handle the case if you have multiple potential data loaders for
> > a
> >> single file?
> >
> >
> > As per above, that's just two data loaders referring to the same data
> > source, so, nothing special.
> >
> > As of the current state of things, this is how I'm supposed to load a
> > CSV,
> > in the template itself (if I'm not outdated/mistaken):
> >
> > <#assign cvsFormat = CSVTool.formats.DEFAULT.withHeader()>
> > <#assign foos = CSVTool.parse(Datasources.get("foos"),
> > cvsFormat).records>
> > <#assign bars = CSVTool.parse(Datasources.get("barb"),
> > cvsFormat).records>
> >
> > It will be worth exploring how to make these look more "idiomatic" FTL
> > (given
> > this is an "official" FM product now, I think, we should show how it's
> > done), and nicer in general. Point for now is, that's basically two
> > data-loaders interwoven with the template there. Because they are
> > interwoven like that, you can't reuse what they loaded for another
> > template
> > execution.
> >
> > That comes down to personal preferences, e.g. chown uses
> > "owner[:group] "
> >
> >
> > Yeah, but XML namespaces, Java, C, etc. all use
> > <parent><operator><child>,
> > so, I think, that clicks for more of our potential users. So let's bet
> > on
> > what clicks for more users.
> >
> > Besides, I challenged the very idea that we need both groups and
> > names. :)
> > Saying that it's simpler and less opinionated (more flexible) to have
> > just
> > multiple names (like tags). What's the end of that?
> >
> > On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
> > [email protected]> wrote:
> >
> >> HI Daniel,
> >>
> >> Please see my comments below
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 21:02, Daniel Dekany <[email protected]>
> >>> wrote:
> >>>
> >>>>
> >>>> I try to provide a useful name even when the content is coming from
> >>>> an
> >>>> URL
> >>>
> >>>
> >>> When is it recommended to rely on that though? Because utilizing
> >>> that
> >> means
> >>> that renaming a data source file can break the process, even if you
> >>> call
> >>> freemarker-cli with the up to date file name. And if that happens
> >>> depends
> >>> on what you (or an other random colleague!) have dug inside the
> >> templates.
> >>> So I guess we better just don't support this. Less code and less
> >>> things
> >> to
> >>> document too.
> >>>
> >>
> >> Actually not recommended but we have named data sources for less than
> >> 24
> >> hours
> >>
> >>>
> >>>> I think we have a different understanding what a "Document" /
> >> "Datasource
> >>>> / DataSource" should do
> >>>
> >>>
> >>> Thing is, eventually (most certainly pre-1.0, as it influences
> >>> architecture), certain needs will have to addressed, somehow. Then
> >>> we
> >> will
> >>> see what "things" we really need. For now I though we need "things"
> >>> that
> >>> are much more than paths, and encapsulate the "how to load the data"
> >>> aspect. I called them data sources, but maybe we should called them
> >>> "data
> >>> loaders" to free up data sources for the more primitive thing. Some
> >>> needs/doubts to address, *later*: Is it really the best approach for
> >> users
> >>> to load/parse data sources programmatically (that coded is written
> >>> in
> >> FTL,
> >>> inside the templates)? Also, is the template the right place for
> >>> doing
> >>> that, because, when multiple templates (or just multiple template
> >>> *runs*
> >> of
> >>> the same template, each generating a different output file) needs
> >>> common
> >>> data, they shouldn't load it again and again. Also, different topic,
> >>> can
> >> we
> >>> handle the case "transparently" enough when the data is not coming
> >>> from a
> >>> file?
> >>
> >> This is a command line tool where we have little idea what the user
> >> will
> >> do or abuse
> >>
> >> * How does a "data loader" know that it is responsible to load a
> >> file
> >> * What should a "CSV data loader" do - parse it into a list
> >> of
> >> records or stream one by one?
> >> * How to handle the case if you have multiple potential data loaders
> >> for a
> >> single file?
> >>
> >> I'm leaning towards building blocks where the user controls the work
> >> to be
> >> done even it requires one to two extra lines of FTL code
> >>
> >>
> >>>
> >>> The joy of programming - I did not intend to use "name:group"
> >>> together
> >> with
> >>>> wildcards :-)
> >>>
> >>>
> >>> For a CLI tool, I guess we agree that it should work. So maybe, like
> >>> this
> >>> (here logs and foos meant to be "groups"):
> >>> --data-source logs file1.log file2.log fileN.log   --data-source
> >>> foos
> >>> foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx
> >>>
> >>> It so happens that here you don't really have a good control about
> >>> the
> >>> number of files associated to the name, so, maybe yet another reason
> >>> to
> >> not
> >>> differentiate names and groups.
> >>>
> >>> I Disagree here - I think using a name would be used more often. I
> >>> added
> >>>> the "group" as an afterthought since some grouping could be useful
> >>>
> >>>
> >>> We do agree in that. What I said is that the *syntax* should be so
> >>> that
> >> the
> >>> group comes first. It's still optional. Like this:
> >>> --data-source group:name /somewhere
> >>> --data-source name /somewhere
> >>
> >> That comes down to personal preferences, e.g. chown uses
> >> "owner[:group] "
> >>
> >>>
> >>> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> >>> [email protected]> wrote:
> >>>
> >>>> HI Daniel,
> >>>>
> >>>> See my comments below
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Siegfried Goeschl
> >>>>
> >>>>
> >>>>> On 29.02.2020, at 19:08, Daniel Dekany <[email protected]>
> >> wrote:
> >>>>>
> >>>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied
> >>>>> names
> >> for
> >>>>> datasources
> >>>>>
> >>>>> So, I can do this to have both a name and a group associated to a
> >>>>> data
> >>>>> source:
> >>>>> --datasource someName:someGroup=somewhere/something
> >>>>
> >>>> Correct
> >>>>
> >>>>> Or if I only want a name, but not a group (or an ""  group
> >>>>> actually -
> >>>>> bug?), then:
> >>>>> --datasource someName=somewhere/something
> >>>>
> >>>> Correct
> >>>>
> >>>>>
> >>>>> Or if only a group but not a name (or a "" name actually) then:
> >>>>> --datasource :someGroup=somewhere/something
> >>>>
> >>>> Mhmm, that would be unintended functionality from my side - current
> >>>> approach is that every "Document" / "Datasource / DataSource" is
> >>>> named
> >>>>
> >>>>>
> >>>>> A name must identify exactly 1 data source, while a group
> >>>>> identifies a
> >>>> list
> >>>>> of data sources.
> >>>>
> >>>> No, every "Document" / "Datasource / DataSource" has a name
> >>>> currently
> >> but
> >>>> uniqueness is not enforced. Only if you want to get a "Document" /
> >>>> "Datasource / DataSource" with its exact name I checked for
> >>>> exactly one
> >>>> search hit and throw an exception. I try to provide a useful name
> >>>> even
> >> when
> >>>> the content is coming from an URL or STDIN (and I will probably add
> >>>> environment variables as "Document" / "Datasource / DataSource",
> >>>> e.g
> >>>> configuration in the cloud as JSON content passed as environment
> >> variable)
> >>>>
> >>>>>
> >>>>> Is this idea, that a data source can be part of a group, and then
> >>>>> is also possibly identifiable with a name, coming from a use case?
> >>>>> I
> >> mean,
> >>>>> it's possibly important somewhere, but if so, then it's strange
> >>>>> that
> >> you
> >>>>> can put something into only a single group. If we need this kind
> >>>>> of
> >>>> thing,
> >>>>> then perhaps you should be just allowed to associate the data
> >>>>> source
> >>>> with a
> >>>>> list of names (kind of like tagging), and then when the template
> >>>>> wants
> >> to
> >>>>> get something by name, it will tell there if it expects exactly
> >>>>> one or
> >> a
> >>>>> list of data sources. Then you don't need to introduce two terms
> >>>>> in the
> >>>>> documentation either (names and groups). Again, if we want this at
> >>>>> all,
> >>>>> instead of just going with a data source that itself gives a list.
> >>>>> (And
> >>>> if
> >>>>> not, how will we handle a data source that loads from a non-file
> >> source?)
> >>>>
> >>>> I actually thought of implementing tagging but considered a "group"
> >>>> sufficient.
> >>>>
> >>>> * If you don't define anything everything goes into the "default"
> >>>> group
> >>>> * For individual documents you can define a name and an optional
> >>>> group
> >>>>
> >>>> I think we have a different understanding what a "Document" /
> >> "Datasource
> >>>> / DataSource" should do
> >>>>
> >>>> * It is a dumb
> >>>> * It is lazy since data is only loaded on demand
> >>>> * There is no automagic like "oh, this is a JSON file, so let's go
> >>>> to
> >> the
> >>>> JSON tool and create a map readily accessible in the data model"
> >>>>
> >>>>>
> >>>>> Note that the current command line syntax doesn't work well with
> >>>>> shell
> >>>>> wildcard expansion. Like this:
> >>>>> --datasource :someGroup=logs/*.log
> >>>>> will try to expand ":someGroup=logs/*.log", and because it finds
> >> nothing
> >>>>> (and because the rules of sh and the like are a mess), you will get
> >>>>> the
> >>>>> parameter value as is, without * expanded.
> >>>>
> >>>> The joy of programming - I did not intend to use "name:group"
> >>>> together
> >>>> with wildcards :-)
> >>>>
> >>>>>
> >>>>> Also,  I think the syntax with colon should be flipped, because on
> >> other
> >>>>> places foo:bar usually means that foo is the bigger unit (the
> >> container),
> >>>>> and bar is the smaller unit (the child).
> >>>>
> >>>> I Disagree here - I think using a name would be used more often. I
> >>>> added
> >>>> the "group" as an afterthought since some grouping could be useful
> >>>>
> >>>>>
> >>>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> Hi Daniel,
> >>>>>>
> >>>>>> I'm an enterprise developer - bad habits die hard :-)
> >>>>>>
> >>>>>> So I closed the following tickets and merged the branches
> >>>>>>
> >>>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli"
> >>>>>> into
> >>>>>> "freemarker-generator"
> >>>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
> >>>> "Datasource"
> >>>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
> >> names
> >>>>>> for datasources
> >>>>>>
> >>>>>> Thanks in advance,
> >>>>>>
> >>>>>> Siegfried Goeschl
> >>>>>>
> >>>>>>
> >>>>>>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]>
> >>>> wrote:
> >>>>>>>
> >>>>>>> Yeah, and of course, you can merge that branch. You can even
> >>>>>>> work on
> >>>> the
> >>>>>>> master directly after all.
> >>>>>>>
> >>>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
> >>>> [email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> But, I do recognize the cattle use case (several "faceless"
> >>>>>>>> files
> >> with
> >>>>>>>> common format/schema). Only, my idea is to push that complexity
> >>>>>>>> on
> >> the
> >>>>>> data
> >>>>>>>> source. The "data source" concept shields the rest of the
> >> application
> >>>>>> from
> >>>>>>>> the details of how the data is stored or retrieved. So, a data
> >> source
> >>>>>> might
> >>>>>>>> load a bunch of log files from a directory, and present them
> >>>>>>>> as a
> >>>>>> single
> >>>>>>>> big table, or like a list of tables, etc. So I want to deal
> >>>>>>>> with the
> >>>>>> cattle
> >>>>>>>> use case, but the question is what part of the architecture
> >>>>>>>> will
> >>>> deal
> >>>>>>>> with this complication, with other words, how do you box
> >>>>>>>> things. Why
> >>>> my
> >>>>>>>> initial bet is to stuff that complication into the "data
> >>>>>>>> source"
> >>>>>>>> implementation(s) is that data sources are inherently varied.
> >>>>>>>> Some
> >>>>>> returns
> >>>>>>>> a table-like thing, some have multiple named tables (worksheets
> >>>>>>>> in
> >>>>>> Excel),
> >>>>>>>> some returns tree of nodes (XML), etc. So then, some might
> >>>>>>>> returns a
> >>>>>>>> list-of-list-of log records, or just a single list of
> >>>>>>>> log-records
> >> (put
> >>>>>>>> together from daily log files). That way cattles don't add to
> >>>> conceptual
> >>>>>>>> complexity. Now, you might be aware of cases where the cattle
> >> concept
> >>>>>> must
> >>>>>>>> be more exposed than this, and then we can't box things like
> >>>>>>>> this.
> >> But
> >>>>>> this
> >>>>>>>> is what I tried to express.
> >>>>>>>>
> >>>>>>>> Regarding "output generators", and how that applies on the
> >>>>>>>> command
> >>>>>> line. I
> >>>>>>>> think it's important that the common core between Maven and
> >>>>>> command-line is
> >>>>>>>> as fat as possible. Ideally, they are just two syntaxes to set up
> >>>>>>>> the
> >>>> same
> >>>>>>>> thing. Mostly at least. So, if you specify a template file to
> >>>>>>>> the
> >> CLI
> >>>>>>>> application, in a way so that it causes it to process that
> >>>>>>>> template
> >> to
> >>>>>>>> generate a single output, then there you have just defined an
> >> "output
> >>>>>>>> generator" (even if it wasn't explicitly called like that in
> >>>>>>>> the
> >>>> command
> >>>>>>>> line). If you specify 3 csv files to the CLI application, in a
> >>>>>>>> way
> >> so
> >>>>>> that
> >>>>>>>> it causes it to generate 3 output files, then you have just
> >>>>>>>> defined
> >> 3
> >>>>>>>> "output generators" there (there's at least one template
> >>>>>>>> specified
> >>>> there
> >>>>>>>> too, but that wasn't an "output generator" itself, it was just
> >>>>>>>> an
> >>>>>> attribute
> >>>>>>>> of the 3 output generators). If you specify 1 template, and 3
> >>>>>>>> csv
> >>>>>> files, in
> >>>>>>>> a way so that it will yield 4 output files (1 for the template,
> >>>>>>>> 3
> >> for
> >>>>>> the
> >>>>>>>> csv-s), then you have defined 4 output generators there. If you
> >> have a
> >>>>>> data
> >>>>>>>> source that loads a list of 3 entities (say, 3 csv files, so
> >>>>>>>> it's a
> >>>>>> list of
> >>>>>>>> tables then), and you have 2 templates, and you tell the CLI to
> >>>> execute
> >>>>>>>> each template for each item in said data source, then you have
> >>>>>>>> just
> >>>>>> defined
> >>>>>>>> 6 "output generators".
> >>>>>>>>
> >>>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Daniel,
> >>>>>>>>>
> >>>>>>>>> That all depends on your mental model and work you do,
> >> expectations,
> >>>>>>>>> experience :-)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> __Document Handling__
> >>>>>>>>>
> >>>>>>>>> *"But I think actually we have no good use case for list of
> >> documents
> >>>>>>>>> that's passed at once to a single template run, so, we can
> >>>>>>>>> just
> >>>> ignore
> >>>>>>>>> that complication"*
> >>>>>>>>>
> >>>>>>>>> In my case that's not a complication but my daily business -
> >>>>>>>>> I'm
> >>>>>>>>> regularly wading through access logs - yesterday probably a
> >>>>>>>>> couple
> >> of
> >>>>>>>>> hundreds access logs across two staging sites to help tracking
> >>>>>>>>> some
> >>>>>>>>> strange API gateway issues :-)
> >>>>>>>>>
> >>>>>>>>> My gut feeling is (borrowing from
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>
> >>
> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>>>>>> )
> >>>>>>>>>
> >>>>>>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>>>>>> 2. You have tons of anonymous documents / templates to process
> >>>>>>>>> -
> >>>>>>>>> `cattle`
> >>>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>>>>>
> >>>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover
> >>>>>>>>> 1)
> >>>> since
> >>>>>>>>> it is equally important and common.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> __Template And Document Processing Modes__
> >>>>>>>>>
> >>>>>>>>> IMHO it is important to answer the following question : "How
> >>>>>>>>> many
> >>>>>>>>> outputs do you get when rendering 2 templates and 3
> >>>>>>>>> datasources?
> >> Two,
> >>>>>>>>> Three or Six?"
> >>>>>>>>>
> >>>>>>>>> Your answer is influenced by your mental model / experience
> >>>>>>>>>
> >>>>>>>>> * When wading through tons of CSV files, access logs, etc. the
> >> answer
> >>>>>> is
> >>>>>>>>> "2"
> >>>>>>>>> * When doing source code generation the obvious answer is "6"
> >>>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty
> >>>>>>>>> sure
> >> we
> >>>>>>>>> will encounter one
> >>>>>>>>>
> >>>>>>>>> __Template and document mode probably shouldn't exist__
> >>>>>>>>>
> >>>>>>>>> That's hard for me to fully understand - I definitely lack
> >>>>>>>>> your
> >>>>>> insights
> >>>>>>>>> & experience writing such tools :-)
> >>>>>>>>>
> >>>>>>>>> Defining the `Output Generator` is the underlying model for
> >>>>>>>>> the
> >> Maven
> >>>>>>>>> plugin (and probably FMPP).
> >>>>>>>>>
> >>>>>>>>> I'm not sure if this applies for command lines at least not in
> >>>>>>>>> the
> >>>> way
> >>>>>> I
> >>>>>>>>> use them (or would like to use them)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Siegfried Goeschl
> >>>>>>>>>
> >>>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>>>>>
> >>>>>>>>>> Yeah, "data source" is surely a too popular name, but for
> >>>>>>>>>> reason.
> >>>>>>>>>> Anyone
> >>>>>>>>>> has other ideas?
> >>>>>>>>>>
> >>>>>>>>>> As of naming data sources and such. One thing I was wondering
> >> about
> >>>>>>>>>> back
> >>>>>>>>>> then is how to deal with list of documents given to a
> >>>>>>>>>> template,
> >>>> versus
> >>>>>>>>>> exactly 1 document given to a template. But I think actually
> >>>>>>>>>> we
> >> have
> >>>>>>>>>> no
> >>>>>>>>>> good use case for list of documents that's passed at once to
> >>>>>>>>>> a
> >>>> single
> >>>>>>>>>> template run, so, we can just ignore that complication. A
> >>>>>>>>>> document
> >>>> has
> >>>>>>>>>> a
> >>>>>>>>>> name, and that's always just a single document, not a
> >>>>>>>>>> collection,
> >> as
> >>>>>>>>>> far as
> >>>>>>>>>> the template is concerned. (We can have multiple documents
> >>>>>>>>>> per
> >> run,
> >>>>>>>>>> but
> >>>>>>>>>> those normally yield separate output generators, so it's
> >>>>>>>>>> still
> >> only
> >>>>>>>>>> one
> >>>>>>>>>> document per template.) However, we can have data source
> >>>>>>>>>> types
> >>>>>>>>>> (document
> >>>>>>>>>> types with old terminology) that collect together multiple
> >>>>>>>>>> data
> >>>> files.
> >>>>>>>>>> So
> >>>>>>>>>> then that complexity is encapsulated into the data source
> >>>>>>>>>> type,
> >> and
> >>>>>>>>>> doesn't
> >>>>>>>>>> complicate the overall architecture. That's another case when
> >>>>>>>>>> a
> >> data
> >>>>>>>>>> source
> >>>>>>>>>> is not just a file. Like maybe there's a data source type
> >>>>>>>>>> that
> >> loads
> >>>>>>>>>> all
> >>>>>>>>>> the CSV-s from a directory, into a single big table (I had
> >>>>>>>>>> such
> >>>> case),
> >>>>>>>>>> or
> >>>>>>>>>> even into a list of tables. Or, as I mentioned already, a
> >>>>>>>>>> data
> >>>> source
> >>>>>>>>>> is
> >>>>>>>>>> maybe an SQL query on a JDBC data source (and we got the
> >>>>>>>>>> first
> >> term
> >>>>>>>>>> clash... JDBC also call them data sources).
> >>>>>>>>>>
> >>>>>>>>>> Template and document mode probably shouldn't exist from user
> >>>>>>>>>> perspective
> >>>>>>>>>> either, at least not as a global option that must apply to
> >>>> everything
> >>>>>>>>>> in a
> >>>>>>>>>> run. They could just give the files that define the "output
> >>>>>>>>>> generators",
> >>>>>>>>>> and some of them will be templates, some of them are data
> >>>>>>>>>> files,
> >> in
> >>>>>>>>>> which
> >>>>>>>>>> case a template need to be associated with them (and there
> >>>>>>>>>> can be
> >> a
> >>>>>>>>>> couple
> >>>>>>>>>> of ways of doing that). And then again, there are the cases
> >>>>>>>>>> where
> >>>> you
> >>>>>>>>>> want
> >>>>>>>>>> to create one output generator per entity from some data
> >>>>>>>>>> source.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>
> >>>>>>>>>>> See my comments below - and thanks for your patience and
> >>>>>>>>>>> input
> >> :-)
> >>>>>>>>>>>
> >>>>>>>>>>> *Renaming Document To DataSource*
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, makes sense. I tried to avoid since I'm using
> >> javax.activation
> >>>>>>>>>>> and
> >>>>>>>>>>> its DataSource.
> >>>>>>>>>>>
> >>>>>>>>>>> *Template And Document Mode*
> >>>>>>>>>>>
> >>>>>>>>>>> Agreed - I think it is a valuable abstraction for the user
> >>>>>>>>>>> but it
> >>>> is
> >>>>>>>>>>> not
> >>>>>>>>>>> an implementation concept :-)
> >>>>>>>>>>>
> >>>>>>>>>>> *Document Without Symbolic Names*
> >>>>>>>>>>>
> >>>>>>>>>>> Also agreed and it is going to change but I have not settled
> >>>>>>>>>>> my
> >>>> mind
> >>>>>>>>>>> yet
> >>>>>>>>>>> what exactly to implement.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> A few quick thoughts on that:
> >>>>>>>>>>>
> >>>>>>>>>>> - We should replace the "document" term with something more
> >>>> speaking.
> >>>>>>>>>>> It
> >>>>>>>>>>> doesn't tell that it's some kind of input. Also, most of
> >>>>>>>>>>> these
> >>>> inputs
> >>>>>>>>>>> aren't something that people typically call documents. Like
> >>>>>>>>>>> a csv
> >>>>>>>>>>> file, or
> >>>>>>>>>>> a database table, which is not even a file (OK we don't
> >>>>>>>>>>> support
> >>>> such
> >>>>>>>>>>> thing
> >>>>>>>>>>> at the moment). I think, maybe "data source" is a safe
> >>>>>>>>>>> enough
> >> term.
> >>>>>>>>>>> (It
> >>>>>>>>>>> also rhymes with data model.)
> >>>>>>>>>>> - You have separate "template" and "document" "mode", that
> >> applies
> >>>> to
> >>>>>>>>>>> a
> >>>>>>>>>>> whole run. I think such specialization won't be helpful. We
> >>>>>>>>>>> could
> >>>>>>>>>>> just say,
> >>>>>>>>>>> on the conceptual level at least, that we need a set of
> >>>>>>>>>>> "outputs
> >>>>>>>>>>> generators". An output generator is an object (in the API)
> >>>>>>>>>>> that
> >>>>>>>>>>> specifies a
> >>>>>>>>>>> template, a data-model (where the data-model is possibly
> >> populated
> >>>>>>>>>>> with
> >>>>>>>>>>> "documents"), and an output "sink" (a file path, or stdout),
> >>>>>>>>>>> and
> >>>> can
> >>>>>>>>>>> generate the output itself. A practical way of defining the
> >> output
> >>>>>>>>>>> generators in a CLI application is via a bunch of files,
> >>>>>>>>>>> each
> >>>>>>>>>>> defining an
> >>>>>>>>>>> output generator. Some of those files is maybe a template
> >>>>>>>>>>> (that
> >> you
> >>>>>>>>>>> can
> >>>>>>>>>>> even detect from the file extension), or a data file that we
> >>>>>>>>>>> currently call
> >>>>>>>>>>> a "document". They could freely mix inside the same run. I
> >>>>>>>>>>> have
> >>>> also
> >>>>>>>>>>> met
> >>>>>>>>>>> use case when you have a single table (single "document"),
> >>>>>>>>>>> and
> >> each
> >>>>>>>>>>> record
> >>>>>>>>>>> in it yields an output file. That can also be described in
> >>>>>>>>>>> some
> >>>> file
> >>>>>>>>>>> format, or really in any other way, like directly in command
> >>>>>>>>>>> line
> >>>>>>>>>>> argument,
> >>>>>>>>>>> via API, etc.
> >>>>>>>>>>> - You have multiple documents without associated symbolical
> >>>>>>>>>>> name
> >> in
> >>>>>>>>>>> some
> >>>>>>>>>>> examples. Templates can't identify those then in a well
> >>>> maintainable
> >>>>>>>>>>> way.
> >>>>>>>>>>> The actual file name is often not a good identifier, can
> >>>>>>>>>>> change
> >>>> over
> >>>>>>>>>>> time,
> >>>>>>>>>>> and you might not even have good control over it, like you
> >>>> already
> >>>>>>>>>>> receive it as a parameter from somewhere else, or someone
> >>>>>>>>>>> moves/renames
> >>>>>>>>>>> those files that you need to read. Index is also not very
> >>>>>>>>>>> good,
> >> but
> >>>> I
> >>>>>>>>>>> have
> >>>>>>>>>>> written about that earlier.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> still wrapping my head around it but assembled some thoughts
> >>>>>>>>>>> here -
> >>>>>>>>>>>
> >> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> What you are describing is more like the angle that FMPP
> >>>>>>>>>>> took initially, where templates drive things, they generate
> >>>>>>>>>>> the output for themselves (even multiple output files if
> >>>>>>>>>>> they wish). By default the output file's name (and relative
> >>>>>>>>>>> path) is deduced from the template name. There was also a
> >>>>>>>>>>> global data-model, built in a configuration file (or
> >>>>>>>>>>> equally, built via command line arguments, or both mixed),
> >>>>>>>>>>> from which templates get whatever data they are interested
> >>>>>>>>>>> in. Take a look at the figures here:
> >>>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept
> >>>>>>>>>>> was generalized a bit more, because you could add XML files
> >>>>>>>>>>> at the same place where you have the templates, and then you
> >>>>>>>>>>> could associate transform templates to the XML files (based
> >>>>>>>>>>> on path pattern and/or the XML document element). Now that's
> >>>>>>>>>>> like what freemarker-generator had initially (data files
> >>>>>>>>>>> drive output, and the template is there to transform it).
> >>>>>>>>>>>
> >>>>>>>>>>> So I think the generic mental model would look like this:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. You have files that drive the process, let's call them
> >>>>>>>>>>> *generator files* for now. Usually, each generator file
> >>>>>>>>>>> yields an output file (but maybe even multiple output files,
> >>>>>>>>>>> as you might have seen in the last figure). These generator
> >>>>>>>>>>> files can be of many types, like XML, JSON, XLSX (as in the
> >>>>>>>>>>> original freemarker-generator), and even templates (as is
> >>>>>>>>>>> the norm in FMPP). If the file is not a template, then you
> >>>>>>>>>>> have a set of transformer templates (-t CLI option) in a
> >>>>>>>>>>> separate directory, which can be associated with the
> >>>>>>>>>>> generator files based on name patterns, and even based on
> >>>>>>>>>>> content (schema usually). If the generator file is a
> >>>>>>>>>>> template (so that's a positional @Parameter CLI argument
> >>>>>>>>>>> that happens to be an *.ftl, and is not a template file
> >>>>>>>>>>> specified after the "-t" option), then you just
> >>>>>>>>>>> Template.process(...) it, and it prints what the output will
> >>>>>>>>>>> be.
> >>>>>>>>>>> 2. You also have a set of variables, the global data-model,
> >>>>>>>>>>> that contains commonly useful stuff, like what you now call
> >>>>>>>>>>> parameters (CLI -Pname=value), but also maybe data loaded
> >>>>>>>>>>> from JSON, XML, etc. Those data files aren't "generator
> >>>>>>>>>>> files". Templates just use them if they need them.
> >>>>>>>>>>>
> >>>>>>>>>>> An important thing here is to reuse the same mechanism to
> >>>>>>>>>>> read and parse those data files, which was used in templates
> >>>>>>>>>>> when transforming generator files. So we need a common
> >>>>>>>>>>> format for specifying how to load data files. That's maybe
> >>>>>>>>>>> just FTL that #assigns to the variables, or maybe a more
> >>>>>>>>>>> declarative format.
> >>>>>>>>>>>
> >>>>>>>>>>> What I have described in the original post here was a less
> >>>>>>>>>>> generic form of this, as I tried to be true to the original
> >>>>>>>>>>> approach. I thought the proposal would be drastic enough as
> >>>>>>>>>>> it is... :) There, the "main" document is the "generator
> >>>>>>>>>>> file" from point 1, the "-t" template is the transform
> >>>>>>>>>>> template for the "main" document, and the other named
> >>>>>>>>>>> documents ("users", "groups") are a poor man's shared
> >>>>>>>>>>> data-model from point 2 (together with -PName=value).
> >>>>>>>>>>>
> >>>>>>>>>>> There's a further, somewhat confusing thing to get right
> >>>>>>>>>>> with the list-of-documents (`DocumentList`,
> >>>>>>>>>>> `NamedDocumentLists`) thing though. In the model above, as
> >>>>>>>>>>> per point 1, if you list multiple data files, each will
> >>>>>>>>>>> generate a separate output file. So, if you need to take in
> >>>>>>>>>>> a list of files to transform it to a single output file (or
> >>>>>>>>>>> at least with a single transform template execution), then
> >>>>>>>>>>> you have to be explicit about that, as that's not the
> >>>>>>>>>>> default behavior anymore. But it's still absolutely
> >>>>>>>>>>> possible. Imagine it as if a "list of XLSX-es" is itself
> >>>>>>>>>>> like a file format. You need some CLI (and Maven config,
> >>>>>>>>>>> etc.) syntax to express that, but that shouldn't be a big
> >>>>>>>>>>> deal.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>
> >>>>>>>>>>> Good timing - I was looking at a similar problem from a
> >>>>>>>>>>> different angle yesterday (see below)
> >>>>>>>>>>>
> >>>>>>>>>>> Don't have enough time to answer your email in detail now -
> >>>>>>>>>>> will do that tomorrow evening
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> === START
> >>>>>>>>>>> # FreeMarker CLI Improvement
> >>>>>>>>>>> ## Support Of Multiple Template Files
> >>>>>>>>>>> Currently we support the following combinations
> >>>>>>>>>>>
> >>>>>>>>>>> * Single template and no data files
> >>>>>>>>>>> * Single template and one or more data files
> >>>>>>>>>>>
> >>>>>>>>>>> But we cannot support the following use case, which is
> >>>>>>>>>>> quite typical in the cloud
> >>>>>>>>>>>
> >>>>>>>>>>> __Convert multiple templates with a single data file, e.g.
> >>>>>>>>>>> copying a directory of configuration files using a JSON
> >>>>>>>>>>> configuration file__
> >>>>>>>>>>>
> >>>>>>>>>>> ## Implementation notes
> >>>>>>>>>>> * When we copy a directory we can remove the `ftl` extension
> >>>>>>>>>>>   on the fly
> >>>>>>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>>>>>> * Initially resolve to a list of template files and process
> >>>>>>>>>>>   one after another
> >>>>>>>>>>> * Need to calculate the output file location and extension
> >>>>>>>>>>> * We need to rename the existing command line parameters
> >>>>>>>>>>>   (see below)
> >>>>>>>>>>> * Do we need multiple include and exclude filters?
> >>>>>>>>>>> * Do we need file versus directory filters?
> >>>>>>>>>>>
> >>>>>>>>>>> ### Command Line Options
> >>>>>>>>>>> ```
> >>>>>>>>>>> --input-encoding : Encoding of the documents
> >>>>>>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>>>>>> --template-encoding : Encoding of the template
> >>>>>>>>>>> --output : Output file or directory
> >>>>>>>>>>> --include-document : Include pattern for documents
> >>>>>>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>>>>>> --include-template : Include pattern for templates
> >>>>>>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> ### Command Line Examples
> >>>>>>>>>>> ```text
> >>>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
> >>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>>>>>>>
> >>>>>>>>>>> # Basically the same using a named document "configuration"
> >>>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>>>>>>>> # It might make sense to allow URIs for loading documents
> >>>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>>>>>>>
> >>>>>>>>>>> # Basically the same using an environment variable as named document
> >>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>>>>>>>> ```
> >>>>>>>>>>> === END
> >>>>>>>>>>>
> >>>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]>
> >> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Input documents is a fundamental concept in
> >>>>>>>>>>> freemarker-generator, so we should think about that more,
> >>>>>>>>>>> and probably refine/rework how it's done.
> >>>>>>>>>>>
> >>>>>>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then in access-report.ftl you have to do something like
> >>>>>>>>>>> this:
> >>>>>>>>>>>
> >>>>>>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>>>>>> ... process doc here
> >>>>>>>>>>>
> >>>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that
> >>>>>>>>>>> led to a funny chain of coincidences: it returned the string
> >>>>>>>>>>> "D", then CSVTool.parse(...) happily parsed that to a table
> >>>>>>>>>>> with the single column "D" and 0 rows, and as there were 0
> >>>>>>>>>>> rows, the template didn't run into an error about
> >>>>>>>>>>> row.myExpectedColumn referring to a missing column either,
> >>>>>>>>>>> so the process finished with success. (: Pretty unlucky for
> >>>>>>>>>>> sure. The root cause was unintentionally breaking a
> >>>>>>>>>>> FreeMarker idiom though; eventually we will have to work on
> >>>>>>>>>>> those too, but that's a different topic.)
> >>>>>>>>>>>
> >>>>>>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> The above template will still work, though it then ignores
> >>>>>>>>>>> all but the first document. So if you expect any number of
> >>>>>>>>>>> input documents, you will probably have to do this:
> >>>>>>>>>>>
> >>>>>>>>>>> <#list Documents.list as doc>
> >>>>>>>>>>> ... process doc here
> >>>>>>>>>>> </#list>
> >>>>>>>>>>>
> >>>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but
> >>>>>>>>>>> again, those we will work out in a different thread.)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> So, what would be better, in my opinion. I start out from
> >>>>>>>>>>> what I think are the common use cases, in decreasing order
> >>>>>>>>>>> of frequency. The goal is to make those less error prone for
> >>>>>>>>>>> the users, and simpler to express.
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 1
> >>>>>>>>>>>
> >>>>>>>>>>> You have exactly 1 input document, which is therefore simply
> >>>>>>>>>>> "the" document in the mind of the user. This is probably the
> >>>>>>>>>>> typical use case, or at least the use case users typically
> >>>>>>>>>>> start out from when starting the work.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most
> >>>>>>>>>>> importantly it's error prone, because if the user passed in
> >>>>>>>>>>> more than 1 document (which can even happen totally
> >>>>>>>>>>> accidentally, like if the user was lazy and used a wildcard
> >>>>>>>>>>> that the shell expanded), the template will silently ignore
> >>>>>>>>>>> the rest of the documents, and the single document processed
> >>>>>>>>>>> will be practically picked randomly. The user might not
> >>>>>>>>>>> notice that and submit a bad report or such.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that in this use case the document should be simply
> >>>>>>>>>>> referred to as `Document` in the template. When you have
> >>>>>>>>>>> multiple documents there, referring to `Document` should be
> >>>>>>>>>>> an error, saying that the template was made to process a
> >>>>>>>>>>> single document only.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 2
> >>>>>>>>>>>
> >>>>>>>>>>> You have multiple input documents, but each has a different
> >>>>>>>>>>> role (different schema, maybe different file type). Like,
> >>>>>>>>>>> you pass in users.csv and groups.csv. Each has a different
> >>>>>>>>>>> schema, and so you want to access them differently, but in
> >>>>>>>>>>> the same template.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> [...]
> >>>>>>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then in the template you could refer to them as
> >>>>>>>>>>> `NamedDocuments.users` and `NamedDocuments.groups`.
> >>>>>>>>>>>
> >>>>>>>>>>> Use Cases 1 and 2 can be unified into a coherent concept,
> >>>>>>>>>>> where `Document` is just a shorthand for
> >>>>>>>>>>> `NamedDocuments.main`. It's called "main" because that's
> >>>>>>>>>>> "the" document the template is about, but then you have to
> >>>>>>>>>>> add some helper documents, with symbolic names representing
> >>>>>>>>>>> their role.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Here, `Document` still works in the template, and it refers
> >>>>>>>>>>> to `somewhere/foo-access-log.csv`. (While omitting
> >>>>>>>>>>> --document-name=main above would be cleaner, I couldn't
> >>>>>>>>>>> figure out how to do that with Picocli. Anyway, for now the
> >>>>>>>>>>> point is the concept, which is not specific to CLI.)
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 3
> >>>>>>>>>>>
> >>>>>>>>>>> Here you have several of the same kind of documents. That
> >>>>>>>>>>> has a more generic sub-use-case, when you have explicitly
> >>>>>>>>>>> named documents (like "users" above), and for some you
> >>>>>>>>>>> expect multiple input files.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>>>> somewhere/bar-users.csv
> >>>>>>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> The template must be written with this use case in mind, as
> >>>>>>>>>>> now it has to #list some of the documents. (I think in
> >>>>>>>>>>> practice you hardly ever want to get a document by hard
> >>>>>>>>>>> coded index. Either you don't know how many documents you
> >>>>>>>>>>> have, so you can't use hard coded indexes, or you do, and
> >>>>>>>>>>> each index has a specific meaning, but then you should name
> >>>>>>>>>>> the documents instead, as using indexes is error prone, and
> >>>>>>>>>>> hard to read.) Accessing that list of documents in the
> >>>>>>>>>>> template could maybe be done like this:
> >>>>>>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>>>>>> - For explicitly named documents, like "users":
> >>>>>>>>>>> `NamedDocumentLists.users`
> >>>>>>>>>>>
> >>>>>>>>>>> SUMMING UP
> >>>>>>>>>>>
> >>>>>>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and
> >>>>>>>>>>> while you can achieve everything with it, using it requires
> >>>>>>>>>>> your template to handle the most generic case too. So, I
> >>>>>>>>>>> think it would be rarely used.
> >>>>>>>>>>> - `DocumentList` is just a shorthand for
> >>>>>>>>>>> `NamedDocumentLists.main`. It's used if you only have one
> >>>>>>>>>>> kind of document (single format and schema), but potentially
> >>>>>>>>>>> multiple of them.
> >>>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly
> >>>>>>>>>>> 1 document of the given name.
> >>>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`.
> >>>>>>>>>>> This is for the most natural/frequent use case.
> >>>>>>>>>>>
> >>>>>>>>>>> That's 4 possible ways of accessing your documents, which is
> >>>>>>>>>>> a trade-off for the sake of these:
> >>>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template
> >>>>>>>>>>> output likely will be wrong. That's only possible if the
> >>>>>>>>>>> user can communicate their intent in the template.
> >>>>>>>>>>> - Users don't need to deal with concepts that are irrelevant
> >>>>>>>>>>> in their concrete use case. Just start with the trivial,
> >>>>>>>>>>> `Document`, and later if the need arises, generalize to
> >>>>>>>>>>> named documents, document lists, or both.
> >>>>>>>>>>>
> >>>>>>>>>>> What do you guys think?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>> Daniel Dekany
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Daniel Dekany
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Daniel Dekany
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best regards,
> >>> Daniel Dekany
> >>
> >>
> >>
> >
> > --
> > Best regards,
> > Daniel Dekany
>


-- 
Best regards,
Daniel Dekany
