Re: freemarker-generator: Improving the input documents concept

Siegfried Goeschl Thu, 05 Mar 2020 14:14:02 -0800

Hi Daniel,

Please see my comments below


Thanks in advance, 

Siegfried Goeschl


> On 05.03.2020, at 22:36, Daniel Dekany <[email protected]> wrote:
> 
>> 
>> Regarding the "global mode" and "output generator files" - I'm sorry, but
>> I'm not getting it
> 
> 
> I don't see what isn't getting through. Can you explain? The CLI you
> suggested showed that you got the "global mode" (a single --mode switch per
> run).

[SG] I think the confusion stems from different levels of abstraction (see
below) - while I try to get the command line invocation right, you seem to
think at a more technical implementation level.

Please have a look at
https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 - I think
such a "mode" might be needed but is not strictly relevant in the beginning.
But it is an important concept for the implementation ...

> 
>> Do you think of defining an explicit "output generator file" containing
>> `datasources`, `templates` and `outputs` - yes, that could be done but it
>> does not feel like an interactive command line tool any longer
> 
> 
> I think what the CLI exposes, and how, should be a secondary detail at this
> phase, as the CLI is (or should be) just a front end that wraps the common
> core (generator.base). The CLI, the Maven task, the Gradle task, etc. should
> probably just be thin wrappers around the common core. Do we agree on that?
> So, these concepts are "core" concepts, and probably govern the API of
> generator.base. That was my intent here, to hammer out these core
> concepts.
> 
> Also, the "output generator file" is usually just a data file, or just a
> template. It's just the file that causes some output to be generated. So,
> usually it doesn't *explicitly* contain all that information (though you
> might as well introduce a file type that does). But it still defines an
> output generator, because you will have a template, a data-model, and an
> output file name.

[SG] If you are thinking about the internal representation, I fully agree - I
personally see something like a list of executed "Transformations", each of
which contains the template, datasources and output
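A minimal Java sketch of that internal representation, assuming a hypothetical `Transformation` record (template, datasource names, output name) and a stand-in `Renderer` interface in place of the actual FreeMarker rendering call. Neither type exists in freemarker-generator; the point is only that the CLI, Maven and Gradle front ends would all build the same list:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one unit of work = template + datasources + output.
record Transformation(String template, List<String> datasources, String output) { }

// Hypothetical stand-in for the real FreeMarker rendering call.
interface Renderer {
    String render(String template, List<String> datasources);
}

class TransformationRunner {
    // Executes each transformation in order and collects output name -> rendered text.
    static Map<String, String> runAll(List<Transformation> transformations, Renderer renderer) {
        Map<String, String> outputs = new LinkedHashMap<>();
        for (Transformation t : transformations) {
            outputs.put(t.output(), renderer.render(t.template(), t.datasources()));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Transformation> plan = new ArrayList<>();
        plan.add(new Transformation("report.ftl", List.of("users"), "users.html"));
        plan.add(new Transformation("report.ftl", List.of("orders"), "orders.html"));
        Renderer fake = (template, ds) -> template + " applied to " + ds;
        runAll(plan, fake).forEach((out, text) -> System.out.println(out + ": " + text));
    }
}
```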

> 
>> I think you are leaning towards a 1.0 release while I favour 0.x.y to
>> have room to make mistakes / experiments
> 
> 
> The version number doesn't tell me much, so what's your intent/strategy
> with these 0.x.y releases you plan to do? Like, if you release 0.1.0, will
> you be uncomfortable changing things *radically* after that? That can be a
> problem if the goal is iterating without bounds. On the other hand, if you
> aren't uncomfortable with that at all, I don't really see why a user would
> use it. But if it's clearly indicated that everything can change, and you
> think it's useful to release that way, I don't want to be in your way.

[SG] What constitutes backward compatibility for the CLI or the Maven plugin?
What can change?

* I don't want to change the command line parameters (CLI) and the generator
file layout (Maven) in a breaking way
* I want to avoid releasing things like "name:group" versus "group:name" before
we have settled on a decision
* What I still want to do in the near future is change the public
implementation classes, since I do not assume that anyone is using them for
the time being

> 
>> perfect is the enemy of good
> 
> 
> I just think the overall concept/architecture should be worked out
> iteratively first. Polishing, adding all kinds of bells and whistles, even
> fixing bugs, is a different matter.

[SG] +1

> 
> On Thu, Mar 5, 2020 at 9:36 PM Siegfried Goeschl <
> [email protected]> wrote:
> 
>> Hi Daniel,
>> 
>> The introduction of named `Datasource`s allows us to simplify / streamline
>> a few things
>> 
>> * I have a meaningful user-supplied name
>> * I can pass additional configuration information, as already implemented
>> with `charset` and `contenttype`, and this would also allow configuring a
>> `CSV Datasource`, e.g.
>> `users=./data/users.csv#format=default&header=true&delimeter=TAB`, which
>> can be readily parsed
>> * Currently the names of datasources are taken from their relative
>> file names - it might make sense to drop that but I need to contemplate :-)
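A minimal sketch of how such a spec could be parsed, assuming the hypothetical form `name=location#key=value&key=value` from the example above. `DatasourceSpec` is an illustrative name, not the actual freemarker-generator class; there is no URL-decoding, and a `#` inside the location itself is not handled. The option key `delimeter` is kept exactly as in the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parsed form of "name=location#key=value&key=value".
record DatasourceSpec(String name, String location, Map<String, String> options) {

    static DatasourceSpec parse(String spec) {
        int eq = spec.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected name=location: " + spec);
        }
        String name = spec.substring(0, eq);
        String rest = spec.substring(eq + 1);

        // Everything after the first '#' is treated as the option string.
        int hash = rest.indexOf('#');
        String location = hash < 0 ? rest : rest.substring(0, hash);
        Map<String, String> options = new LinkedHashMap<>();
        if (hash >= 0) {
            for (String pair : rest.substring(hash + 1).split("&")) {
                if (pair.isEmpty()) continue;
                int sep = pair.indexOf('=');
                // A bare key (no '=') is stored with an empty value.
                options.put(sep < 0 ? pair : pair.substring(0, sep),
                            sep < 0 ? "" : pair.substring(sep + 1));
            }
        }
        return new DatasourceSpec(name, location, options);
    }

    public static void main(String[] args) {
        DatasourceSpec spec = parse("users=./data/users.csv#format=default&header=true&delimeter=TAB");
        System.out.println(spec.name() + " -> " + spec.location() + " " + spec.options());
    }
}
```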
>> 
>> Regarding the "global mode" and "output generator files" - I'm sorry,
>> but I'm not getting it
>> 
>> * I refined the
>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 gist
>> to state my points more clearly
>> * Do you think of defining an explicit "output generator file" containing
>> `datasources`, `templates` and `outputs` - yes, that could be done but it
>> does not feel like an interactive command line tool any longer
>> 
>> 
>> Regarding "more idiomatic FTL usage"
>> 
>> * Yes, I need to dive into custom template models or whatever they are
>> called :-)
>> 
>> 
>> Something we need to iron out is a release policy
>> 
>> * Currently we have little agreement on what the CLI should look like or
>> how it should behave
>> * I think you are leaning towards a 1.0 release while I favour 0.x.y to
>> have room to make mistakes / experiments
>> * I personally see the possibility that we don't get a release out -
>> "perfect is the enemy of good"
>> 
>> How would you like to handle the problem - can we agree on a minimal
>> feature set worthy of a release?
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>> On 1 Mar 2020, at 11:33, Daniel Dekany wrote:
>> 
>>>> 
>>>> Actually not recommended, but we have had named data sources for less
>>>> than 24 hours
>>> 
>>> 
>>> Sorry, not sure what that means. Anyway, my "vote" is: let's not give
>>> automatic names if relying on them is not recommended. I mean, in case we
>>> happen to agree on that, why leave it there? Especially if automatically
>>> chosen names can clash with explicitly given ones, that would be trouble.
>>> (I'm not sure right now if they can... the path we use as the name can be
>>> relative? Then it realistically can.)
>>> 
>>>> This is a command line tool where we have little idea what the user
>>>> will do or abuse
>>> 
>>> 
>>> No matter how much or little we know, we firmly place our bets by
>>> releasing something. So if some feature is certainly not right, that's
>>> reason enough not to have it, I think.
>>> 
>>>> How does a "data loader" know that it is responsible for loading a file?
>>>> 
>>>> What should a "CSV data loader" do - parse it into a list of
>>>> records or stream them one by one?
>>> 
>>> 
>>> I think I was misunderstood here. It's not about some kind of auto-magic.
>>> It's about where you specify what to load and how, and in what format you
>>> specify that. Of course, you must specify the data source (basically a URI
>>> for now, as I saw), the rough format (CSV), the format options (separator
>>> character, etc.), and other freemarker-generator loading options (like
>>> which CSV columns are numbers, which are dates, with what format, what
>>> counts as null, etc.).
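A minimal sketch of such loading options held outside the template, assuming hypothetical names (`CsvLoaderOptions`, `ColumnType`) and a deliberately naive line split with no quoting support. It only illustrates "which columns are numbers, which are dates, with what format, what counts as null" as data rather than FTL code:

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical column types a data loader could be told about.
enum ColumnType { STRING, NUMBER, DATE }

// Hypothetical loader options: separator, per-column types, date format, null marker.
record CsvLoaderOptions(char separator, Map<String, ColumnType> columnTypes,
                        DateTimeFormatter dateFormat, String nullValue) {

    // Converts one line into typed values using the given header.
    // Naive split: quoted fields containing the separator are NOT supported.
    List<Object> convert(List<String> header, String line) {
        String[] cells = line.split(Pattern.quote(String.valueOf(separator)), -1);
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < header.size(); i++) {
            String raw = i < cells.length ? cells[i] : null;
            if (raw == null || raw.equals(nullValue)) {
                row.add(null); // missing cell or explicit null marker
                continue;
            }
            ColumnType type = columnTypes.getOrDefault(header.get(i), ColumnType.STRING);
            switch (type) {
                case NUMBER -> row.add(new BigDecimal(raw));
                case DATE -> row.add(LocalDate.parse(raw, dateFormat));
                default -> row.add(raw);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        CsvLoaderOptions options = new CsvLoaderOptions(';',
                Map.of("amount", ColumnType.NUMBER, "day", ColumnType.DATE),
                DateTimeFormatter.ISO_LOCAL_DATE, "NULL");
        System.out.println(options.convert(List.of("name", "amount", "day"), "foo;12.50;2020-03-01"));
    }
}
```

A real loader would more likely delegate the splitting to a CSV library and keep only the typing rules in such an options object.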
>>> 
>>> What was confusing in what I said much earlier is probably that you don't
>>> need a global "--mode". That just means that you can have multiple "modes"
>>> in the same run, not that you need some big auto-magic. And that they
>>> aren't really "modes" then... I think it's just natural that you can have
>>> different kinds of "output generator" files in the same run. Why force the
>>> assumption that you don't, especially considering that they might want to
>>> access common data (which you don't want to load again and again for each
>>> run of the different --mode-s you need)? Of course, as you might select
>>> files with wildcards (or by specifying a whole directory, or with some
>>> Maven matcher), you just can't directly associate the data loader options
>>> with the individual data sources. Instead you can say elsewhere that *.csv
>>> inside this explicit "group", or with this file name pattern, is to be
>>> loaded like this. That's what you might have perceived as auto-magic. It's
>>> just mass-producing data loaders for "cattle" files.
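A minimal sketch of "mass-producing" data loaders from file name patterns, assuming a hypothetical rule list where the first matching glob decides the loader options. It uses the JDK's `FileSystem.getPathMatcher` with the `glob:` syntax; `LoaderRule` and `LoaderRules` are illustrative names only:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical rule: files matching the glob get these loader options.
record LoaderRule(String glob, Map<String, String> loaderOptions) {
    boolean matches(Path file) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(file);
    }
}

class LoaderRules {
    // Returns the options of the first rule whose glob matches the file, if any.
    static Optional<Map<String, String>> optionsFor(Path file, List<LoaderRule> rules) {
        return rules.stream()
                .filter(rule -> rule.matches(file))
                .map(LoaderRule::loaderOptions)
                .findFirst();
    }

    public static void main(String[] args) {
        List<LoaderRule> rules = List.of(
                new LoaderRule("logs/*.log", Map.of("format", "access-log")),
                // "**" crosses directory boundaries, so this matches CSVs at any depth.
                new LoaderRule("**.csv", Map.of("format", "csv", "header", "true")));
        for (String f : List.of("logs/a.log", "data/foos/foo1.csv", "README.md")) {
            System.out.println(f + " -> " + optionsFor(Path.of(f), rules));
        }
    }
}
```

Two rules matching the same file would simply be two data loaders for one data source; the "first match wins" policy here is just one possible choice.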
>>> 
>>>> How to handle the case if you have multiple potential data loaders for
>>>> a single file?
>>> 
>>> 
>>> As per above, that's just two data loaders referring to the same data
>>> source, so, nothing special.
>>> 
>>> As of the current state of things, this is how I'm supposed to load a
>>> CSV in the template itself (if I'm not outdated/mistaken):
>>> 
>>> <#assign csvFormat = CSVTool.formats.DEFAULT.withHeader()>
>>> <#assign foos = CSVTool.parse(Datasources.get("foos"), csvFormat).records>
>>> <#assign bars = CSVTool.parse(Datasources.get("bars"), csvFormat).records>
>>> 
>>> It will be worth exploring how to make these look more "idiomatic" FTL
>>> (given this is an "official" FM product now, I think we should show how
>>> it's done), and nicer in general. The point for now is that these are
>>> basically two data loaders interwoven with the template. Because they are
>>> interwoven like that, you can't reuse what they loaded for another
>>> template execution.
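A minimal sketch of the alternative, assuming a hypothetical run-scoped `SharedDataCache` that lives outside the templates: each named data set is loaded at most once per run and then handed to every template execution that asks for it. `ConcurrentHashMap.computeIfAbsent` invokes the loader only on the first request for a name:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical run-scoped cache: loads each named data set at most once.
class SharedDataCache {
    private final Map<String, Object> loaded = new ConcurrentHashMap<>();
    private final Function<String, Object> loader;

    SharedDataCache(Function<String, Object> loader) {
        this.loader = loader;
    }

    // The loader runs only the first time a given name is requested.
    Object get(String name) {
        return loaded.computeIfAbsent(name, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        SharedDataCache cache = new SharedDataCache(name -> {
            loads.incrementAndGet();
            return List.of(name + "-record-1", name + "-record-2"); // stand-in for parsed CSV records
        });
        // Three template executions that all need "foos".
        for (String template : List.of("a.ftl", "b.ftl", "c.ftl")) {
            System.out.println(template + " sees " + cache.get("foos"));
        }
        System.out.println("loads: " + loads.get()); // loads: 1
    }
}
```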
>>> 
>>>> That comes down to personal preference, e.g. chown uses
>>>> "owner[:group]"
>>> 
>>> 
>>> Yeah, but XML namespaces, Java, C, etc. all use
>>> <parent><operator><child>, so, I think, that clicks for more of our
>>> potential users. So let's bet on what clicks for more users.
>>> 
>>> Besides, I challenged the very idea that we need both groups and names. :)
>>> I said that it's simpler and less opinionated (more flexible) to have just
>>> multiple names (like tags). What came of that?
>>> 
>>> On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
>>> [email protected]> wrote:
>>> 
>>>> Hi Daniel,
>>>> 
>>>> Please see my comments below
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Siegfried Goeschl
>>>> 
>>>> 
>>>>> On 29.02.2020, at 21:02, Daniel Dekany <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> I try to provide a useful name even when the content is coming from
>>>>>> a URL
>>>>> 
>>>>> 
>>>>> When is it recommended to rely on that though? Because utilizing that
>>>>> means that renaming a data source file can break the process, even if
>>>>> you call freemarker-cli with the up-to-date file name. And whether that
>>>>> happens depends on what you (or another random colleague!) have dug
>>>>> inside the templates. So I guess we'd better just not support this.
>>>>> Less code, and fewer things to document too.
>>>>> 
>>>> 
>>>> Actually not recommended, but we have had named data sources for less
>>>> than 24 hours
>>>> 
>>>>> 
>>>>>> I think we have a different understanding of what a "Document" /
>>>>>> "Datasource / DataSource" should do
>>>>> 
>>>>> 
>>>>> Thing is, eventually (most certainly pre-1.0, as it influences
>>>>> architecture), certain needs will have to be addressed, somehow. Then we
>>>>> will see what "things" we really need. For now I thought we need
>>>>> "things" that are much more than paths, and that encapsulate the "how to
>>>>> load the data" aspect. I called them data sources, but maybe we should
>>>>> call them "data loaders" to free up "data source" for the more primitive
>>>>> thing. Some needs/doubts to address *later*: Is it really the best
>>>>> approach for users to load/parse data sources programmatically (that
>>>>> code is written in FTL, inside the templates)? Also, is the template the
>>>>> right place for doing that? When multiple templates (or just multiple
>>>>> template *runs* of the same template, each generating a different output
>>>>> file) need common data, they shouldn't load it again and again. Also,
>>>>> different topic, can we handle the case "transparently" enough when the
>>>>> data is not coming from a file?
>>>> 
>>>> This is a command line tool where we have little idea what the user
>>>> will do or abuse
>>>> 
>>>> * How does a "data loader" know that it is responsible for loading a
>>>> file?
>>>> * What should a "CSV data loader" do - parse it into a list of records
>>>> or stream them one by one?
>>>> * How to handle the case if you have multiple potential data loaders for
>>>> a single file?
>>>> 
>>>> I'm leaning towards building blocks where the user controls the work to
>>>> be done, even if it requires one to two extra lines of FTL code
>>>> 
>>>> 
>>>>> 
>>>>>> The joy of programming - I did not intend to use "name:group" together
>>>>>> with wildcards :-)
>>>>> 
>>>>> 
>>>>> For a CLI tool, I guess we agree that it should work. So maybe like
>>>>> this (here logs and foos are meant to be "groups"):
>>>>>     --data-source logs file1.log file2.log fileN.log \
>>>>>     --data-source foos foo1.csv foo2.csv fooN.csv \
>>>>>     --data-source bar bar.xlsx
>>>>> 
>>>>> It so happens that here you don't really have good control over the
>>>>> number of files associated with the name, so maybe that's yet another
>>>>> reason to not differentiate names and groups.
>>>>> 
>>>>>> I disagree here - I think a name would be used more often. I added
>>>>>> the "group" as an afterthought since some grouping could be useful
>>>>> 
>>>>> 
>>>>> We do agree on that. What I said is that the *syntax* should be such
>>>>> that the group comes first. It's still optional. Like this:
>>>>> --data-source group:name /somewhere
>>>>> --data-source name /somewhere
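A minimal sketch of parsing the syntax proposed in this reply, assuming each `--data-source` is followed by one `[group:]name` token and then by file arguments up to the next `--` option. Because the file arguments are separate words, the shell can expand wildcards in them. `DataSourceArg` and `DataSourceArgs` are hypothetical names, and other options are simply skipped:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result of one "--data-source [group:]name file..." occurrence.
record DataSourceArg(String group, String name, List<String> files) { }

class DataSourceArgs {
    static List<DataSourceArg> parse(String[] args) {
        List<DataSourceArg> result = new ArrayList<>();
        int i = 0;
        while (i < args.length) {
            if (!args[i].equals("--data-source")) { i++; continue; } // other options skipped here
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("--data-source needs a name");
            }
            String id = args[i + 1];
            int colon = id.indexOf(':');
            // Group comes first and is optional: "group:name" or just "name".
            String group = colon < 0 ? null : id.substring(0, colon);
            String name = colon < 0 ? id : id.substring(colon + 1);
            i += 2;
            // Collect file arguments (already wildcard-expanded by the shell).
            List<String> files = new ArrayList<>();
            while (i < args.length && !args[i].startsWith("--")) {
                files.add(args[i++]);
            }
            result.add(new DataSourceArg(group, name, files));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] demo = {"--data-source", "logs", "file1.log", "file2.log",
                         "--data-source", "reports:bar", "bar.xlsx"};
        parse(demo).forEach(System.out::println);
    }
}
```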
>>>> 
>>>> That comes down to personal preference, e.g. chown uses
>>>> "owner[:group]"
>>>> 
>>>>> 
>>>>> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hi Daniel,
>>>>>> 
>>>>>> See my comments below
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Siegfried Goeschl
>>>>>> 
>>>>>> 
>>>>>>> On 29.02.2020, at 19:08, Daniel Dekany <[email protected]> wrote:
>>>>>>> 
>>>>>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
>>>>>>> datasources
>>>>>>> 
>>>>>>> So, I can do this to have both a name and a group associated with a
>>>>>>> data source:
>>>>>>> --datasource someName:someGroup=somewhere/something
>>>>>> 
>>>>>> Correct
>>>>>> 
>>>>>>> Or if I only want a name, but not a group (or an "" group actually -
>>>>>>> bug?), then:
>>>>>>> --datasource someName=somewhere/something
>>>>>> 
>>>>>> Correct
>>>>>> 
>>>>>>> 
>>>>>>> Or if only a group but not a name (or a "" name actually) then:
>>>>>>> --datasource :someGroup=somewhere/something
>>>>>> 
>>>>>> Mhmm, that would be unintended functionality from my side - the current
>>>>>> approach is that every "Document" / "Datasource / DataSource" is named
>>>>>> 
>>>>>>> 
>>>>>>> A name must identify exactly 1 data source, while a group identifies a
>>>>>>> list of data sources.
>>>>>> 
>>>>>> No, every "Document" / "Datasource / DataSource" currently has a name,
>>>>>> but uniqueness is not enforced. Only if you want to get a "Document" /
>>>>>> "Datasource / DataSource" by its exact name do I check for exactly one
>>>>>> search hit and throw an exception otherwise. I try to provide a useful
>>>>>> name even when the content is coming from a URL or STDIN (and I will
>>>>>> probably add environment variables as "Document" / "Datasource /
>>>>>> DataSource", e.g. configuration in the cloud as JSON content passed as
>>>>>> an environment variable)
>>>>>> 
>>>>>>> 
>>>>>>> Does this idea, that a data source can be part of a group and then is
>>>>>>> also possibly identifiable with a name, come from a use case? I mean,
>>>>>>> it's possibly important somewhere, but if so, then it's strange that
>>>>>>> you can put something into only a single group. If we need this kind
>>>>>>> of thing, then perhaps you should just be allowed to associate the
>>>>>>> data source with a list of names (kind of like tagging), and then when
>>>>>>> the template wants to get something by name, it will tell there if it
>>>>>>> expects exactly one or a list of data sources. Then you don't need to
>>>>>>> introduce two terms in the documentation either (names and groups).
>>>>>>> Again, if we want this at all, instead of just going with a data source
>>>>>>> that itself gives a list. (And if not, how will we handle a data source
>>>>>>> that loads from a non-file source?)
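A minimal sketch of that tagging idea, assuming a hypothetical `TaggedDatasources` registry: each data source carries any number of names, and the caller states whether it expects exactly one match (`getOne`, which fails otherwise, similar to the exact-name check described above) or a list (`getAll`). Names are stored in a `Set.of(...)`, so passing the same name twice for one source throws:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical data source with a location and any number of names ("tags").
record TaggedSource(String location, Set<String> names) { }

class TaggedDatasources {
    private final List<TaggedSource> sources = new ArrayList<>();

    void add(String location, String... names) {
        sources.add(new TaggedSource(location, Set.of(names)));
    }

    // All data sources carrying the name, in registration order (possibly empty).
    List<TaggedSource> getAll(String name) {
        return sources.stream().filter(s -> s.names().contains(name)).toList();
    }

    // Exactly one data source must carry the name, otherwise it's an error.
    TaggedSource getOne(String name) {
        List<TaggedSource> hits = getAll(name);
        if (hits.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one data source named '" + name + "' but found " + hits.size());
        }
        return hits.get(0);
    }

    public static void main(String[] args) {
        TaggedDatasources ds = new TaggedDatasources();
        ds.add("logs/day1.log", "logs");
        ds.add("logs/day2.log", "logs");
        ds.add("config.json", "config", "settings");
        System.out.println(ds.getAll("logs").size()); // 2
        System.out.println(ds.getOne("settings").location()); // config.json
    }
}
```

With this shape a "group" is just a name that happens to be shared, so only one term needs documenting.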
>>>>>> 
>>>>>> I actually thought of implementing tagging but considered a "group"
>>>>>> sufficient.
>>>>>> 
>>>>>> * If you don't define anything, everything goes into the "default" group
>>>>>> * For individual documents you can define a name and an optional group
>>>>>> 
>>>>>> I think we have a different understanding of what a "Document" /
>>>>>> "Datasource / DataSource" should do
>>>>>> 
>>>>>> * It is dumb
>>>>>> * It is lazy since data is only loaded on demand
>>>>>> * There is no automagic like "oh, this is a JSON file, so let's go to
>>>>>> the JSON tool and create a map readily accessible in the data model"
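A minimal sketch of such a "dumb" and lazy data source, assuming a hypothetical `LazyDatasource` wrapping a content supplier: nothing is read until the first access, the bytes are then cached, and no format detection or parsing happens here (that stays with the tools the template calls):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical "dumb" data source: a name, a charset and raw content loaded on demand.
class LazyDatasource {
    private final String name;
    private final Charset charset;
    private final Supplier<byte[]> contentSupplier;
    private byte[] content; // null until first access

    LazyDatasource(String name, Charset charset, Supplier<byte[]> contentSupplier) {
        this.name = name;
        this.charset = charset;
        this.contentSupplier = contentSupplier;
    }

    String getName() { return name; }

    // Loads the content on first use only; synchronized so concurrent callers load once.
    synchronized byte[] getBytes() {
        if (content == null) {
            content = contentSupplier.get();
        }
        return content;
    }

    // Decodes as text; turning it into CSV records, JSON, etc. is left to the caller.
    String getText() {
        return new String(getBytes(), charset);
    }

    public static void main(String[] args) {
        LazyDatasource ds = new LazyDatasource("config", StandardCharsets.UTF_8,
                () -> "{\"env\":\"staging\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(ds.getName() + ": " + ds.getText());
    }
}
```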
>>>>>> 
>>>>>>> 
>>>>>>> Note that the current command line syntax doesn't work well with shell
>>>>>>> wildcard expansion. Like this:
>>>>>>> --datasource :someGroup=logs/*.log
>>>>>>> will try to expand ":someGroup=logs/*.log", and because it finds nothing
>>>>>>> (and because the rules of sh and the like are a mess), you will get the
>>>>>>> parameter value as is, without * expanded.
>>>>>> 
>>>>>> The joy of programming - I did not intend to use "name:group" together
>>>>>> with wildcards :-)
>>>>>> 
>>>>>>> 
>>>>>>> Also, I think the syntax with the colon should be flipped, because in
>>>>>>> other places foo:bar usually means that foo is the bigger unit (the
>>>>>>> container), and bar is the smaller unit (the child).
>>>>>> 
>>>>>> I disagree here - I think a name would be used more often. I added
>>>>>> the "group" as an afterthought since some grouping could be useful
>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> I'm an enterprise developer - bad habits die hard :-)
>>>>>>>> 
>>>>>>>> So I closed the following tickets and merged the branches
>>>>>>>> 
>>>>>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>>>>>>> "freemarker-generator"
>>>>>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>>>>>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>>>>>>> for datasources
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Yeah, and of course, you can merge that branch. You can even work on
>>>>>>>>> the master directly after all.
>>>>>>>>> 
>>>>>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> But I do recognize the cattle use case (several "faceless" files with
>>>>>>>>>> a common format/schema). Only, my idea is to push that complexity onto
>>>>>>>>>> the data source. The "data source" concept shields the rest of the
>>>>>>>>>> application from the details of how the data is stored or retrieved.
>>>>>>>>>> So, a data source might load a bunch of log files from a directory,
>>>>>>>>>> and present them as a single big table, or as a list of tables, etc.
>>>>>>>>>> So I want to deal with the cattle use case, but the question is what
>>>>>>>>>> part of the architecture will deal with this complication; in other
>>>>>>>>>> words, how do you box things. The reason my initial bet is to stuff
>>>>>>>>>> that complication into the "data source" implementation(s) is that
>>>>>>>>>> data sources are inherently varied. Some return a table-like thing,
>>>>>>>>>> some have multiple named tables (worksheets in Excel), some return a
>>>>>>>>>> tree of nodes (XML), etc. So then, some might return a list of lists
>>>>>>>>>> of log records, or just a single list of log records (put together
>>>>>>>>>> from daily log files). That way cattle don't add to the conceptual
>>>>>>>>>> complexity. Now, you might be aware of cases where the cattle concept
>>>>>>>>>> must be more exposed than this, and then we can't box things like
>>>>>>>>>> this. But this is what I tried to express.
>>>>>>>>>> 
>>>>>>>>>> Regarding "output generators", and how that applies on the command
>>>>>>>>>> line: I think it's important that the common core between Maven and
>>>>>>>>>> the command line is as fat as possible. Ideally, they are just two
>>>>>>>>>> syntaxes to set up the same thing. Mostly, at least. So, if you
>>>>>>>>>> specify a template file to the CLI application in a way that causes
>>>>>>>>>> it to process that template to generate a single output, then you
>>>>>>>>>> have just defined an "output generator" there (even if it wasn't
>>>>>>>>>> explicitly called that on the command line). If you specify 3 csv
>>>>>>>>>> files to the CLI application in a way that causes it to generate 3
>>>>>>>>>> output files, then you have just defined 3 "output generators" there
>>>>>>>>>> (there's at least one template specified there too, but that wasn't
>>>>>>>>>> an "output generator" itself, it was just an attribute of the 3
>>>>>>>>>> output generators). If you specify 1 template and 3 csv files in a
>>>>>>>>>> way that yields 4 output files (1 for the template, 3 for the
>>>>>>>>>> csv-s), then you have defined 4 output generators there. If you have
>>>>>>>>>> a data source that loads a list of 3 entities (say, 3 csv files, so
>>>>>>>>>> it's a list of tables then), and you have 2 templates, and you tell
>>>>>>>>>> the CLI to execute each template for each item in said data source,
>>>>>>>>>> then you have just defined 6 "output generators".
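A minimal sketch of that counting, assuming hypothetical names (`OutputGenerator`, `fromFiles`, `perEntity`), that templates are recognized by an `.ftl` extension, and naive output file naming. It reproduces the 1 template + 3 csv files = 4 case and the 2 templates x 3 entities = 6 case:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output generator: which template renders which data into which output.
record OutputGenerator(String template, String data, String output) { }

class OutputGenerators {
    // Template files generate output on their own; each data file gets the data template.
    // The data template is only an attribute of those generators, not a generator itself.
    static List<OutputGenerator> fromFiles(List<String> files, String dataTemplate) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String file : files) {
            if (file.endsWith(".ftl")) {
                result.add(new OutputGenerator(file, null, file.substring(0, file.length() - 4)));
            } else {
                result.add(new OutputGenerator(dataTemplate, file, file + ".out"));
            }
        }
        return result;
    }

    // One output generator per (template, entity) pair.
    static List<OutputGenerator> perEntity(List<String> templates, List<String> entities) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            for (String entity : entities) {
                result.add(new OutputGenerator(template, entity,
                        entity + "-" + template.replace(".ftl", "")));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromFiles(
                List.of("index.html.ftl", "a.csv", "b.csv", "c.csv"), "table.html.ftl").size()); // 4
        System.out.println(perEntity(
                List.of("dao.java.ftl", "dto.java.ftl"), List.of("user", "order", "item")).size()); // 6
    }
}
```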
>>>>>>>>>> 
>>>>>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>> 
>>>>>>>>>>> That all depends on your mental model and the work you do, your
>>>>>>>>>>> expectations and experience :-)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> __Document Handling__
>>>>>>>>>>> 
>>>>>>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>>>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>>>>>>>> that complication"*
>>>>>>>>>>> 
>>>>>>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>>>>>>> hundred access logs across two staging sites to help track down some
>>>>>>>>>>> strange API gateway issues :-)
>>>>>>>>>>> 
>>>>>>>>>>> My gut feeling is (borrowing from
>>>>>>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
>>>>>>>>>>> 
>>>>>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>>>>>> 2. You have tons of anonymous documents / templates to process - `cattle`
>>>>>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>>>>>> 
>>>>>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>>>>>>>> it is equally important and common.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> __Template And Document Processing Modes__
>>>>>>>>>>> 
>>>>>>>>>>> IMHO it is important to answer the following question: "How many
>>>>>>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>>>>>>> three or six?"
>>>>>>>>>>> 
>>>>>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>>>>>> 
>>>>>>>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>>>>>>> is "2"
>>>>>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>>>>>> * I can't imagine a use case which results in "3" but I'm pretty sure
>>>>>>>>>>> we will encounter one
>>>>>>>>>>> 
>>>>>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>>>>>> 
>>>>>>>>>>> That's hard for me to fully understand - I definitely lack your
>>>>>>>>>>> insights & experience writing such tools :-)
>>>>>>>>>>> 
>>>>>>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>>>>>>> plugin (and probably FMPP).
>>>>>>>>>>> 
>>>>>>>>>>> I'm not sure if this applies to command lines, at least not in the way
>>>>>>>>>>> I use them (or would like to use them)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>> 
>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>> 
>>>>>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>>>>>>>>>> Does anyone have other ideas?
>>>>>>>>>>>> 
>>>>>>>>>>>> As for naming data sources and such: one thing I was wondering about
>>>>>>>>>>>> back then is how to deal with a list of documents given to a
>>>>>>>>>>>> template, versus exactly 1 document given to a template. But I think
>>>>>>>>>>>> actually we have no good use case for a list of documents that's
>>>>>>>>>>>> passed at once to a single template run, so we can just ignore that
>>>>>>>>>>>> complication. A document has a name, and that's always just a single
>>>>>>>>>>>> document, not a collection, as far as the template is concerned. (We
>>>>>>>>>>>> can have multiple documents per run, but those normally yield
>>>>>>>>>>>> separate output generators, so it's still only one document per
>>>>>>>>>>>> template.) However, we can have data source types (document types
>>>>>>>>>>>> with the old terminology) that collect together multiple data files.
>>>>>>>>>>>> So then that complexity is encapsulated into the data source type,
>>>>>>>>>>>> and doesn't complicate the overall architecture. That's another case
>>>>>>>>>>>> when a data source is not just a file. Like, maybe there's a data
>>>>>>>>>>>> source type that loads all the CSV-s from a directory into a single
>>>>>>>>>>>> big table (I had such a case), or even into a list of tables. Or, as
>>>>>>>>>>>> I mentioned already, a data source is maybe an SQL query on a JDBC
>>>>>>>>>>>> data source (and we got the first term clash... JDBC also calls them
>>>>>>>>>>>> data sources).
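A minimal sketch of a data source type like that, assuming a hypothetical `CsvDirectoryTable` that reads every `*.csv` file in a directory (sorted by name) and concatenates their records into one `Table`, requiring identical headers. The CSV handling is a naive comma split without quoting support; a real implementation would delegate to a CSV parser:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical "cattle" result: many CSV files with one schema, exposed as one table.
record Table(List<String> header, List<List<String>> rows) { }

class CsvDirectoryTable {
    // Reads every *.csv file in the directory (sorted by name) and merges them.
    static Table fromDirectory(Path dir) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            stream.forEach(files::add);
            files.sort(null); // natural Path ordering
            List<List<String>> contents = new ArrayList<>();
            for (Path file : files) {
                contents.add(Files.readAllLines(file));
            }
            return merge(contents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The first line of each file is its header; all headers must match.
    static Table merge(List<List<String>> files) {
        List<String> header = null;
        List<List<String>> rows = new ArrayList<>();
        for (List<String> lines : files) {
            if (lines.isEmpty()) continue;
            List<String> fileHeader = List.of(lines.get(0).split(",", -1));
            if (header == null) {
                header = fileHeader;
            } else if (!header.equals(fileHeader)) {
                throw new IllegalArgumentException("Header mismatch: " + fileHeader);
            }
            for (String line : lines.subList(1, lines.size())) {
                if (!line.isBlank()) rows.add(List.of(line.split(",", -1)));
            }
        }
        return new Table(header == null ? List.of() : header, rows);
    }

    public static void main(String[] args) {
        Table t = merge(List.of(
                List.of("date,status", "2020-02-27,200", "2020-02-27,502"),
                List.of("date,status", "2020-02-28,200")));
        System.out.println(t.rows().size() + " rows, header " + t.header()); // 3 rows, header [date, status]
    }
}
```

The template then sees one table and never learns how many files were behind it, which is the encapsulation argued for above.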
>>>>>>>>>>>> 
>>>>>>>>>>>> Template and document mode probably shouldn't exist from the user's
>>>>>>>>>>>> perspective either, at least not as a global option that must apply
>>>>>>>>>>>> to everything in a run. Users could just give the files that define
>>>>>>>>>>>> the "output generators", and some of them will be templates, some of
>>>>>>>>>>>> them data files, in which case a template needs to be associated with
>>>>>>>>>>>> them (and there can be a couple of ways of doing that). And then
>>>>>>>>>>>> again, there are the cases where you want to create one output
>>>>>>>>>>>> generator per entity from some data source.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
>>>>>>>>>>>>> and its DataSource.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Template And Document Mode*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>>>>>>>> not an implementation concept :-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also agreed, and it is going to change, but I have not made up my mind
>>>>>>>>>>>>> yet on what exactly to implement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A few quick thoughts on that:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - We should replace the "document" term with something more
>>>>>>>>>>>>> descriptive. It doesn't tell that it's some kind of input. Also, most
>>>>>>>>>>>>> of these inputs aren't something that people typically call
>>>>>>>>>>>>> documents, like a csv file, or a database table, which is not even a
>>>>>>>>>>>>> file (OK, we don't support such a thing at the moment). I think maybe
>>>>>>>>>>>>> "data source" is a safe enough term. (It also rhymes with data model.)
>>>>>>>>>>>>> - You have separate "template" and "document" "modes" that apply to
>>>>>>>>>>>>> a whole run. I think such specialization won't be helpful. We could
>>>>>>>>>>>>> just say, on the conceptual level at least, that we need a set of
>>>>>>>>>>>>> "output generators". An output generator is an object (in the API)
>>>>>>>>>>>>> that specifies a template, a data-model (where the data-model is
>>>>>>>>>>>>> possibly populated with "documents"), and an output "sink" (a file
>>>>>>>>>>>>> path, or stdout), and can generate the output itself. A practical
>>>>>>>>>>>>> way of defining the output generators in a CLI application is via a
>>>>>>>>>>>>> bunch of files, each defining an output generator. Some of those
>>>>>>>>>>>>> files are maybe templates (which you can even detect from the file
>>>>>>>>>>>>> extension), or data files that we currently call "documents". They
>>>>>>>>>>>>> could freely mix inside the same run. I have also met the use case
>>>>>>>>>>>>> where you have a single table (a single "document"), and each record
>>>>>>>>>>>>> in it yields an output file. That can also be described in some file
>>>>>>>>>>>>> format, or really in any other way, like directly in a command line
>>>>>>>>>>>>> argument, via the API, etc.
>>>>>>>>>>>>> - You have multiple documents without associated symbolical
>>>>>>>>>>>>> name
>>>> in
>>>>>>>>>>>>> some
>>>>>>>>>>>>> examples. Templates can't identify those then in a well
>>>>>> maintainable
>>>>>>>>>>>>> way.
>>>>>>>>>>>>> The actual file name is often not a good identifier, can
>>>>>>>>>>>>> change
>>>>>> over
>>>>>>>>>>>>> time,
>>>>>>>>>>>>> and you might don't even have good control over it, like you
>>>>>> already
>>>>>>>>>>>>> receive it as a parameter from somewhere else, or someone
>>>>>>>>>>>>> moves/renames
>>>>>>>>>>>>> that files that you need to read. Index is also not very
>>>>>>>>>>>>> good,
>>>> but
>>>>>> I
>>>>>>>>>>>>> have
>>>>>>>>>>>>> written about that earlier.
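To make the "output generator" concept from the second point more concrete, here is a minimal Java sketch under assumptions of our own: `OutputGenerator`, `TemplateFn` and `OutputGeneratorDemo` are hypothetical names, not part of freemarker-generator or FreeMarker. A plain function stands in for a FreeMarker `Template` so the example runs without the library, and the sink is just a `Writer` (which could wrap a file or stdout).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a FreeMarker template: renders a data-model into a Writer. */
interface TemplateFn {
    void process(Map<String, Object> dataModel, Writer out) throws IOException;
}

/** Hypothetical "output generator": one template, one data-model, one output sink. */
final class OutputGenerator {
    private final TemplateFn template;
    private final Map<String, Object> dataModel;
    private final Writer sink; // e.g. a FileWriter, or a Writer wrapping stdout

    OutputGenerator(TemplateFn template, Map<String, Object> dataModel, Writer sink) {
        this.template = template;
        this.dataModel = Collections.unmodifiableMap(new LinkedHashMap<>(dataModel));
        this.sink = sink;
    }

    /** The generator produces its output itself. */
    void generate() throws IOException {
        template.process(dataModel, sink);
        sink.flush();
    }
}

class OutputGeneratorDemo {
    /** Wires a trivial "template" to a data-model that contains one "document". */
    static String render(String document) {
        Map<String, Object> model = new LinkedHashMap<>();
        model.put("document", document);
        TemplateFn template = (dm, out) -> out.write("Report for " + dm.get("document"));
        StringWriter sink = new StringWriter();
        try {
            new OutputGenerator(template, model, sink).generate();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("foo-access-log.csv")); // prints "Report for foo-access-log.csv"
    }
}
```

A run would then just be a collection of such generators, regardless of whether each one was derived from a template, a data file, or a single record of a table.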
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Still wrapping my head around it, but I assembled some thoughts here -
>>>>>>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What you are describing is more like the angle that FMPP took initially,
>>>>>>>>>>>>> where templates drive things: they generate the output for themselves (even
>>>>>>>>>>>>> multiple output files if they wish). By default, the output file name (and
>>>>>>>>>>>>> relative path) is deduced from the template name. There was also a global
>>>>>>>>>>>>> data-model, built in a configuration file (or equally, built via command
>>>>>>>>>>>>> line arguments, or both mixed), from which templates get whatever data they
>>>>>>>>>>>>> are interested in. Take a look at the figures here:
>>>>>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
>>>>>>>>>>>>> a bit more, because you could add XML files at the same place where you
>>>>>>>>>>>>> have the templates, and then you could associate transform templates with
>>>>>>>>>>>>> the XML files (based on path pattern and/or the XML document element). Now
>>>>>>>>>>>>> that's like what freemarker-generator had initially (data files drive the
>>>>>>>>>>>>> output, and the template is there to transform them).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I think the generic mental model would look like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. You got files that drive the process; let's call them *generator files*
>>>>>>>>>>>>> for now. Usually, each generator file yields an output file (but maybe even
>>>>>>>>>>>>> multiple output files, as you might have seen in the last figure). These
>>>>>>>>>>>>> generator files can be of many types, like XML, JSON, XLSX (as in the
>>>>>>>>>>>>> original freemarker-generator), and even templates (as is the norm in
>>>>>>>>>>>>> FMPP). If the file is not a template, then you got a set of transformer
>>>>>>>>>>>>> templates (-t CLI option) in a separate directory, which can be associated
>>>>>>>>>>>>> with the generator files based on name patterns, and even based on content
>>>>>>>>>>>>> (schema usually). If the generator file is a template (so that's a
>>>>>>>>>>>>> positional @Parameter CLI argument that happens to be an *.ftl, and is not
>>>>>>>>>>>>> a template file specified after the "-t" option), then you just
>>>>>>>>>>>>> Template.process(...) it, and it prints what the output will be.
>>>>>>>>>>>>> 2. You also have a set of variables, the global data-model, that contains
>>>>>>>>>>>>> commonly useful stuff, like what you now call parameters (CLI
>>>>>>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
>>>>>>>>>>>>> files aren't "generator files". Templates just use them if they need them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> An important thing here is to reuse the same mechanism to read and parse
>>>>>>>>>>>>> those data files that is used in templates when transforming generator
>>>>>>>>>>>>> files. So we need a common format for specifying how to load data files.
>>>>>>>>>>>>> That's maybe just FTL that #assigns to the variables, or maybe a more
>>>>>>>>>>>>> declarative format.
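Point 1 could be sketched roughly as follows in Java. This is only an illustration under assumptions: `GeneratorFilePlanner`, the glob rules and the template names `csv-report.ftl` / `excel-report.ftl` are hypothetical, association happens by file name only (the content/schema-based association mentioned above is left out), and the planner returns a description instead of actually running FreeMarker.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical planner deciding how a "generator file" is turned into output. */
final class GeneratorFilePlanner {
    // Ordered rules: file-name glob -> transform template; the first match wins.
    private final Map<PathMatcher, String> transformRules = new LinkedHashMap<>();

    GeneratorFilePlanner associate(String fileNameGlob, String transformTemplate) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + fileNameGlob);
        transformRules.put(matcher, transformTemplate);
        return this;
    }

    String plan(String generatorFile) {
        Path fileName = Paths.get(generatorFile).getFileName();
        // A generator file that is itself a template is simply processed (Template.process).
        if (fileName.toString().endsWith(".ftl")) {
            return "process " + generatorFile;
        }
        // Any other generator file needs a transform template, picked by name pattern.
        for (Map.Entry<PathMatcher, String> rule : transformRules.entrySet()) {
            if (rule.getKey().matches(fileName)) {
                return "transform " + generatorFile + " with " + rule.getValue();
            }
        }
        throw new IllegalArgumentException("No transform template is associated with " + generatorFile);
    }

    public static void main(String[] args) {
        GeneratorFilePlanner planner = new GeneratorFilePlanner()
                .associate("*.csv", "csv-report.ftl")
                .associate("*.{xls,xlsx}", "excel-report.ftl");
        System.out.println(planner.plan("somewhere/foo-access-log.csv"));
        System.out.println(planner.plan("site/index.html.ftl"));
    }
}
```

Because the first matching rule wins, more specific patterns would need to be registered before more general ones.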
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What I described in the original post here was a less generic form of this,
>>>>>>>>>>>>> as I tried to stay true to the original approach. I thought the proposal
>>>>>>>>>>>>> would be drastic enough as it is... :) There, the "main" document is the
>>>>>>>>>>>>> "generator file" from point 1, the "-t" template is the transform template
>>>>>>>>>>>>> for the "main" document, and the other named documents ("users",
>>>>>>>>>>>>> "groups") are a poor man's shared data-model from point 2 (together with
>>>>>>>>>>>>> -Pname=value).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing, though. In
>>>>>>>>>>>>> the model above, as per point 1, if you list multiple data files, each will
>>>>>>>>>>>>> generate a separate output file. So, if you need to take in a list of files
>>>>>>>>>>>>> and transform it into a single output file (or at least with a single
>>>>>>>>>>>>> transform template execution), then you have to be explicit about that, as
>>>>>>>>>>>>> that's not the default behavior anymore. But it's still absolutely possible.
>>>>>>>>>>>>> Imagine that a "list of XLSX-es" is itself like a file format. You need some
>>>>>>>>>>>>> CLI (and Maven config, etc.) syntax to express that, but that shouldn't be a
>>>>>>>>>>>>> big deal.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Good timing - I was looking at a similar problem from a different angle
>>>>>>>>>>>>> yesterday (see below).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't have enough time to answer your email in detail now - will do that
>>>>>>>>>>>>> tomorrow evening.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> === START
>>>>>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>>>>>> Currently we support the following combinations:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Single template and no data files
>>>>>>>>>>>>> * Single template and one or more data files
>>>>>>>>>>>>>
>>>>>>>>>>>>> But we cannot support the following use case, which is quite typical in the
>>>>>>>>>>>>> cloud:
>>>>>>>>>>>>>
>>>>>>>>>>>>> __Convert multiple templates with a single data file, e.g. copying a directory of configuration files using a JSON configuration file__
>>>>>>>>>>>>>
>>>>>>>>>>>>> ## Implementation notes
>>>>>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>>>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>>>>>> * Initially resolve to a list of template files and process them one after another
>>>>>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>>>>>>> * Do we need file versus directory filters?
>>>>>>>>>>>>> ### Command Line Options
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>>>>>> --output : Output file or directory
>>>>>>>>>>>>> --include-document : Include pattern for documents
>>>>>>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>>>>>>> --include-template : Include pattern for templates
>>>>>>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### Command Line Examples
>>>>>>>>>>>>> ```text
>>>>>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>>>>>>>>>> # using the data from "config.json"
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>>>>>>>
>>>>>>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>>>>>>>
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> === END
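A possible way to interpret the `--document name=location` values from the examples above, as a Java sketch. The `env:` and `file:` location styles come from those examples; `NamedDocumentSpec`, the UTF-8 assumption and the injected environment lookup (pass `System::getenv` in real use) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

/** Hypothetical parser/loader for values like "configuration=env:///CONFIGURATION". */
final class NamedDocumentSpec {
    final String name;
    final String location;

    private NamedDocumentSpec(String name, String location) {
        this.name = name;
        this.location = location;
    }

    /** Splits at the first '=' so that the location itself may contain '='. */
    static NamedDocumentSpec parse(String value) {
        int idx = value.indexOf('=');
        if (idx <= 0 || idx == value.length() - 1) {
            throw new IllegalArgumentException("Expected name=location, got: " + value);
        }
        return new NamedDocumentSpec(value.substring(0, idx), value.substring(idx + 1));
    }

    /** Loads the content; the environment lookup is passed in (e.g. System::getenv). */
    String load(Function<String, String> env) {
        if (location.startsWith("env:")) {
            // "env:///CONFIGURATION" (or "env:CONFIGURATION") -> variable "CONFIGURATION"
            String variable = location.substring("env:".length()).replaceFirst("^/+", "");
            String value = env.apply(variable);
            if (value == null) {
                throw new IllegalArgumentException("Environment variable not set: " + variable);
            }
            return value;
        }
        // "file:///config.json" is read via its URI, anything else as a plain path (UTF-8 assumed)
        Path path = location.startsWith("file:") ? Paths.get(URI.create(location)) : Paths.get(location);
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NamedDocumentSpec spec = parse("configuration=env:///CONFIGURATION");
        // A fixed lookup keeps the demo independent of the real environment.
        System.out.println(spec.name + " -> " + spec.load(v -> "{\"port\": 8080}"));
    }
}
```

Splitting at the first `=` keeps the option unambiguous even if a location ever contains `=` itself.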
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Input documents are a fundamental concept in freemarker-generator, so we
>>>>>>>>>>>>> should think about that more, and probably refine/rework how it's done.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then in access-report.ftl you have to do something like
>>>>>>>>>>>>> this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>>>>>> ... process doc here
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a funny
>>>>>>>>>>>>> chain of coincidences: it returned the string "D", then CSVTool.parse(...)
>>>>>>>>>>>>> happily parsed that to a table with the single column "D" and 0 rows, and
>>>>>>>>>>>>> as there were 0 rows, the template didn't run into an error even though
>>>>>>>>>>>>> row.myExpectedColumn refers to a missing column, so the process finished
>>>>>>>>>>>>> with success. (: Pretty unlucky for sure. The root cause was unintentionally
>>>>>>>>>>>>> breaking a FreeMarker idiom, though; eventually we will have to work on
>>>>>>>>>>>>> those too, but that's a different topic.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, multiple input documents can actually be passed in:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The above template will still work, though it then ignores all but the first
>>>>>>>>>>>>> document. So if you expect any number of input documents, you will probably
>>>>>>>>>>>>> have to do this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <#list Documents.list as doc>
>>>>>>>>>>>>> ... process doc here
>>>>>>>>>>>>> </#list>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>>>>>>>>>>>> we will work out in a different thread.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So, what would be better, in my opinion? I start out from what I think are
>>>>>>>>>>>>> the common use cases, in decreasing order of frequency. The goal is to make
>>>>>>>>>>>>> those less error prone for the users, and simpler to express.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You have exactly 1 input document, which is therefore simply "the" document
>>>>>>>>>>>>> in the mind of the user. This is probably the typical use case, or at least
>>>>>>>>>>>>> the use case users typically start out from when starting the work.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly, it's error
>>>>>>>>>>>>> prone, because if the user passed in more than 1 document (which can even
>>>>>>>>>>>>> happen totally accidentally, like if the user was lazy and used a wildcard
>>>>>>>>>>>>> that the shell expanded), the template will silently ignore the rest of the
>>>>>>>>>>>>> documents, and the single document processed will be practically picked
>>>>>>>>>>>>> randomly. The user might not notice that and submit a bad report or such.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that in this use case the document should simply be referred to as
>>>>>>>>>>>>> `Document` in the template. When you have multiple documents there,
>>>>>>>>>>>>> referring to `Document` should be an error, saying that the template was
>>>>>>>>>>>>> made to process a single document only.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You have multiple input documents, but each has a different role (different
>>>>>>>>>>>>> schema, maybe different file type). Like, you pass in users.csv and
>>>>>>>>>>>>> groups.csv. Each has a different schema, and so you want to access them
>>>>>>>>>>>>> differently, but in the same template.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then in the template you could refer to them as `NamedDocuments.users` and
>>>>>>>>>>>>> `NamedDocuments.groups`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Use Case 1 and 2 can be unified into a coherent concept, where `Document` is
>>>>>>>>>>>>> just a shorthand for `NamedDocuments.main`. It's called "main" because
>>>>>>>>>>>>> that's "the" document the template is about, but besides it you may have
>>>>>>>>>>>>> added some helper documents, with symbolic names representing their role.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
>>>>>>>>>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>>>>>>>>>>>> Anyway, for now the point is the concept, which is not specific to CLI.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 3
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here you have several of the same kind of documents. That has a more generic
>>>>>>>>>>>>> sub-use-case, when you have explicitly named documents (like "users" above),
>>>>>>>>>>>>> and for some you expect multiple input files.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>>>> somewhere/bar-users.csv
>>>>>>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The template must be written with this use case in mind, as now it has to
>>>>>>>>>>>>> #list some of the documents. (I think in practice you hardly ever want to
>>>>>>>>>>>>> get a document by a hard-coded index. Either you don't know how many
>>>>>>>>>>>>> documents you have, so you can't use hard-coded indexes, or you do, and
>>>>>>>>>>>>> each index has a specific meaning, but then you should name the documents
>>>>>>>>>>>>> instead, as using indexes is error prone and hard to read.) Accessing that
>>>>>>>>>>>>> list of documents in the template could maybe be done like this:
>>>>>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>>>>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
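For the CLI side of use case 3, one way to group positional files by the preceding `--document-name` is sketched below in plain Java rather than Picocli. `DocumentArgs` is hypothetical, only document arguments are expected (options like `-t` are assumed to be handled elsewhere), and files given before any `--document-name` default to "main", which would also allow omitting `--document-name=main` as wished for earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grouping of positional files by the preceding --document-name option. */
final class DocumentArgs {
    static final String MAIN = "main";
    private static final String OPTION = "--document-name=";

    static Map<String, List<String>> group(List<String> args) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        String current = MAIN; // files given before any --document-name belong to "main"
        for (String arg : args) {
            if (arg.startsWith(OPTION)) {
                current = arg.substring(OPTION.length());
                if (current.isEmpty()) {
                    throw new IllegalArgumentException("Empty document name in: " + arg);
                }
                byName.computeIfAbsent(current, k -> new ArrayList<>());
            } else {
                byName.computeIfAbsent(current, k -> new ArrayList<>()).add(arg);
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(
                "--document-name=main", "somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv",
                "--document-name=users", "somewhere/foo-users.csv", "somewhere/bar-users.csv",
                "--document-name=groups", "somewhere/global-groups.csv")));
    }
}
```

The resulting map is exactly what `NamedDocumentLists` would expose, with `DocumentList` being its "main" entry.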
>>>>>>>>>>>>> 
>>>>>>>>>>>>> SUMMING UP
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>>>>>>>>>>>> achieve everything with it, using it requires your template to handle the
>>>>>>>>>>>>> most generic case too. So, I think it would be rarely used.
>>>>>>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>>>>>>>>>>>> used if you only have one kind of document (single format and schema), but
>>>>>>>>>>>>> potentially multiple of them.
>>>>>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>>>>>>>>>>>> the given name.
>>>>>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
>>>>>>>>>>>>> most natural/frequent use case.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>>>>>>>>>>>> for the sake of these:
>>>>>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output will likely
>>>>>>>>>>>>> be wrong. That's only possible if the user can communicate their intent in
>>>>>>>>>>>>> the template.
>>>>>>>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>>>>>>> concrete use case. Just start with the trivial, `Document`, and later, if
>>>>>>>>>>>>> the need arises, generalize to named documents, document lists, or both.
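The four accessors summed up above, including the error for a single-document template that receives several documents, could be modelled roughly like this. A hypothetical Java sketch: `DocumentAccess` and its method names are assumptions, not freemarker-generator API; in a template they would be exposed as `Document`, `DocumentList`, `NamedDocuments` and `NamedDocumentLists`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model behind NamedDocumentLists, DocumentList, NamedDocuments and Document. */
final class DocumentAccess<D> {
    static final String MAIN = "main";
    private final Map<String, List<D>> byName;

    DocumentAccess(Map<String, List<D>> byName) {
        this.byName = new LinkedHashMap<>(byName);
    }

    /** NamedDocumentLists.<name>: the most generic form, zero or more documents. */
    List<D> namedDocumentList(String name) {
        List<D> docs = byName.get(name);
        return docs == null ? Collections.<D>emptyList() : Collections.unmodifiableList(docs);
    }

    /** DocumentList: shorthand for NamedDocumentLists.main. */
    List<D> documentList() {
        return namedDocumentList(MAIN);
    }

    /** NamedDocuments.<name>: exactly one document is expected; anything else fails loudly. */
    D namedDocument(String name) {
        List<D> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("The template was made to process exactly 1 document named \""
                    + name + "\", but " + docs.size() + " were given.");
        }
        return docs.get(0);
    }

    /** Document: shorthand for NamedDocuments.main. */
    D document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put(MAIN, Arrays.asList("somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv"));
        DocumentAccess<String> docs = new DocumentAccess<>(input);
        System.out.println(docs.documentList()); // a template written for lists is fine
        try {
            docs.document(); // a template written for a single document is caught
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Only the "list" accessors tolerate any number of documents, so a template that uses `Document` fails loudly instead of silently picking one of the files.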
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do you guys think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Daniel Dekany
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Daniel Dekany
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Daniel Dekany
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Daniel Dekany
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Best regards,
>>> Daniel Dekany
>> 
> 
> 
> -- 
> Best regards,
> Daniel Dekany
