Hi Daniel,

See my comments below.

Thanks in advance,

Siegfried Goeschl

> On 29.02.2020, at 19:08, Daniel Dekany <[email protected]> wrote:
>
> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
> datasources
>
> So, I can do this to have both a name and a group associated to a data
> source:
> --datasource someName:someGroup=somewhere/something

Correct

> Or if I only want a name, but not a group (or an "" group actually -
> bug?), then:
> --datasource someName=somewhere/something

Correct

> Or if only a group but not a name (or a "" name actually) then:
> --datasource :someGroup=somewhere/something

Mhmm, that would be unintended functionality from my side - the current approach is that every "Document" / "Datasource" / "DataSource" is named.

> A name must identify exactly 1 data source, while a group identifies a list
> of data sources.

No, every "Document" / "Datasource" / "DataSource" currently has a name, but uniqueness is not enforced. Only when you ask for a "Document" / "Datasource" / "DataSource" by its exact name do I check for exactly one search hit and throw an exception otherwise. I try to provide a useful name even when the content is coming from a URL or STDIN (and I will probably add environment variables as "Document" / "Datasource" / "DataSource", e.g. configuration in the cloud as JSON content passed as an environment variable).

> Does this idea, that a data source can be part of a group and is then
> also possibly identifiable with a name, come from a use case? I mean,
> it's possibly important somewhere, but if so, then it's strange that you
> can put something into only a single group. If we need this kind of thing,
> then perhaps you should be just allowed to associate the data source with a
> list of names (kind of like tagging), and then when the template wants to
> get something by name, it will tell there if it expects exactly one or a
> list of data sources. Then you don't need to introduce two terms in the
> documentation either (names and groups). Again, if we want this at all,
> instead of just going with a data source that itself gives a list. (And if
> not, how will we handle a data source that loads from a non-file source?)

I actually thought of implementing tagging but considered a "group" sufficient.

* If you don't define anything, everything goes into the "default" group
* For individual documents you can define a name and an optional group

I think we have a different understanding of what a "Document" / "Datasource" / "DataSource" should do:

* It is dumb
* It is lazy, since data is only loaded on demand
* There is no automagic like "oh, this is a JSON file, so let's go to the JSON tool and create a map readily accessible in the data model"

> Note that the current command line syntax doesn't work well with shell
> wildcard expansion. Like this:
> --datasource :someGroup=logs/*.log
> will try to expand ":someGroup=logs/*.log", and because it finds nothing
> (and because the rules of sh and the like are a mess), you will get the
> parameter value as is, without * expanded.

The joy of programming - I did not intend to use "name:group" together with wildcards :-)

> Also, I think the syntax with colon should be flipped, because in other
> places foo:bar usually means that foo is the bigger unit (the container),
> and bar is the smaller unit (the child).

I disagree here - I think a name would be used more often. I added the "group" as an afterthought since some grouping could be useful.

> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <[email protected]> wrote:
>
>> Hi Daniel,
>>
>> I'm an enterprise developer - bad habits die hard :-)
>>
>> So I closed the following tickets and merged the branches
>>
>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>    "freemarker-generator"
>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>    for datasources
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>> On 29.02.2020, at 12:19, Daniel Dekany <[email protected]> wrote:
>>>
>>> Yeah, and of course, you can merge that branch. You can even work on the
>>> master directly after all.
>>>
>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <[email protected]>
>>> wrote:
>>>
>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>> common format/schema). Only, my idea is to push that complexity on the
>>>> data source. The "data source" concept shields the rest of the
>>>> application from the details of how the data is stored or retrieved.
>>>> So, a data source might load a bunch of log files from a directory, and
>>>> present them as a single big table, or like a list of tables, etc. So I
>>>> want to deal with the cattle use case, but the question is what part of
>>>> the architecture will deal with this complication, in other words, how
>>>> do you box things. Why my initial bet is to stuff that complication into
>>>> the "data source" implementation(s) is that data sources are inherently
>>>> varied. Some return a table-like thing, some have multiple named tables
>>>> (worksheets in Excel), some return a tree of nodes (XML), etc. So then,
>>>> some might return a list-of-list-of log records, or just a single list
>>>> of log records (put together from daily log files). That way cattle
>>>> don't add to conceptual complexity. Now, you might be aware of cases
>>>> where the cattle concept must be more exposed than this, and then we
>>>> can't box things like this. But this is what I tried to express.
>>>>
>>>> Regarding "output generators", and how that applies on the command
>>>> line. I think it's important that the common core between Maven and
>>>> command-line is as fat as possible. Ideally, they are just two syntaxes
>>>> to set up the same thing. Mostly at least. So, if you specify a template
>>>> file to the CLI application, in a way so that it causes it to process
>>>> that template to generate a single output, then there you have just
>>>> defined an "output generator" (even if it wasn't explicitly called like
>>>> that in the command line). If you specify 3 csv files to the CLI
>>>> application, in a way so that it causes it to generate 3 output files,
>>>> then you have just defined 3 "output generators" there (there's at least
>>>> one template specified there too, but that wasn't an "output generator"
>>>> itself, it was just an attribute of the 3 output generators). If you
>>>> specify 1 template, and 3 csv files, in a way so that it will yield 4
>>>> output files (1 for the template, 3 for the csv-s), then you have
>>>> defined 4 output generators there. If you have a data source that loads
>>>> a list of 3 entities (say, 3 csv files, so it's a list of tables then),
>>>> and you have 2 templates, and you tell the CLI to execute each template
>>>> for each item in said data source, then you have just defined 6 "output
>>>> generators".
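
To make the counting in the last quoted paragraph concrete, here is a minimal Python sketch. It is not part of freemarker-generator; `OutputGenerator` and `generators_per_item` are hypothetical names, and the file names are made up. It assumes an output generator can be modelled as a (template, data item) pair, so "each template for each item" is simply a cross product.

```python
from itertools import product
from typing import List, NamedTuple


class OutputGenerator(NamedTuple):
    """Hypothetical model: one template applied to one data item."""
    template: str
    data_item: str


def generators_per_item(templates: List[str], data_items: List[str]) -> List[OutputGenerator]:
    # "Execute each template for each item" is the cross product of both lists.
    return [OutputGenerator(t, d) for t, d in product(templates, data_items)]


items = ["a.csv", "b.csv", "c.csv"]

# 2 templates x 3 items -> 6 output generators
print(len(generators_per_item(["report.ftl", "summary.ftl"], items)))  # 6

# 1 template x 3 items -> 3 output generators; the template is only an attribute
print(len(generators_per_item(["report.ftl"], items)))  # 3
```

Under this model the answer "2" (one output per template, all items passed in as a list) would need a different pairing rule, which is exactly the point under discussion.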
>>>>
>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <[email protected]> wrote:
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> That all depends on your mental model and the work you do, expectations,
>>>>> experience :-)
>>>>>
>>>>> __Document Handling__
>>>>>
>>>>> *"But I think actually we have no good use case for list of documents
>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>> that complication"*
>>>>>
>>>>> In my case that's not a complication but my daily business - I'm
>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>> hundred access logs across two staging sites to help track some
>>>>> strange API gateway issues :-)
>>>>>
>>>>> My gut feeling is (borrowing from
>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
>>>>>
>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>    `cattle`
>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>
>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>> it is equally important and common.
>>>>>
>>>>> __Template And Document Processing Modes__
>>>>>
>>>>> IMHO it is important to answer the following question: "How many
>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>> Three or Six?"
>>>>>
>>>>> Your answer is influenced by your mental model / experience
>>>>>
>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>   is "2"
>>>>> * When doing source code generation the obvious answer is "6"
>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>   will encounter one
>>>>>
>>>>> __Template and document mode probably shouldn't exist__
>>>>>
>>>>> That's hard for me to fully understand - I definitely lack your
>>>>> insights & experience writing such tools :-)
>>>>>
>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>> plugin (and probably FMPP).
>>>>>
>>>>> I'm not sure if this applies to command lines, at least not in the way
>>>>> I use them (or would like to use them)
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Siegfried Goeschl
>>>>>
>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>
>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>
>>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>>>> Anyone has other ideas?
>>>>>>
>>>>>> As for naming data sources and such. One thing I was wondering about
>>>>>> back then is how to deal with a list of documents given to a template,
>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>> have no good use case for a list of documents that's passed at once to
>>>>>> a single template run, so, we can just ignore that complication. A
>>>>>> document has a name, and that's always just a single document, not a
>>>>>> collection, as far as the template is concerned. (We can have multiple
>>>>>> documents per run, but those normally yield separate output
>>>>>> generators, so it's still only one document per template.) However, we
>>>>>> can have data source types (document types with old terminology) that
>>>>>> collect together multiple data files. So then that complexity is
>>>>>> encapsulated into the data source type, and doesn't complicate the
>>>>>> overall architecture. That's another case when a data source is not
>>>>>> just a file. Like maybe there's a data source type that loads all the
>>>>>> CSV-s from a directory, into a single big table (I had such a case),
>>>>>> or even into a list of tables. Or, as I mentioned already, a data
>>>>>> source is maybe an SQL query on a JDBC data source (and we got the
>>>>>> first term clash... JDBC also calls them data sources).
>>>>>>
>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>> perspective either, at least not as a global option that must apply to
>>>>>> everything in a run. They could just give the files that define the
>>>>>> "output generators", and some of them will be templates, some of them
>>>>>> are data files, in which case a template needs to be associated with
>>>>>> them (and there can be a couple of ways of doing that). And then
>>>>>> again, there are the cases where you want to create one output
>>>>>> generator per entity from some data source.
>>>>>>
>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>
>>>>>>> *Renaming Document To DataSource*
>>>>>>>
>>>>>>> Yes, makes sense. I tried to avoid it since I'm using
>>>>>>> javax.activation and its DataSource.
>>>>>>>
>>>>>>> *Template And Document Mode*
>>>>>>>
>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>> not an implementation concept :-)
>>>>>>>
>>>>>>> *Document Without Symbolic Names*
>>>>>>>
>>>>>>> Also agreed and it is going to change but I have not made up my mind
>>>>>>> yet what exactly to implement.
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Siegfried Goeschl
>>>>>>>
>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>
>>>>>>> A few quick thoughts on that:
>>>>>>>
>>>>>>> - We should replace the "document" term with something more telling.
>>>>>>>   It doesn't tell that it's some kind of input. Also, most of these
>>>>>>>   inputs aren't something that people typically call documents. Like
>>>>>>>   a csv file, or a database table, which is not even a file (OK we
>>>>>>>   don't support such a thing at the moment). I think, maybe "data
>>>>>>>   source" is a safe enough term. (It also rhymes with data model.)
>>>>>>> - You have separate "template" and "document" "mode", that applies to
>>>>>>>   a whole run. I think such specialization won't be helpful. We could
>>>>>>>   just say, on the conceptual level at least, that we need a set of
>>>>>>>   "output generators". An output generator is an object (in the API)
>>>>>>>   that specifies a template, a data-model (where the data-model is
>>>>>>>   possibly populated with "documents"), and an output "sink" (a file
>>>>>>>   path, or stdout), and can generate the output itself. A practical
>>>>>>>   way of defining the output generators in a CLI application is via a
>>>>>>>   bunch of files, each defining an output generator. Some of those
>>>>>>>   files may be templates (which you can even detect from the file
>>>>>>>   extension), or data files that we currently call "documents". They
>>>>>>>   could freely mix inside the same run. I have also met a use case
>>>>>>>   where you have a single table (single "document"), and each record
>>>>>>>   in it yields an output file. That can also be described in some
>>>>>>>   file format, or really in any other way, like directly in a command
>>>>>>>   line argument, via API, etc.
>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>>   some examples. Templates can't identify those then in a well
>>>>>>>   maintainable way. The actual file name is often not a good
>>>>>>>   identifier, can change over time, and you might not even have good
>>>>>>>   control over it, like you already receive it as a parameter from
>>>>>>>   somewhere else, or someone moves/renames the files that you need to
>>>>>>>   read. Index is also not very good, but I have written about that
>>>>>>>   earlier.
>>>>>>>
>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> still wrapping my head around it but assembled some thoughts here -
>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Siegfried Goeschl
>>>>>>>
>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <[email protected]> wrote:
>>>>>>>
>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>> initially, where templates drive things, they generate the output for
>>>>>>> themselves (even multiple output files if they wish). By default the
>>>>>>> output file name (and relative path) is deduced from the template
>>>>>>> name. There was also a global data-model, built in a configuration
>>>>>>> file (or equally, built via command line arguments, or both mixed),
>>>>>>> from which templates get whatever data they are interested in. Take a
>>>>>>> look at the figures here: http://fmpp.sourceforge.net/qtour.html.
>>>>>>> Later, this concept was generalized a bit more, because you could add
>>>>>>> XML files at the same place where you have the templates, and then
>>>>>>> you could associate transform templates to the XML files (based on
>>>>>>> path pattern and/or the XML document element). Now that's like what
>>>>>>> freemarker-generator had initially (data files drive output, and the
>>>>>>> template is there to transform it).
>>>>>>>
>>>>>>> So I think the generic mental model would be like this:
>>>>>>>
>>>>>>> 1. You got files that drive the process, let's call them *generator
>>>>>>>    files* for now. Usually, each generator file yields an output file
>>>>>>>    (but maybe even multiple output files, as you might have seen in
>>>>>>>    the last figure). These generator files can be of many types, like
>>>>>>>    XML, JSON, XLSX (as in the original freemarker-generator), and
>>>>>>>    even templates (as is the norm in FMPP). If the file is not a
>>>>>>>    template, then you got a set of transformer templates (-t CLI
>>>>>>>    option) in a separate directory, which can be associated with the
>>>>>>>    generator files based on name patterns, and even based on content
>>>>>>>    (schema usually). If the generator file is a template (so that's a
>>>>>>>    positional @Parameter CLI argument that happens to be an *.ftl,
>>>>>>>    and is not a template file specified after the "-t" option), then
>>>>>>>    you just Template.process(...) it, and it prints what the output
>>>>>>>    will be.
>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>    contains commonly useful stuff, like what you now call parameters
>>>>>>>    (CLI -Pname=value), but also maybe data loaded from JSON, XML,
>>>>>>>    etc. Those data files aren't "generator files". Templates just use
>>>>>>>    them if they need them. An important thing here is to reuse the
>>>>>>>    same mechanism to read and parse those data files, which was used
>>>>>>>    in templates when transforming generator files. So we need a
>>>>>>>    common format for specifying how to load data files. That's maybe
>>>>>>>    just FTL that #assigns to the variables, or maybe a more
>>>>>>>    declarative format.
>>>>>>>
>>>>>>> What I have described in the original post here was a less generic
>>>>>>> form of this, as I tried to be true to the original approach. I
>>>>>>> thought the proposal would be drastic enough as it is... :) There,
>>>>>>> the "main" document is the "generator file" from point 1, the "-t"
>>>>>>> template is the transform template for the "main" document, and the
>>>>>>> other named documents ("users", "groups") are a poor man's shared
>>>>>>> data-model from point 2 (together with -PName=value).
>>>>>>>
>>>>>>> There's a further somewhat confusing thing to get right with the
>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>>>>>>> though. In the model above, as per point 1, if you list multiple data
>>>>>>> files, each will generate a separate output file. So, if you need to
>>>>>>> take in a list of files to transform it to a single output file (or
>>>>>>> at least with a single transform template execution), then you have
>>>>>>> to be explicit about that, as that's not the default behavior
>>>>>>> anymore. But it's still absolutely possible. Imagine that a "list of
>>>>>>> XLSX-es" is itself like a file format. You need some CLI (and Maven
>>>>>>> config, etc.) syntax to express that, but that shouldn't be a big
>>>>>>> deal.
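
Point 1 of the quoted model (generator files that are either templates themselves or get a transform template by name pattern) could be sketched roughly as below. This is a Python illustration under stated assumptions, not FMPP or freemarker-generator behaviour: `plan_generators`, the rule list, the first-match-wins policy and the `.ftl` suffix check are all hypothetical choices, and matching on content/schema is left out.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def plan_generators(generator_files: List[str],
                    transform_rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair every generator file with the template that renders it.

    transform_rules is a list of (glob pattern, template) pairs; the first
    matching pattern wins (an assumption made for this sketch).
    """
    plan = []
    for path in generator_files:
        if path.endswith(".ftl"):
            # A template given as a generator file drives its own output.
            plan.append((path, path))
            continue
        for pattern, template in transform_rules:
            if fnmatchcase(path, pattern):
                plan.append((path, template))
                break
        else:
            raise ValueError(f"no transform template matches {path!r}")
    return plan


rules = [("*.csv", "csv-report.ftl"), ("*.json", "json-report.ftl")]
for source, template in plan_generators(["index.ftl", "logs/access.csv", "users.json"], rules):
    print(f"{source} -> rendered with {template}")
```

Each resulting pair corresponds to one output generator in the sense used earlier in the thread; a "list of files as one input" would have to be expressed explicitly, as the quoted text says.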
>>>>>>>
>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> Good timing - I was looking at a similar problem from a different
>>>>>>> angle yesterday (see below)
>>>>>>>
>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>> that tomorrow evening
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Siegfried Goeschl
>>>>>>>
>>>>>>> === START
>>>>>>> # FreeMarker CLI Improvement
>>>>>>> ## Support Of Multiple Template Files
>>>>>>> Currently we support the following combinations
>>>>>>>
>>>>>>> * Single template and no data files
>>>>>>> * Single template and one or more data files
>>>>>>>
>>>>>>> But we cannot support the following use case which is quite typical
>>>>>>> in the cloud
>>>>>>>
>>>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>
>>>>>>> ## Implementation notes
>>>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>>>>   fly
>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>>   another
>>>>>>> * Need to calculate the output file location and extension
>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>> * Do we need file versus directory filters?
>>>>>>>
>>>>>>> ### Command Line Options
>>>>>>> ```
>>>>>>> --input-encoding : Encoding of the documents
>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>> --template-encoding : Encoding of the template
>>>>>>> --output : Output file or directory
>>>>>>> --include-document : Include pattern for documents
>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>> --include-template : Include pattern for templates
>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>> ```
>>>>>>>
>>>>>>> ### Command Line Examples
>>>>>>> ```text
>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>
>>>>>>> # Basically the same using a named document "configuration"
>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>
>>>>>>> # Basically the same using an environment variable as named document
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>> ```
>>>>>>> === END
>>>>>>>
>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <[email protected]> wrote:
>>>>>>>
>>>>>>> Input documents are a fundamental concept in freemarker-generator, so
>>>>>>> we should think about that more, and probably refine/rework how it's
>>>>>>> done.
>>>>>>>
>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     -t access-report.ftl
>>>>>>>     somewhere/foo-access-log.csv
>>>>>>>
>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>
>>>>>>>   <#assign doc = Documents.get(0)>
>>>>>>>   ... process doc here
>>>>>>>
>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>>>> funny chain of coincidences: It returned the string "D", then
>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single
>>>>>>> column "D", and 0 rows, and as there were 0 rows, the template didn't
>>>>>>> run into an error because row.myExpectedColumn refers to a missing
>>>>>>> column either, so the process finished with success. (: Pretty
>>>>>>> unlucky for sure. The root was unintentionally breaking a FreeMarker
>>>>>>> idiom though; eventually we will have to work on those too, but,
>>>>>>> different topic.)
>>>>>>>
>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     -t access-report.ftl
>>>>>>>     somewhere/foo-access-log.csv
>>>>>>>     somewhere/bar-access-log.csv
>>>>>>>
>>>>>>> The above template will still work, though then you ignore all but
>>>>>>> the first document. So if you expect any number of input documents,
>>>>>>> you probably will have to do this:
>>>>>>>
>>>>>>>   <#list Documents.list as doc>
>>>>>>>     ... process doc here
>>>>>>>   </#list>
>>>>>>>
>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>> those we will work out in a different thread.)
>>>>>>>
>>>>>>>
>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>> think are the common use cases, in decreasing order of frequency. The
>>>>>>> goal is to make those less error prone for the users, and simpler to
>>>>>>> express.
>>>>>>>
>>>>>>> USE CASE 1
>>>>>>>
>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>> case, but at least the use case users typically start out from when
>>>>>>> starting the work.
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     -t access-report.ftl
>>>>>>>     somewhere/foo-access-log.csv
>>>>>>>
>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>> error prone, because if the user passed in more than 1 document (can
>>>>>>> even happen totally accidentally, like if the user was lazy and used
>>>>>>> a wildcard that the shell expanded), the template will silently
>>>>>>> ignore the rest of the documents, and the single document processed
>>>>>>> will be practically picked randomly. The user might not notice that
>>>>>>> and submit a bad report or such.
>>>>>>>
>>>>>>> I think that in this use case the document should be simply referred
>>>>>>> to as `Document` in the template. When you have multiple documents
>>>>>>> there, referring to `Document` should be an error, saying that the
>>>>>>> template was made to process a single document only.
>>>>>>>
>>>>>>>
>>>>>>> USE CASE 2
>>>>>>>
>>>>>>> You have multiple input documents, but each has a different role
>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>> users.csv and groups.csv. Each has a different schema, and so you
>>>>>>> want to access them differently, but in the same template.
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     [...]
>>>>>>>     --named-document users somewhere/foo-users.csv
>>>>>>>     --named-document groups somewhere/foo-groups.csv
>>>>>>>
>>>>>>> Then in the template you could refer to them as:
>>>>>>> `NamedDocuments.users`, and `NamedDocuments.groups`.
>>>>>>>
>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>> "main" because that's "the" document the template is about, but then
>>>>>>> you have to add some helper documents, with symbolic names
>>>>>>> representing their role.
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     -t access-report.ftl
>>>>>>>     --document-name=main somewhere/foo-access-log.csv
>>>>>>>     --document-name=users somewhere/foo-users.csv
>>>>>>>     --document-name=groups somewhere/foo-groups.csv
>>>>>>>
>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>>>> specific to CLI.)
>>>>>>>
>>>>>>> USE CASE 3
>>>>>>>
>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>
>>>>>>>   freemarker-cli
>>>>>>>     -t access-report.ftl
>>>>>>>     --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
>>>>>>>     --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
>>>>>>>     --document-name=groups somewhere/global-groups.csv
>>>>>>>
>>>>>>> The template must be written with this use case in mind, as now it
>>>>>>> has to #list some of the documents. (I think in practice you hardly
>>>>>>> ever want to get a document by hard coded index. Either you don't
>>>>>>> know how many documents you have, so you can't use hard coded
>>>>>>> indexes, or you do, and each index has a specific meaning, but then
>>>>>>> you should name the documents instead, as using indexes is error
>>>>>>> prone, and hard to read.)
>>>>>>> Accessing that list of documents in the template maybe could be done
>>>>>>> like this:
>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>> - For explicitly named documents, like "users":
>>>>>>>   `NamedDocumentLists.users`
>>>>>>>
>>>>>>> SUMMING UP
>>>>>>>
>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>>   can achieve everything with it, using it requires your template to
>>>>>>>   handle the most generic case too. So, I think it would be rarely
>>>>>>>   used.
>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>   It's used if you only have one kind of documents (single format and
>>>>>>>   schema), but potentially multiple of them.
>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>>   document of the given name.
>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>>   for the most natural/frequent use case.
>>>>>>>
>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>> trade-off for the sake of these:
>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>>   likely will be wrong. That's only possible if the user can
>>>>>>>   communicate their intent in the template.
>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>   concrete use case. Just start with the trivial, `Document`, and
>>>>>>>   later if the need arises, generalize to named documents, document
>>>>>>>   lists, or both.
>>>>>>>
>>>>>>> What do you guys think?
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Daniel Dekany
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Daniel Dekany
>>
>>
>
> --
> Best regards,
> Daniel Dekany
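
The four accessors from the "SUMMING UP" part of the quoted original proposal can be illustrated with a small Python sketch. This is only a model of the proposed semantics, not the freemarker-generator API: `DocumentRegistry` and its method names are hypothetical, documents are represented by their paths, and the choice of `LookupError` is an assumption.

```python
class DocumentRegistry:
    """Hypothetical model of the four proposed ways to access documents."""

    MAIN = "main"

    def __init__(self):
        self._by_name = {}

    def add(self, path, name=MAIN):
        # Several documents may share a name (use case 3).
        self._by_name.setdefault(name, []).append(path)

    def named_document_list(self, name):
        # NamedDocumentLists.<name>: most generic form, zero or more documents.
        return list(self._by_name.get(name, []))

    def named_document(self, name):
        # NamedDocuments.<name>: the template declares it expects exactly one.
        docs = self.named_document_list(name)
        if len(docs) != 1:
            raise LookupError(
                f"expected exactly 1 document named {name!r}, found {len(docs)}")
        return docs[0]

    @property
    def document_list(self):
        # DocumentList: shorthand for NamedDocumentLists.main
        return self.named_document_list(self.MAIN)

    @property
    def document(self):
        # Document: shorthand for NamedDocuments.main
        return self.named_document(self.MAIN)


registry = DocumentRegistry()
registry.add("somewhere/foo-access-log.csv")
registry.add("somewhere/foo-users.csv", name="users")
print(registry.document)  # somewhere/foo-access-log.csv

registry.add("somewhere/bar-access-log.csv")
print(registry.document_list)
# registry.document would now raise LookupError instead of silently picking one.
```

The point of the sketch is the failure mode: a template written for a single document fails loudly when a wildcard accidentally delivers two.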

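
For reference, the `name:group=location` syntax discussed at the top of this message can be split as sketched below. This Python snippet is illustrative only and is not the parser used by freemarker-generator-cli; `DataSourceSpec` and `parse_datasource_spec` are hypothetical names. It assumes that missing parts come back as empty strings (matching the `""` name/group observation above) and that the first `=` separates the prefix from the location, so an unnamed location that itself contains `=` would be misread.

```python
from typing import NamedTuple


class DataSourceSpec(NamedTuple):
    name: str      # "" when no name was given
    group: str     # "" when no group was given
    location: str  # file path or URI, passed through untouched


def parse_datasource_spec(spec: str) -> DataSourceSpec:
    """Split 'name:group=location' into its three parts (hypothetical helper)."""
    prefix, sep, location = spec.partition("=")
    if not sep:
        # No '=' at all: the whole argument is the location.
        return DataSourceSpec("", "", spec)
    # Only the prefix is searched for ':', so URIs in the location stay intact.
    name, _, group = prefix.partition(":")
    return DataSourceSpec(name, group, location)


for arg in ("someName:someGroup=somewhere/something",
            "someName=somewhere/something",
            ":someGroup=logs/*.log",
            "configuration=file:///config.json"):
    print(parse_datasource_spec(arg))
```

Note that the third example keeps `logs/*.log` verbatim: as pointed out above, the shell does not expand the wildcard inside such an argument, so any expansion would have to happen inside the tool.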