Yeah, and of course, you can merge that branch. You can even work on the master directly after all.
On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany wrote:

> But, I do recognize the cattle use case (several "faceless" files with
> a common format/schema). Only, my idea is to push that complexity onto
> the data source. The "data source" concept shields the rest of the
> application from the details of how the data is stored or retrieved.
> So, a data source might load a bunch of log files from a directory and
> present them as a single big table, or as a list of tables, etc. So I
> want to deal with the cattle use case, but the question is which part
> of the architecture will deal with this complication; in other words,
> how do you box things. The reason my initial bet is to stuff that
> complication into the "data source" implementation(s) is that data
> sources are inherently varied. Some return a table-like thing, some
> have multiple named tables (worksheets in Excel), some return a tree
> of nodes (XML), etc. So then, some might return a list of lists of log
> records, or just a single list of log records (put together from daily
> log files). That way cattle don't add to the conceptual complexity.
> Now, you might be aware of cases where the cattle concept must be more
> exposed than this, and then we can't box things like this. But this is
> what I tried to express.
>
> Regarding "output generators", and how that applies to the command
> line: I think it's important that the common core between Maven and
> the command line is as fat as possible. Ideally, they are just two
> syntaxes for setting up the same thing. Mostly, at least. So, if you
> specify a template file to the CLI application in a way that causes it
> to process that template to generate a single output, then you have
> just defined an "output generator" (even if it wasn't explicitly
> called that on the command line). If you specify 3 csv files to the
> CLI application in a way that causes it to generate 3 output files,
> then you have just defined 3 "output generators" there (there's at
> least one template specified there too, but that wasn't an "output
> generator" itself, it was just an attribute of the 3 output
> generators). If you specify 1 template and 3 csv files in a way that
> yields 4 output files (1 for the template, 3 for the csv-s), then you
> have defined 4 output generators there. If you have a data source that
> loads a list of 3 entities (say, 3 csv files, so it's a list of tables
> then), and you have 2 templates, and you tell the CLI to execute each
> template for each item in said data source, then you have just defined
> 6 "output generators".
>
> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl wrote:
>
>> Hi Daniel,
>>
>> That all depends on your mental model and the work you do,
>> expectations, experience :-)
>>
>>
>> __Document Handling__
>>
>> *"But I think actually we have no good use case for list of documents
>> that's passed at once to a single template run, so, we can just ignore
>> that complication"*
>>
>> In my case that's not a complication but my daily business - I'm
>> regularly wading through access logs - yesterday probably a couple of
>> hundred access logs across two staging sites to help track down some
>> strange API gateway issues :-)
>>
>> My gut feeling is (borrowing from
>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
>>
>> 1. You have a few lovely named documents / templates - `pets`
>> 2. You have tons of anonymous documents / templates to process -
>>    `cattle`
>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>
>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>> it is equally important and common.
>>
>>
>> __Template And Document Processing Modes__
>>
>> IMHO it is important to answer the following question: "How many
>> outputs do you get when rendering 2 templates and 3 data sources? Two,
>> three or six?"
>>
>> Your answer is influenced by your mental model / experience
>>
>> * When wading through tons of CSV files, access logs, etc. the answer
>>   is "2"
>> * When doing source code generation the obvious answer is "6"
>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>   will encounter one
>>
>> __Template and document mode probably shouldn't exist__
>>
>> That's hard for me to fully understand - I definitely lack your
>> insights & experience writing such tools :-)
>>
>> Defining the `Output Generator` is the underlying model for the Maven
>> plugin (and probably FMPP).
>>
>> I'm not sure if this applies to command lines, at least not in the way
>> I use them (or would like to use them)
>>
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>
>>
>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>
>> > Yeah, "data source" is surely a too popular name, but for a reason.
>> > Does anyone have other ideas?
>> >
>> > As for naming data sources and such: one thing I was wondering about
>> > back then is how to deal with a list of documents given to a
>> > template, versus exactly 1 document given to a template. But I think
>> > actually we have no good use case for list of documents that's
>> > passed at once to a single template run, so, we can just ignore that
>> > complication. A document has a name, and that's always just a single
>> > document, not a collection, as far as the template is concerned. (We
>> > can have multiple documents per run, but those normally yield
>> > separate output generators, so it's still only one document per
>> > template.) However, we can have data source types (document types in
>> > the old terminology) that collect together multiple data files. So
>> > then that complexity is encapsulated in the data source type, and
>> > doesn't complicate the overall architecture. That's another case
>> > where a data source is not just a file. Like maybe there's a data
>> > source type that loads all the CSV-s from a directory into a single
>> > big table (I had such a case), or even into a list of tables. Or, as
>> > I mentioned already, a data source is maybe an SQL query on a JDBC
>> > data source (and we got the first term clash... JDBC also calls them
>> > data sources).
>> >
>> > Template and document mode probably shouldn't exist from user
>> > perspective either, at least not as a global option that must apply
>> > to everything in a run. Users could just give the files that define
>> > the "output generators", and some of them will be templates, some of
>> > them data files, in which case a template needs to be associated
>> > with them (and there can be a couple of ways of doing that). And
>> > then again, there are the cases where you want to create one output
>> > generator per entity from some data source.
>> >
>> > On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl wrote:
>> >
>> >> Hi Daniel,
>> >>
>> >> See my comments below - and thanks for your patience and input :-)
>> >>
>> >> *Renaming Document To DataSource*
>> >>
>> >> Yes, makes sense. I tried to avoid it since I'm using
>> >> javax.activation and its DataSource.
>> >>
>> >> *Template And Document Mode*
>> >>
>> >> Agreed - I think it is a valuable abstraction for the user but it
>> >> is not an implementation concept :-)
>> >>
>> >> *Document Without Symbolic Names*
>> >>
>> >> Also agreed and it is going to change but I have not made up my
>> >> mind yet about what exactly to implement.
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>> >>
>> >> A few quick thoughts on that:
>> >>
>> >> - We should replace the "document" term with something more
>> >>   descriptive. It doesn't tell that it's some kind of input. Also,
>> >>   most of these inputs aren't something that people typically call
>> >>   documents. Like a csv file, or a database table, which is not
>> >>   even a file (OK, we don't support such a thing at the moment). I
>> >>   think maybe "data source" is a safe enough term. (It also rhymes
>> >>   with data model.)
>> >> - You have separate "template" and "document" "modes" that apply
>> >>   to a whole run. I think such specialization won't be helpful. We
>> >>   could just say, on the conceptual level at least, that we need a
>> >>   set of "output generators". An output generator is an object (in
>> >>   the API) that specifies a template, a data-model (where the
>> >>   data-model is possibly populated with "documents"), and an output
>> >>   "sink" (a file path, or stdout), and can generate the output
>> >>   itself. A practical way of defining the output generators in a
>> >>   CLI application is via a bunch of files, each defining an output
>> >>   generator. Some of those files are maybe templates (which you can
>> >>   even detect from the file extension), or data files that we
>> >>   currently call "documents". They could freely mix inside the same
>> >>   run. I have also met a use case where you have a single table
>> >>   (single "document"), and each record in it yields an output file.
>> >>   That can also be described in some file format, or really in any
>> >>   other way, like directly in a command line argument, via API,
>> >>   etc.
>> >> - You have multiple documents without an associated symbolic name
>> >>   in some examples. Templates can't identify those in a
>> >>   maintainable way.
>> >>   The actual file name is often not a good identifier: it can
>> >>   change over time, and you might not even have good control over
>> >>   it, e.g. you receive it as a parameter from somewhere else, or
>> >>   someone moves/renames the files that you need to read. An index
>> >>   is also not very good, but I have written about that earlier.
>> >>
>> >>
>> >> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> still wrapping my head around it but assembled some thoughts here -
>> >> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >>
>> >> On 23 Feb 2020, at 23:14, Daniel Dekany wrote:
>> >>
>> >> What you are describing is more like the angle that FMPP took
>> >> initially, where templates drive things and generate the output
>> >> themselves (even multiple output files if they wish). By default
>> >> the output file name (and relative path) is deduced from the
>> >> template name. There was also a global data-model, built in a
>> >> configuration file (or equally, built via command line arguments,
>> >> or both mixed), from which templates get whatever data they are
>> >> interested in. Take a look at the figures here:
>> >> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>> >> generalized a bit more, because you could add XML files at the same
>> >> place where you have the templates, and then you could associate
>> >> transform templates with the XML files (based on path pattern
>> >> and/or the XML document element). Now that's like what
>> >> freemarker-generator had initially (data files drive output, and
>> >> the template is there to transform it).
>> >>
>> >> So I think the generic mental model would be like this:
>> >>
>> >> 1. You have files that drive the process, let's call them
>> >>    *generator files* for now. Usually, each generator file yields
>> >>    an output file (but maybe even multiple output files, as you
>> >>    might have seen in the last figure). These generator files can
>> >>    be of many types, like XML, JSON, XLSX (as in the original
>> >>    freemarker-generator), and even templates (as is the norm in
>> >>    FMPP). If the file is not a template, then you have a set of
>> >>    transformer templates (-t CLI option) in a separate directory,
>> >>    which can be associated with the generator files based on name
>> >>    patterns, and even based on content (schema usually). If the
>> >>    generator file is a template (so that's a positional @Parameter
>> >>    CLI argument that happens to be an *.ftl, and is not a template
>> >>    file specified after the "-t" option), then you just
>> >>    Template.process(...) it, and it prints what the output will be.
>> >> 2. You also have a set of variables, the global data-model, that
>> >>    contains commonly useful stuff, like what you now call
>> >>    parameters (CLI -Pname=value), but also maybe data loaded from
>> >>    JSON, XML, etc. Those data files aren't "generator files".
>> >>    Templates just use them if they need them. An important thing
>> >>    here is to reuse the same mechanism to read and parse those data
>> >>    files that was used in templates when transforming generator
>> >>    files. So we need a common format for specifying how to load
>> >>    data files. That's maybe just FTL that #assigns to the
>> >>    variables, or maybe a more declarative format.
>> >>
>> >> What I described in the original post here was a less generic form
>> >> of this, as I tried to be true to the original approach. I thought
>> >> the proposal would be drastic enough as it is... :) There, the
>> >> "main" document is the "generator file" from point 1, the "-t"
>> >> template is the transform template for the "main" document, and the
>> >> other named documents ("users", "groups") are a poor man's shared
>> >> data-model from point 2 (together with -PName=value).
>> >>
>> >> There's a further, somewhat confusing thing to get right with the
>> >> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>> >> though. In the model above, as per point 1, if you list multiple
>> >> data files, each will generate a separate output file. So, if you
>> >> need to take in a list of files and transform it to a single output
>> >> file (or at least with a single transform template execution), then
>> >> you have to be explicit about that, as that's not the default
>> >> behavior anymore. But it's still absolutely possible. Imagine that
>> >> a "list of XLSX-es" is itself like a file format. You need some CLI
>> >> (and Maven config, etc.) syntax to express that, but that shouldn't
>> >> be a big deal.
>> >>
>> >>
>> >> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl wrote:
>> >>
>> >> Hi Daniel,
>> >>
>> >> Good timing - I was looking at a similar problem from a different
>> >> angle yesterday (see below)
>> >>
>> >> Don't have enough time to answer your email in detail now - will do
>> >> that tomorrow evening
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >>
>> >> === START
>> >> # FreeMarker CLI Improvement
>> >> ## Support Of Multiple Template Files
>> >> Currently we support the following combinations
>> >>
>> >> * Single template and no data files
>> >> * Single template and one or more data files
>> >>
>> >> But we can not support the following use case which is quite
>> >> typical in the cloud
>> >>
>> >> __Convert multiple templates with a single data file, e.g. copying
>> >> a directory of configuration files using a JSON configuration
>> >> file__
>> >>
>> >> ## Implementation notes
>> >> * When we copy a directory we can remove the `ftl` extension on the
>> >>   fly
>> >> * We might need an `exclude` filter for the copy operation
>> >> * Initially resolve to a list of template files and process one
>> >>   after another
>> >> * Need to calculate the output file location and extension
>> >> * We need to rename the existing command line parameters (see
>> >>   below)
>> >> * Do we need multiple include and exclude filters?
>> >> * Do we need file versus directory filters?
>> >>
>> >> ### Command Line Options
>> >> ```
>> >> --input-encoding    : Encoding of the documents
>> >> --output-encoding   : Encoding of the rendered template
>> >> --template-encoding : Encoding of the template
>> >> --output            : Output file or directory
>> >> --include-document  : Include pattern for documents
>> >> --exclude-document  : Exclude pattern for documents
>> >> --include-template  : Include pattern for templates
>> >> --exclude-template  : Exclude pattern for templates
>> >> ```
>> >>
>> >> ### Command Line Examples
>> >> ```text
>> >> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
>> >> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>> >>
>> >> # Basically the same using a named document "configuration"
>> >> # It might make sense to expose "conf" directly in the FreeMarker data model
>> >> # It might make sense to allow URIs for loading documents
>> >> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>> >>
>> >> # Basically the same using an environment variable as named document
>> >> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>> >> ```
>> >> === END
>> >>
>> >> On 23.02.2020, at 16:37, Daniel Dekany wrote:
>> >>
>> >> Input documents are a fundamental concept in freemarker-generator,
>> >> so we should think about that more, and probably refine/rework how
>> >> it's done.
>> >>
>> >> Currently it works like this, with the CLI at least.
>> >>
>> >>     freemarker-cli
>> >>       -t access-report.ftl
>> >>       somewhere/foo-access-log.csv
>> >>
>> >> Then in access-report.ftl you have to do something like this:
>> >>
>> >>     <#assign doc = Documents.get(0)>
>> >>     ... process doc here
>> >>
>> >> (The more idiomatic Documents[0] won't work. Actually, that led to
>> >> a funny chain of coincidences: it returned the string "D", then
>> >> CSVTool.parse(...) happily parsed that to a table with the single
>> >> column "D" and 0 rows, and as there were 0 rows, the template
>> >> didn't run into an error about row.myExpectedColumn referring to a
>> >> missing column either, so the process finished with success. (:
>> >> Pretty unlucky for sure. The root cause was unintentionally
>> >> breaking a FreeMarker idiom though; eventually we will have to work
>> >> on those too, but, different topic.)
>> >>
>> >> However, multiple input documents can actually be passed in:
>> >>
>> >>     freemarker-cli
>> >>       -t access-report.ftl
>> >>       somewhere/foo-access-log.csv
>> >>       somewhere/bar-access-log.csv
>> >>
>> >> The above template will still work, though then you have ignored
>> >> all but the first document. So if you expect any number of input
>> >> documents, you will probably have to do this:
>> >>
>> >>     <#list Documents.list as doc>
>> >>       ... process doc here
>> >>     </#list>
>> >>
>> >> (The more idiomatic <#list Documents as doc> won't work; but again,
>> >> those we will work out in a different thread.)
>> >>
>> >>
>> >> So, what would be better, in my opinion.
>> >> I start out from what I think are the common use cases, in
>> >> decreasing order of frequency. The goal is to make those less error
>> >> prone for the users, and simpler to express.
>> >>
>> >> USE CASE 1
>> >>
>> >> You have exactly 1 input document, which is therefore simply "the"
>> >> document in the mind of the user. This is probably the typical use
>> >> case, or at least the use case users typically start out from when
>> >> starting the work.
>> >>
>> >>     freemarker-cli
>> >>       -t access-report.ftl
>> >>       somewhere/foo-access-log.csv
>> >>
>> >> Then `Documents.get(0)` is not very fitting. Most importantly it's
>> >> error prone, because if the user passed in more than 1 document
>> >> (which can even happen totally accidentally, like if the user was
>> >> lazy and used a wildcard that the shell expanded), the template
>> >> will silently ignore the rest of the documents, and the single
>> >> document processed will be practically picked at random. The user
>> >> might not notice that and submit a bad report or such.
>> >>
>> >> I think that in this use case the document should be simply
>> >> referred to as `Document` in the template. When you have multiple
>> >> documents there, referring to `Document` should be an error, saying
>> >> that the template was made to process a single document only.
>> >>
>> >>
>> >> USE CASE 2
>> >>
>> >> You have multiple input documents, but each has a different role
>> >> (different schema, maybe different file type). Like, you pass in
>> >> users.csv and groups.csv. Each has a different schema, and so you
>> >> want to access them differently, but in the same template.
>> >>
>> >>     freemarker-cli
>> >>       [...]
>> >>       --named-document users somewhere/foo-users.csv
>> >>       --named-document groups somewhere/foo-groups.csv
>> >>
>> >> Then in the template you could refer to them as
>> >> `NamedDocuments.users` and `NamedDocuments.groups`.
>> >>
>> >> Use cases 1 and 2 can be unified into a coherent concept, where
>> >> `Document` is just a shorthand for `NamedDocuments.main`. It's
>> >> called "main" because that's "the" document the template is about,
>> >> but then you have added some helper documents, with symbolic names
>> >> representing their role.
>> >>
>> >>     freemarker-cli
>> >>       -t access-report.ftl
>> >>       --document-name=main somewhere/foo-access-log.csv
>> >>       --document-name=users somewhere/foo-users.csv
>> >>       --document-name=groups somewhere/foo-groups.csv
>> >>
>> >> Here, `Document` still works in the template, and it refers to
>> >> `somewhere/foo-access-log.csv`. (While omitting
>> >> --document-name=main above would be cleaner, I couldn't figure out
>> >> how to do that with Picocli. Anyway, for now the point is the
>> >> concept, which is not specific to the CLI.)
>> >>
>> >> USE CASE 3
>> >>
>> >> Here you have several documents of the same kind. That has a more
>> >> generic sub-use-case, where you have explicitly named documents
>> >> (like "users" above), and for some you expect multiple input files.
>> >>
>> >>     freemarker-cli
>> >>       -t access-report.ftl
>> >>       --document-name=main somewhere/foo-access-log.csv
>> >>         somewhere/bar-access-log.csv
>> >>       --document-name=users somewhere/foo-users.csv
>> >>         somewhere/bar-users.csv
>> >>       --document-name=groups somewhere/global-groups.csv
>> >>
>> >> The template must be written with this use case in mind, as now it
>> >> has to #list some of the documents. (I think in practice you hardly
>> >> ever want to get a document by hard coded index. Either you don't
>> >> know how many documents you have, so you can't use hard coded
>> >> indexes, or you do, and each index has a specific meaning, but then
>> >> you should name the documents instead, as using indexes is error
>> >> prone and hard to read.)
>> >> Accessing that list of documents in the template could maybe be
>> >> done like this:
>> >> - For the "main" documents: `DocumentList`
>> >> - For explicitly named documents, like "users":
>> >>   `NamedDocumentLists.users`
>> >>
>> >> SUMMING UP
>> >>
>> >> To unify all 3 use cases into a coherent concept:
>> >> - `NamedDocumentLists.<name>` is the most generic form, and while
>> >>   you can achieve everything with it, using it requires your
>> >>   template to handle the most generic case too. So, I think it
>> >>   would be rarely used.
>> >> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>> >>   It's used if you only have one kind of document (single format
>> >>   and schema), but potentially multiple of them.
>> >> - `NamedDocuments.<name>` expresses that you expect exactly 1
>> >>   document of the given name.
>> >> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>> >>   for the most natural/frequent use case.
>> >>
>> >> That's 4 possible ways of accessing your documents, which is a
>> >> trade-off for the sake of these:
>> >> - Catching CLI (or Maven, etc.) input where the template output
>> >>   will likely be wrong. That's only possible if the user can
>> >>   communicate their intent in the template.
>> >> - Users don't need to deal with concepts that are irrelevant in
>> >>   their concrete use case. Just start with the trivial, `Document`,
>> >>   and later, if the need arises, generalize to named documents,
>> >>   document lists, or both.
>> >>
>> >> What do you guys think?
>> >>
>> >
>
> --
> Best regards,
> Daniel Dekany

--
Best regards,
Daniel Dekany
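
Note on the "SUMMING UP" list in the original proposal: the relationship between the four accessors can be sketched in a few lines. The following is only an illustration in Python, not freemarker-generator code; the class `DocumentRegistry` and its method names are hypothetical stand-ins for `Document`, `DocumentList`, `NamedDocuments.<name>` and `NamedDocumentLists.<name>`, and documents are represented by their file paths.

```python
from collections import defaultdict


class DocumentRegistry:
    """Hypothetical model of the four accessors; not freemarker-generator's API."""

    MAIN = "main"

    def __init__(self):
        # symbolic name -> list of file paths registered under that name
        self._by_name = defaultdict(list)

    def add(self, path, name=MAIN):
        self._by_name[name].append(path)

    def named_document_list(self, name):
        """Most generic form (NamedDocumentLists.<name>): zero or more documents."""
        return list(self._by_name.get(name, []))

    def document_list(self):
        """Shorthand for the "main" list (DocumentList)."""
        return self.named_document_list(self.MAIN)

    def named_document(self, name):
        """NamedDocuments.<name>: exactly one document expected, otherwise an error."""
        docs = self.named_document_list(name)
        if len(docs) != 1:
            raise ValueError(
                f"expected exactly 1 document named {name!r}, got {len(docs)}"
            )
        return docs[0]

    def document(self):
        """Shorthand for the single "main" document (Document)."""
        return self.named_document(self.MAIN)
```

With one main document registered, `document()` returns it; once a second main document is added (for example through an expanded shell wildcard), `document()` raises instead of silently picking one, which is the failure mode the proposal wants to surface.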

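
Note on the "output generator" counts in the Feb 29 message near the top of the thread: the arithmetic (1 template plus 3 csv files gives 4 generators, 2 templates applied to each of 3 items gives 6) can be made concrete with a small sketch. This is Python for illustration only; `OutputGenerator`, the three helper functions, and the output naming scheme are assumptions, not part of freemarker-generator or FMPP.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class OutputGenerator:
    """Hypothetical value object: a template, an optional data source, an output sink."""
    template: str
    data_source: Optional[str]  # None when the template itself drives the output
    output: str


def _strip_ftl(name):
    # Assumed naming rule: drop a trailing ".ftl" to get the output name.
    return name[:-4] if name.endswith(".ftl") else name


def one_per_template(templates):
    """Each template file is itself a generator (the FMPP-style case)."""
    return [OutputGenerator(t, None, _strip_ftl(t)) for t in templates]


def one_per_data_file(template, data_files):
    """Each data file is a generator; the template is just an attribute of it."""
    return [OutputGenerator(template, d, d + ".out") for d in data_files]


def one_per_template_and_item(templates, items):
    """Every template is executed for every item of a data source."""
    return [
        OutputGenerator(t, i, f"{i}-{_strip_ftl(t)}")
        for t, i in product(templates, items)
    ]
```

Under these assumptions, `one_per_template(["index.html.ftl"]) + one_per_data_file("row.ftl", ["a.csv", "b.csv", "c.csv"])` has 4 entries, and `one_per_template_and_item` with 2 templates and 3 items has 6, matching the counts in the message.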