mbeckerle commented on code in PR #90: URL: https://github.com/apache/daffodil-site/pull/90#discussion_r889129286
########## site/packagingSchemas.adoc ##########
@@ -0,0 +1,84 @@
+:page-layout: page
+:url-asciidoctor: http://asciidoctor.org
+:keywords: schema package jar
+// ///////////////////////////////////////////////////////////////////////////
+//
+// This file is written in AsciiDoc.
+//
+// If you can read this comment, your browser is not rendering asciidoc automatically.
+//
+// You need to install the asciidoc plugin to Chrome or Firefox
+// so that this page will be properly rendered for your viewing pleasure.
+//
+// You can get the plugins by searching the web for 'asciidoc plugin'
+//
+// You will want to change plugin settings to enable diagrams (they're off by default.)
+//
+// You need to view this page with Chrome or Firefox.
+//
+// ///////////////////////////////////////////////////////////////////////////
+//
+// When editing, please start each sentence on a new line.
+// See https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line[one sentence-per-line writing technique.]
+// This makes textual diffs of this file useful in a similar way to the way they work for code.
+//
+// //////////////////////////////////////////////////////////////////////////
+
+= Packaging DFDL Schemas for use in Daffodil Applications
+
+=== Advance Summary
+
+- The best way to use DFDL schemas is to access them from Jar files.
+- Also include pre-compiled binary DFDL schema files in the same Jar file.
+- Also include in the same Jar file any Daffodil plugins (class files for compiled Scala/Java code) required by the DFDL schema (with the appropriate META-INF files), optionally along with the source code for the plugins.
+- Create _glue_ DFDL schemas that combine other DFDL schemas using managed dependencies (e.g., maven/sbt) on the Jar files of the dependency DFDL schemas.
+- Managed dependencies can be used to obtain specific versions of DFDL schemas for applications, in the same way that applications obtain and depend upon Java libraries.
+- Digital signatures (signed jars) can enhance security by providing trust in the creator of the packaged DFDL schema jar.
+- Standard sbt tools facilitate all of this.
+
+
+=== Introduction to DFDL Schema Packaging
+
+DFDL schemas can be large collections of files.
+There are DFDL schemas with over 100 files spread over numerous directories.
+
+The organization of the files into these directory structures is not arbitrary.
+It can be needed to avoid file name clashes, and it serves the same role as the Java package-name directory structure does for Java programs.
+The directory hierarchy defines a Java package-like namespace structure for DFDL schemas.

Review Comment:

Well, this is tricky. The reason to flatten the hierarchy was primarily for training: people working from command-line tools were struggling with the file-tree depth for simple examples. Too much typing of `cd src/test/org/foo/bar/baz` to get to source, then the opposite path to get back to `src/main/...`, etc. That is really why we did the flattening. For larger schemas I think the hierarchy is still needed.

But... Daffodil has its classpath-oriented resolver, so that schemas can reference includes/imports from places found via classpath search. However, if a DFDL schema is used as an XSD to validate XML separately from the parse, and this is done by some other XML tool, then that tool may need to use a similar classpath-oriented resolver to find the various pieces of the schema, based on the schemaLocation attributes in the DFDL schema import/include statements. That is, if they want to pull the schema files directly from jars.

For Java-based XML tools we could perhaps specify that they use exactly the Daffodil resolver. (There's actually a centralized Apache resolver project (part of XML Commons?) intended to address exactly this weak spot in the W3C XML specs about how schema locations work.
Possibly what we're doing in the Daffodil resolver with classpath search should become part of that effort, or at least the relationship of the two should be explored.)

For technology bases other than Java, XML Catalogs are also a possibility, but we don't really test that, and there's even a JIRA ticket suggesting we deprecate catalog support entirely. Hence, we need the "un-jar" technique to also work: un-zip all the jars on top of the same directory tree, knowing that the directory structure ensures no name collisions. I believe that if the XML processing always resolves schemaLocations relative to the root of that tree, then all schema files would be resolvable. This is what we hope will be the lowest common denominator for schemaLocation resolvers. I think we need to document this un-jar technique.

But as you know, it gets worse, because in some applications people have to break up the set of files further. E.g., in cybersecurity, people want each data flow to have exactly and only the DFDL schema files it needs as part of that flow's configuration, and commonly a particular flow allows only a subset of all the things in the format. For now I believe this reorganization of a schema also has to be done by hand. We need to document how, and then if automated tools that chase the includes/imports can be created to help automate this, that would be an improvement. I don't like the notion of the DFDL schema having to cater too much to the needs of the application; it makes reuse of the schema harder for other applications such as data integration.
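The classpath-oriented resolution the comment refers to can be sketched in miniature. This is an illustrative mock-up only, not Daffodil's actual resolver (which searches the JVM classpath): the `resolve_schema_location` helper and the idea of a list of search roots are assumptions standing in for the real classpath search.

```python
from pathlib import Path


def resolve_schema_location(schema_location, search_roots):
    """Classpath-style lookup: try the schemaLocation against each search
    root in order and return the first file found, the way a classpath
    search finds the first matching resource.

    Hypothetical helper for illustration; Daffodil's real resolver works
    against the JVM classpath, not a Python list of directories.
    """
    for root in search_roots:
        candidate = Path(root) / schema_location.lstrip("/")
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"schemaLocation not found on search path: {schema_location}"
    )
```

The point of the sketch is only the lookup order: the first root that contains the requested path wins, so two jars (or extracted jar trees) on the search path behave like two classpath entries.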
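The "un-jar" technique described above (un-zip all the jars onto the same directory tree, relying on the directory hierarchy to prevent collisions) can be sketched as follows. This is a minimal illustration under assumptions, not a Daffodil tool: the `unjar_onto_tree` helper and the explicit collision check are inventions for this sketch. Since a jar is just a zip file, Python's standard `zipfile` module suffices.

```python
import zipfile
from pathlib import Path


def unjar_onto_tree(jars, dest):
    """Un-zip each schema jar onto the same destination directory tree.

    The package-like directory hierarchy inside the jars is what should
    guarantee that files from different jars do not collide; this sketch
    checks for collisions anyway and fails loudly on one.
    """
    dest = Path(dest)
    seen = {}  # entry name -> jar it came from
    for jar in jars:
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                if name in seen:
                    raise ValueError(
                        f"name collision: {name} in {seen[name]} and {jar}"
                    )
                seen[name] = jar
                zf.extract(name, dest)
    return dest
```

An XML tool would then be pointed at the root of `dest` and resolve every schemaLocation relative to that root, which is the lowest-common-denominator behavior the comment hopes for.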
