mbeckerle commented on code in PR #90:
URL: https://github.com/apache/daffodil-site/pull/90#discussion_r889129286


##########
site/packagingSchemas.adoc:
##########
@@ -0,0 +1,84 @@
+:page-layout: page
+:url-asciidoctor: http://asciidoctor.org
+:keywords: schema package jar
+// ///////////////////////////////////////////////////////////////////////////
+//
+// This file is written in AsciiDoc.
+//
+// If you can read this comment, your browser is not rendering asciidoc 
automatically.
+//
+// You need to install the asciidoc plugin to Chrome or Firefox
+// so that this page will be properly rendered for your viewing pleasure.
+//
+// You can get the plugins by searching the web for 'asciidoc plugin'
+//
+// You will want to change plugin settings to enable diagrams (they're off by 
default.)
+//
+// You need to view this page with Chrome or Firefox.
+//
+// ///////////////////////////////////////////////////////////////////////////
+//
+// When editing, please start each sentence on a new line.
+// See 
https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line[one
 sentence-per-line writing technique.]
+// This makes textual diffs of this file useful in a similar way to the way 
they work for code.
+//
+// //////////////////////////////////////////////////////////////////////////
+
+= Packaging DFDL Schemas for use in Daffodil Applications
+
+=== Advance Summary
+
+- The best way to use DFDL schemas is accessing them from Jar files
+- Include pre-compiled binary DFDL schema files also in the same Jar file.
+- Include any Daffodil plugins (class files for compiled scala/java code) 
required by the DFDL schema also in the same Jar file (with the appropriate 
META-INF files) and optionally with the source code for the plugins.
+- Create _glue_ DFDL schemas that combine other DFDL schemas using managed 
dependencies (e.g., maven/sbt) on the Jar files of the dependency DFDL schemas.
+- Managed dependencies can be used to obtain specific versions of DFDL schemas 
for applications in the same way that applications obtain and depend upon Java 
libraries.
+- Digital signatures (signed jars) can enhance security by providing trust in 
the creator of the packaged DFDL schema jar.
+- Standard sbt tools facilitate all of this.
+
+
+=== Introduction to DFDL Schema Packaging
+
+DFDL schemas can be large collections of files.
+There are DFDL schemas with over 100 files spread over numerous directories.
+
+The organization of the files into these directory structures is not arbitrary.
+It can be needed to avoid file name clashes and serves the same role as the 
Java package-name directory structure does for Java programs.
+The directory hierarchy defines a Java package-like namespace structure for 
DFDL schemas.

Review Comment:
   Well, this is tricky. 
   
   The reason to flatten the hierarchy was primarily for training: people 
working from command-line tools were struggling with the file tree depth for 
simple examples. Too much typing of `cd src/test/org/foo/bar/baz` to get to the 
source, then the reverse path to get back to src/main/... etc. Really, that's 
why we did the flattening. 
   
   For larger schemas I think the hierarchy is still needed. But... 
   
   Daffodil has its classpath-oriented resolver so that schemas can reference 
includes/imports from places found via classpath search.
   
   But if a DFDL schema is used as an XSD, to validate XML separately from the 
parse, and this is done by some other XML tool, then that tool may need to use 
a similar classpath-oriented resolver to find the various pieces of the schema, 
based on the schemaLocation attributes in the DFDL schema import/include 
statements. That is, if they want to pull the schema files directly from jars.
   
   For Java-based XML tools we could perhaps specify they use exactly the 
Daffodil resolver. 
   
   (There's actually a centralized Apache resolver project (part of XML 
Commons?) intended to address exactly this weak spot in the W3C XML specs about 
how schema locations work. Possibly what we're doing in the Daffodil resolver 
with classpath search should become a part of that effort, or the relationship 
between the two should be explored anyway.)
   
   But for other technology bases than Java, well XML Catalogs are also a 
possibility, but we don't really test that, and there's even a JIRA ticket 
suggesting we deprecate catalog support entirely. 
   
   Hence, we need the "un-jar" technique to also work: "un-zip all the jars on 
top of the same directory tree", knowing that the directory structure ensures 
no name collisions. I believe that if the XML processing always resolves 
schemaLocations relative to the root of that tree, then all schema files would 
be resolvable. This is what we hope to be the lowest common denominator for 
schemaLocation resolvers. 
   
   I think we need to document this un-jar technique.
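   
   As a starting point for that documentation, here is a rough sketch of the 
un-jar step in Python. It is not something Daffodil ships. The function name 
`unjar_all` is made up for illustration, and skipping `META-INF/` entries is my 
assumption, since every jar carries its own manifest and those would otherwise 
always collide. It fails loudly on any other name collision instead of 
silently overwriting.

```python
import zipfile
from pathlib import Path


def unjar_all(jar_paths, dest_dir):
    """Extract every jar onto one directory tree; fail on a file-name collision."""
    dest = Path(dest_dir).resolve()
    seen = {}  # path inside the tree -> jar that supplied it
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            for info in zf.infolist():
                name = info.filename
                # Assumption: per-jar metadata (META-INF/...) is not schema content.
                if info.is_dir() or name.startswith("META-INF/"):
                    continue
                if name in seen:
                    raise FileExistsError(
                        f"{name} is in both {seen[name]} and {jar}"
                    )
                target = (dest / name).resolve()
                # Refuse entries that would land outside the destination tree.
                if dest not in target.parents:
                    raise ValueError(f"unsafe entry {name!r} in {jar}")
                seen[name] = str(jar)
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(zf.read(info))
    return seen
```

   Calling `unjar_all(["a.jar", "b.jar"], "schemas")` (hypothetical jar names) 
would leave one tree under `schemas/` whose root is the base for resolving 
schemaLocations.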
   
   But as you know it gets worse because in some applications, people have to 
break up the set of files further. E.g., in cybersecurity people want each data 
flow to have exactly and only the DFDL schema files it needs as part of that 
flow's configurations, and commonly a particular flow allows only a subset of 
all the things in the format. 
   
   For now I believe this reorganization of a schema also has to be done by 
hand. But we need to document how, and then if automated tools that chase the 
include/imports can be created so as to help automate this, that would be an 
improvement. 
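   
   To make the "chase the include/imports" idea concrete, here is a minimal 
Python sketch of such a tool, under assumptions of my own: the schema files 
already sit in one un-jarred tree, and each schemaLocation is tried first 
relative to the root of that tree and then relative to the including file. The 
name `schema_closure` is hypothetical, and this does not reproduce the 
Daffodil resolver's actual lookup rules.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def schema_closure(root_dir, start):
    """Return the set of files (relative to root_dir) reachable from `start`."""
    root = Path(root_dir).resolve()
    todo = [(root / start).resolve()]
    found = set()
    while todo:
        path = todo.pop()
        rel = path.relative_to(root).as_posix()
        if rel in found:
            continue
        found.add(rel)
        # xs:include and xs:import are direct children of xs:schema.
        for elem in ET.parse(path).getroot():
            if elem.tag not in (f"{{{XSD_NS}}}include", f"{{{XSD_NS}}}import"):
                continue
            loc = elem.get("schemaLocation")
            if loc is None:
                continue  # an import may name only a namespace
            # Assumption: tree root first, then the including file's directory.
            candidates = [root / loc.lstrip("/"), path.parent / loc]
            target = next((c.resolve() for c in candidates if c.is_file()), None)
            if target is None:
                raise FileNotFoundError(f"{loc} (referenced from {rel})")
            todo.append(target)
    return found
```

   The returned set is the list of files a single data flow would need to 
carry, which could then be copied into that flow's configuration.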
   
   I don't like the notion of the DFDL schema having to cater to the needs of 
the application too much. It makes reuse of the schema harder for other 
applications such as data integration. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
