scholarsmate commented on code in PR #876: URL: https://github.com/apache/daffodil/pull/876#discussion_r1023105429
########## DEVELOP.md: ########## @@ -0,0 +1,485 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +# Daffodil Developer Guide + +This guide is for a software engineer who wants to develop Daffodil +code. This guide will help you become familiar with DFDL schemas, +DFDL processors, and Daffodil development including its code style, +code workflow, development environment, directory organization, +documentation, and tests. + +## DFDL + +The [Data Format Description +Language](https://en.wikipedia.org/wiki/Data_Format_Description_Language) +(DFDL) is a language used to describe almost all data formats both +logically and physically. DFDL is not a data format or a procedural +language; rather, it is a data modeling language based on a subset of +XML Schema annotated with DFDL properties describing the +representation and layout of each element of the schema inside a +native text or binary data format. 
A DFDL schema allows data to be +converted between its native data format (physical representation, +also called a text or binary file) and a DFDL information set (logical +representation, also called an infoset) such as an XML document, EXI +document, JSON document, SAX callbacks, or several document object +model APIs in memory (JDOM, Scala Node, W3C DOM). When you have a +DFDL schema for a native data format, you can pick whichever infoset +type is easiest for you or your application to use and tell a DFDL +processor to read ("parse") a text or binary file from its native data +format to that infoset type. You or your application can do whatever +you need to do with the infoset and then you can use the same DFDL +schema and DFDL processor to write ("unparse") the infoset back to its +native text or binary file format again, completing a round trip from +native data to infoset to native data again. + +Using DFDL avoids inventing a completely new data modeling language, +avoids writing any parsing and serialization code (with all the bugs +that normally arise from implementing such code procedurally), and +makes it much easier to convert any native data format to an infoset, +operate on the infoset, and convert an infoset back to its native data +format again. + +To learn more about DFDL, you can watch two short +[videos](https://community.ibm.com/community/user/integration/viewdocument/get-started-with-the-data-format-de) +put together by Steve Hanson, co-chair of the DFDL working group, read +a +[slideshow](https://www.slideshare.net/mbeckerle/dfdl-and-apache-daffodil-overview-from-owl-cyber-defense) +written by Mike Beckerle, co-chair of the DFDL working group, or go +through some [tutorials](http://www.xfront.com/DFDL/) written by Roger +Costello, chair of the DFTVL working group (Data Format Transformation +and Validation Language, a future not-yet-defined language to specify +policies for cross-domain system devices). 
+ +## Apache Daffodil, IBM DFDL, and ESA DFDL4S + +The standards organization in which DFDL started, the Open Grid Forum, +required 2 implementations in order to move forward with the +standardization process. This means that there are two leading DFDL +processors, a commercial implementation called IBM DFDL bundled into +IBM's [Integration +Bus](https://www.ibm.com/docs/en/integration-bus/10.0?topic=model-data-format-description-language-dfdl) +products and an open source implementation called [Apache +Daffodil](https://daffodil.apache.org/) hosted by the Apache Software +Foundation. The European Space Agency also has created a proprietary +implementation called [ESA +DFDL4S](https://eop-cfi.esa.int/index.php/applications/dfdl4s), which +can be used only with their satellite communication formats and is +provided only in the form of binary libraries, not source code. + +Among these three DFDL processors, Apache Daffodil is considered the +most modern and thorough implementation of the [Data Format +Description Language v1.0 +Specification](https://daffodil.apache.org/docs/dfdl/). Even so, +Apache Daffodil lists some [unsupported +features](https://daffodil.apache.org/unsupported/) of the DFDL +specification. IBM DFDL lists some more [unsupported +features](https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-unsupported-features) +and also lists some [implementation-specific +limits](https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-implementation-specific-limits). +These limitations will not prevent you from writing DFDL schemas for +almost all data formats, but they will reveal which parts of the DFDL +specification are rarely used. 
+ +## Daffodil Development + +The Apache Software Foundation hosts the Apache Daffodil project on +the following ASF infrastructure: + +- Daffodil's [issue + tracker](https://issues.apache.org/jira/projects/DAFFODIL/) is + hosted on JIRA +- Daffodil's + [users](https://lists.apache.org/[email protected]), + [dev](https://lists.apache.org/[email protected]), + and + [commits](https://lists.apache.org/[email protected]) + mailing lists are hosted on Apache Pony Mail +- Daffodil's [source code](https://github.com/apache/daffodil) is + hosted on GitHub +- Daffodil's + [wiki](https://cwiki.apache.org/confluence/display/DAFFODIL/) is + hosted on Confluence +- Daffodil's [website](https://daffodil.apache.org/) is hosted on + Apache with static content generated by Jekyll using jekyll-asciidoc + and asciidoctor-diagram plugins + +A good Daffodil developer has a GitHub account and two copies of +Daffodil's source code repository, a reading copy directly cloned from +the Apache repository and a working copy cloned from the developer's +own fork of the Apache repository. A good Daffodil developer also +gets an account on the Apache Software Foundation's JIRA and +Confluence servers and subscribes to all three Daffodil mailing lists. + +You can clone your reading copy directly from the Apache repository, +but you should never make any changes in it. The only commands you +should ever run in your reading copy are `git pull`, `git log +ORIG_HEAD..HEAD`, and `git diff ORIG_HEAD..HEAD` in order to see what +changes have been made by other developers since your last pull from +the Apache repository. Your reading copy should remain an exact copy +of the Apache source code repository at all times which you can use +for reading or occasionally running `diff` commands between your +reading copy and working copy. 
To be safe, rename your reading copy +from `daffodil` to `daffodil-asf` and never edit files or run `sbt` in +it so there will be nothing to push to the Apache repository even if +you accidentally run `git push` from the wrong source tree. + +You can set up your working copy by forking the Apache repository to +your own GitHub account and then cloning your working copy from your +fork. Cloning your working copy from your fork instead of the Apache +repository ensures that any changes you commit in your working copy +will be pushed safely to only your own fork, not the Apache +repository. + +### Code Style + +Daffodil mandates standard Scala formatting and its code generally +keeps close to that formatting consistently. No one has run an +automatic Scala formatter on the codebase yet (there has not been much +need since the code is already formatted pretty well) but +[DAFFODIL-2133](https://issues.apache.org/jira/browse/DAFFODIL-2133)
Review Comment:
Once this PR is merged, we should add a reminder to the cited ticket (DAFFODIL-2133) to update this guide when the ticket has been resolved.
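For readers of this thread, the reading-copy and working-copy setup described in the quoted guide could be sketched roughly as follows. This is a minimal sketch, not part of the PR: the function names `clone_reading_copy`, `clone_working_copy`, and `review_upstream_changes` are hypothetical helpers invented here for illustration, and the URLs are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from DEVELOP.md).

# Clone the read-only "reading copy" straight from the Apache repository
# into a directory named daffodil-asf, as the guide suggests.
clone_reading_copy() {
  upstream_url=$1
  git clone "$upstream_url" daffodil-asf
}

# Clone the working copy from your own fork, so that pushes from it
# go to the fork rather than to the Apache repository.
clone_working_copy() {
  fork_url=$1
  git clone "$fork_url" daffodil
}

# In the reading copy, pull and then show what changed since the last pull.
# Note: ORIG_HEAD is only set when the pull actually moves HEAD, so the
# log/diff steps fail on a fresh clone that is already up to date.
review_upstream_changes() {
  (
    cd daffodil-asf &&
    git pull &&
    git log ORIG_HEAD..HEAD &&
    git diff ORIG_HEAD..HEAD
  )
}
```

A possible use would be `clone_reading_copy https://github.com/apache/daffodil.git` followed by `clone_working_copy` with the URL of your own fork (which depends on your GitHub account name), and later `review_upstream_changes` whenever you want to see what other developers have merged.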
