I decided that the only way to force this Drill + Daffodil integration to get done, or at least started, is to have a deadline.
So I submitted the abstract below to the upcoming "Community over Code" conference (formerly known as ApacheCon) this fall (Oct 7-10). I'm hoping it also forces some of the refactoring in Daffodil that is gating other efforts and fixes.

*Direct Query of Arbitrary Data Formats using Apache Drill and Apache Daffodil*

*Suppose you have data in an ad-hoc data format like EDIFACT, ISO8583, Asterix, some COBOL FD, or any other kind of data. You can now describe it with a Data Format Description Language (DFDL) schema and then, using Apache Drill, query that data directly. Those queries can also incorporate data from any of Apache Drill's other data sources. This talk will describe the integration of Apache Drill with Apache Daffodil's DFDL implementation. This deep integration implements Drill's metadata model in terms of the Daffodil DFDL metadata model, and implements Drill's data model in terms of the Daffodil DFDL Infoset API. This enables Drill queries to operate intelligently on DFDL-described data without the cost of converting it into an expensive intermediate form like JSON or XML. The talk will highlight the specific challenges in this integration and the lessons learned that apply to integrating other Apache projects that have their own metadata and data models.*