I decided the only way to get this Drill + Daffodil integration
done, or at least started, is to have a deadline.

So I submitted the abstract below to the upcoming "Community over Code"
(formerly known as ApacheCon) conference this fall (Oct 7-10).

I'm hoping this forces some of the refactoring that is gating other efforts
and fixes in Daffodil at the same time.

*Direct Query of Arbitrary Data Formats using Apache Drill and Apache
Daffodil*


*Suppose you have data in an ad-hoc data format like EDIFACT, ISO8583,
Asterix, some COBOL FD, or any other kind of data. You can now describe
it with a Data Format Description Language (DFDL) schema. Then, using
Apache Drill, you can directly query that data, and those queries can also
incorporate data from any of Apache Drill's wide array of other data sources.
This talk will describe the integration of Apache Drill with Apache Daffodil's
DFDL implementation. This deep integration implements Drill's metadata
model in terms of the Daffodil DFDL metadata model, and implements Drill's
data model in terms of the Daffodil DFDL Infoset API. This enables Drill
queries to operate intelligently on DFDL-described data without the cost of
converting the data into an expensive intermediate form like JSON or XML. The
talk will highlight the specific challenges in this integration and the
lessons learned that are applicable to integrating other Apache projects
that have their own metadata and data models.*
