Understood. Thanks for the interest!

- Steve

On 08/02/2017 02:57 PM, Dave Fisher wrote:
> Hi Steve,
> 
> It was not so much the lack of committers as the current lack of diversity.
> That is not a blocker for entry to Incubation.
> 
> I am willing to be one of the Mentors. Once there are at least two more, we
> can push forward.
> 
> Regards,
> Dave
> 
>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
>>
>> Discussions have died down, and I think the consensus from the responses
>> is that the issues are 1) the lack of committers and 2) the lack of a
>> champion and mentors. We hope to address #1 and grow the community as
>> part of incubation. Is anyone interested in being a champion or mentor
>> and helping us with #2?
>>
>> Thanks,
>> - Steve
>>
>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>>> This sounds like a very interesting project.
>>>
>>> I don’t have the time to mentor at the moment, but I will keep a close
>>> eye on it.
>>>
>>> Cheers,
>>> Chris Mattmann
>>>
>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mche...@illinois.edu> wrote:
>>>
>>>    Hi Dave,
>>>
>>>    The developers that were at NCSA have moved on to other organizations.
>>>    While we still leverage Daffodil and are very much interested in seeing
>>>    it move forward, development is currently done by the Tresys team.
>>>    Agreed on the synergy with Tika.
>>>
>>>    Kenton McHenry, Ph.D.
>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>>    Deputy Director of the Scientific Software & Applications Division
>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>>>
>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>>>
>>>    Hi Kenton,
>>>
>>>    Is there any reason that you and others from NCSA are not Initial
>>>    Committers? That would make this proposal stronger.
>>>
>>>    Regarding Apache Tika: it relies on other projects, including Apache POI
>>>    and Apache PDFBox. They are pragmatic about what they use. If Daffodil
>>>    works to expand, then I think there would be good synergy between the
>>>    projects. As a POI PMC member, I know that the POI community has
>>>    significantly benefited from the Tika community, some of whom are from
>>>    MITRE.
>>>
>>>    To date, Tika has not emphasized structured data, although it does
>>>    extract content from Excel and OpenOffice files.
>>>
>>>    I am intrigued.
>>>
>>>    Regards,
>>>    Dave
>>>
>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mche...@illinois.edu> wrote:
>>>
>>>    Yes, DFDL and its open source implementation Daffodil are more about
>>>    file formats and getting access to the entirety of a file's contents in
>>>    a consistent way through machine-readable specifications. The work has
>>>    implications in the area of digital preservation, allowing one to
>>>    preserve these machine-readable specifications rather than all the tools
>>>    needed to open/save a file in order to work with it. Imagine someone
>>>    developing graphics software to work with 3D models and not having to
>>>    worry about the hundreds of formats out there for 3D meshes (whether
>>>    there are tools for opening the files, whether they can get access to
>>>    those tools, whether the spec is available, how complex that spec is to
>>>    implement, etc.), and simply building their code around the contents
>>>    (e.g. vertices, faces, etc.). One could come up with similar scenarios
>>>    for other data types (documents, images, videos, audio, depth data,
>>>    numeric data). Ideally, tools built to support DFDL could someday
>>>    support any format for that type without the developer having to worry
>>>    about the details of how that data is represented within a file.
>>>
>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
>>>
>>>    I'll preface this by saying that I don't have a ton of experience with
>>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
>>>    somewhat similar goals, but reach them in different ways. For example,
>>>    Tika requires that one writes /code/ to perform data extraction, usually
>>>    relying on existing Java libraries to extract the desired metadata. The
>>>    downside to this is that code can be buggy, and libraries might not even
>>>    exist for formats of interest (especially common with legacy and
>>>    military data).
>>>
>>>    Daffodil, on the other hand, does not require one to write any code.
>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>>    annotations) that fully describes the data, which Daffodil then uses to
>>>    convert the data to XML/JSON for extraction. So adding support for a new
>>>    format means writing a new schema rather than new code, and less code
>>>    generally means fewer bugs. Also, for secure systems that require
>>>    certification, it is generally easier to certify a schema than code.
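A minimal sketch of the schema-versus-code point, assuming a toy description format: each data format is described as plain data (an ordered list of fixed-width fields), and one generic engine turns a record into XML. `PERSON_FORMAT`, `SENSOR_FORMAT`, and `parse_fixed_width` are hypothetical names invented for illustration; this is not DFDL syntax and does not use Daffodil's API, which is driven by DFDL schemas (XML Schema plus DFDL annotations).

```python
import xml.etree.ElementTree as ET

# Hypothetical, DFDL-inspired format descriptions. Each is a root element name
# plus an ordered list of (element name, width in characters). This is NOT
# DFDL syntax; it only illustrates "describe the format as data".
PERSON_FORMAT = ("person", [("id", 4), ("name", 10), ("age", 3)])
SENSOR_FORMAT = ("reading", [("station", 3), ("temp", 5)])


def parse_fixed_width(description, record):
    """Generic engine: convert one fixed-width text record into an XML element."""
    root_name, fields = description
    expected = sum(width for _, width in fields)
    if len(record) != expected:
        raise ValueError(f"record has {len(record)} chars, description expects {expected}")
    root = ET.Element(root_name)
    offset = 0
    for name, width in fields:
        # Assumption: fields are space-padded, so surrounding spaces are stripped.
        ET.SubElement(root, name).text = record[offset:offset + width].strip()
        offset += width
    return root


# Supporting a second format only required a new description, not new code.
person = parse_fixed_width(PERSON_FORMAT, "0042" + "Ada".ljust(10) + "36".rjust(3))
reading = parse_fixed_width(SENSOR_FORMAT, "A01" + "21.5".rjust(5))
print(ET.tostring(person, encoding="unicode"))
# <person><id>0042</id><name>Ada</name><age>36</age></person>
print(ET.tostring(reading, encoding="unicode"))
# <reading><station>A01</station><temp>21.5</temp></reading>
```

A real DFDL schema can express far more than fixed widths (delimiters, choices, computed lengths, binary encodings), so this only illustrates the division of labor: the engine stays fixed while each new format is added as a description.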
>>>
>>>    We certainly don't believe that Daffodil could replace Tika, but it does
>>>    have the potential to add new functionality to Tika for formats that do
>>>    not have existing libraries. One of our goals is to look into
>>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>>    from Tika devs if this is something they'd be interested in.
>>>
>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>>    DFDL schemas usually describe an entire file format down to the byte, so
>>>    one can extract more than just metadata, including text and binary
>>>    data. As a further difference, Daffodil supports serializing data
>>>    (called unparsing) from the XML/JSON representation, allowing one to
>>>    transform or filter data as well. We don't believe this feature is all
>>>    that applicable to Tika, but it may be useful to other technologies,
>>>    such as data filtering or fuzzing tools.
>>>
>>>    - Steve
>>>
>>>
>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>>    What is the relationship between Daffodil and something like Apache
>>>    Tika's extraction engine?
>>>
>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawre...@gmail.com> wrote:
>>>
>>>    Dear Apache Incubator Community,
>>>
>>>    We would like to start a discussion around a proposal to bring Daffodil
>>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
>>>    specification used to convert between fixed format data and XML/JSON.
>>>
>>>    The draft proposal can be found in the wiki at the following URL:
>>>
>>>    https://wiki.apache.org/incubator/DaffodilProposal
>>>
>>>    We do not yet have a champion or mentors, but it was recommended that we
>>>    create a proposal and send it to this list to potentially find those
>>>    that might be interested. The text for the draft proposal is found
>>>    below. We look forward to your input.
>>>
>>>    Thanks,
>>>    -Steve
>>>
>>>
>>>    = Daffodil Proposal =
>>>
>>>    == Abstract ==
>>>
>>>    Daffodil is an implementation of the Data Format Description Language
>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>>>
>>>    == Proposal ==
>>>
>>>    The Data Format Description Language (DFDL) is a specification,
>>>    developed by the Open Grid Forum, capable of describing many data
>>>    formats, including both textual and binary, scientific and numeric,
>>>    legacy and modern, commercial record-oriented, and many industry and
>>>    military standards. It defines a language that is a subset of W3C XML
>>>    Schema to describe the logical format of the data, and annotations
>>>    within the schema to describe the physical representation.
>>>
>>>    Daffodil is an open source implementation of the DFDL specification that
>>>    uses these DFDL schemas to parse fixed format data into an infoset,
>>>    which is most commonly represented as either XML or JSON. This allows
>>>    the use of well-established XML or JSON technologies and libraries to
>>>    consume, inspect, and manipulate fixed format data in existing
>>>    solutions. Daffodil is also capable of the reverse by serializing or
>>>    "unparsing" an XML or JSON infoset back to the original data format.
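A minimal sketch of the parse/unparse cycle described above, assuming a toy big-endian binary record. `RECORD`, `parse`, and `unparse` are hypothetical names for illustration only; they are not Daffodil's Scala/Java API, and the layout list merely stands in for what a DFDL schema would describe. The last step edits the infoset with ordinary XML tooling before unparsing, which is how unparsing enables transforming or filtering data.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of a tiny binary record: a root element name plus
# ordered (element name, struct format code) pairs. Illustrative only; a real
# DFDL schema would describe this layout with XML Schema and DFDL annotations.
RECORD = ("record", [("version", "B"), ("flags", "H"), ("value", "i")])


def _layout(fields):
    # ">" selects big-endian byte order with no padding between fields.
    return ">" + "".join(code for _, code in fields)


def parse(description, data):
    """Parse raw bytes into an XML infoset element."""
    root_name, fields = description
    values = struct.unpack(_layout(fields), data)
    root = ET.Element(root_name)
    for (name, _), value in zip(fields, values):
        ET.SubElement(root, name).text = str(value)
    return root


def unparse(description, infoset):
    """Serialize an XML infoset element back into raw bytes."""
    root_name, fields = description
    if infoset.tag != root_name:
        raise ValueError(f"expected <{root_name}>, got <{infoset.tag}>")
    values = [int(infoset.find(name).text) for name, _ in fields]
    return struct.pack(_layout(fields), *values)


raw = bytes([0x01, 0x00, 0x04, 0xFF, 0xFF, 0xFF, 0xFE])
infoset = parse(RECORD, raw)
print(ET.tostring(infoset, encoding="unicode"))
# <record><version>1</version><flags>4</flags><value>-2</value></record>

# Round trip: unparsing the unmodified infoset reproduces the original bytes.
assert unparse(RECORD, infoset) == raw

# Transform: change a value in the infoset, then unparse to the binary format.
infoset.find("value").text = "7"
print(unparse(RECORD, infoset).hex())
# 01000400000007
```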
>>>
>>>    == Background ==
>>>
>>>    Many different software solutions need to consume and manage data,
>>>    including data-directed routing, databases, data analysis, data
>>>    cleansing, data visualization, and more. A key aspect of such solutions is
>>>    the need to transform the data into an easily consumable format.
>>>    Usually, this means that for each unique data format, one develops a
>>>    tool that can read and extract the necessary information, often leading
>>>    to ad-hoc and data-format-specific description systems. Such systems are
>>>    often proprietary, not well tested, and incompatible, leading to vendor
>>>    lock-in, flawed software, and increased training costs. DFDL is a new
>>>    standard, with version 1.0 completed in October of 2016, that solves
>>>    these problems by defining an open standard to describe many different
>>>    data formats and how to parse and unparse between the data and XML/JSON.
>>>
>>>    Two closed source implementations of DFDL currently exist. The first was
>>>    created by IBM and is now part of their IBM® Integration Bus product.
>>>    The second, called DFDL4S ("DFDL for Space"), was created by the
>>>    European Space Agency and targets the challenges of its satellite data
>>>    processing.
>>>
>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built as
>>>    an open source implementation and proof of concept of the draft DFDL
>>>    specification and a test bed to feed new concepts into specification
>>>    development. Primary development of Defuddle was eventually taken over
>>>    by the National Center for Supercomputing Applications (NCSA). However,
>>>    due to evolution of the DFDL specification and architectural and
>>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>>    project with the new name of Daffodil, with a goal of implementing the
>>>    complete DFDL specification. Daffodil development continued at NCSA
>>>    until around 2012, at which point development slowed due to budget
>>>    limitations. Shortly thereafter, primary development was picked up by
>>>    Tresys Technology where it continues today, with contributions from
>>>    other entities such as the Navy Research Lab, the Air Force Research
>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>>    version 1.0.0 was released, including support for the DFDL features
>>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>>    expected to be released in August of 2017 and will include unparse
>>>    support with full feature parity with parsing.
>>>
>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
>>>    Security, Raytheon, and Tresys Technology have developed DFDL schemas
>>>    for many data formats from varying technology domains, including PNG,
>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
>>>    many of which are publicly available on the DFDL Schemas GitHub. There
>>>    are also a number of military-application data formats, the
>>>    specifications of which are not public, which have historically been
>>>    very difficult and expensive to process, and for which DFDL schemas have
>>>    been created or are actively in development; these include
>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, and MIL-STD-6016/NATO STANAG
>>>    5516 (aka "Link16").
>>>
>>>    == Rationale ==
>>>
>>>    Numerous software solutions exist that consume, inspect, analyze, and
>>>    transform data, many of which can be found in the Apache Software
>>>    Foundation (ASF). In order for tools like these to consume new types of
>>>    data, custom extensions are usually required, often with high
>>>    development and testing costs. Daffodil fills a clear gap in many of
>>>    these solutions, providing a simple and low cost way to transform data
>>>    to XML or JSON, which many of these tools natively support already. With
>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved a
>>>    level of functionality in both parse and unparse that, when integrated
>>>    into existing solutions, could provide a new way to quickly enable
>>>    support for new data formats.
>>>
>>>    == Initial Goals ==
>>>
>>>    * Relicense the existing code from the University of Illinois/NCSA Open
>>>    Source License to the Apache License version 2.0, working with Apache
>>>    Legal to ensure correctness, and with Daffodil contributors to get
>>>    their permission.
>>>    * Move the existing codebase, documentation, bugs, and mailing lists to
>>>    the Apache hosted infrastructure.
>>>    * Establish a formal release process and schedule, allowing for
>>>    dependable release cycles in a manner consistent with the Apache
>>>    development process.
>>>    * Build relationships with ASF projects to add Daffodil support where
>>>    appropriate.
>>>    * Grow the community to establish a diversity of background and
>>>    expertise.
>>>
>>>    == Current Status ==
>>>
>>>    === Meritocracy ===
>>>
>>>    All initial committers are familiar with the principles of meritocracy.
>>>    The Daffodil project has followed the model of meritocracy in the past,
>>>    granting commit access to multiple outside entities based on the
>>>    quality of their contributions. In order to grow the Daffodil user base and
>>>    development community, we are dedicated to continuing to operate
>>>    Daffodil as a meritocracy.
>>>
>>>    A key ingredient in a meritocracy of developers is open group code
>>>    review. The Daffodil project has operated in this mode throughout its
>>>    existence and this provides a forum to improve the code, verify code
>>>    quality, and educate new developers on the code base.
>>>
>>>    === Community ===
>>>
>>>    Daffodil has a small community of users and developers. Although primary
>>>    Daffodil development is done by Tresys Technology, a handful of
>>>    contributions have come from other entities, including the Navy Research
>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>>    addition to developers, multiple users of Daffodil have created DFDL
>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>>    Security, and Tresys Technology. The DFDL Schemas GitHub community has
>>>    been created as a place for DFDL schemas to be published. The Daffodil
>>>    project also makes use of mailing lists, HipChat, and Confluence
>>>    Questions to build a community of users and a system for support.
>>>
>>>    === Core Developers ===
>>>
>>>    The core developers of Daffodil are employed by Tresys Technology. We
>>>    will work to grow the community among a more diverse set of developers
>>>    and industries.
>>>
>>>    === Alignment ===
>>>
>>>    Daffodil was created as an open source project with a philosophy
>>>    consistent with The Apache Way. A strong belief in meritocracy,
>>>    community involvement in decisions, openness, and ensuring a high level
>>>    of quality in code, documentation, and testing are some of our shared
>>>    core beliefs.
>>>
>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>>    that exists in many ASF projects, including NiFi, Spark, Storm, Hadoop,
>>>    Tika, and others. In order for tools like these to consume new types of
>>>    data, custom extensions are usually required. Rather than create such
>>>    extensions, Daffodil provides an easy and standards-compliant way to
>>>    transform data to XML or JSON, which many of these tools already
>>>    natively support.
>>>
>>>    == Known Risks ==
>>>
>>>    === Orphaned Products ===
>>>
>>>    The current core developers are the leading contributors in the space of
>>>    DFDL and wish to see it flourish. Though there is some risk that the
>>>    initial committers all come from the same company, a goal of entering
>>>    into incubation is to grow the development community to minimize the
>>>    risk of reliance on a single company.
>>>
>>>    === Inexperience with Open Source ===
>>>
>>>    The Daffodil project began as an open source project and has continued
>>>    that model throughout development. This includes public bug tracking,
>>>    git revision control, automated builds and tests, and a public wiki for
>>>    documentation.
>>>
>>>    Additionally, the current core developers and initial committers all
>>>    work for a company that relies on, believes in, promotes, and has led or
>>>    contributed to many open source software projects, including SELinux
>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>>>    there is low risk related to inexperience with open source software and
>>>    processes.
>>>
>>>    === Homogeneous Developers ===
>>>
>>>    The proposed initial committers come from a single entity, though we are
>>>    committed to growing the Daffodil development community to include a
>>>    broad group of additional committers from a wide array of industries.
>>>
>>>    === Reliance on Salaried Developers ===
>>>
>>>    The proposed initial committers are paid by their employer to contribute
>>>    to the Daffodil project. We expect that Daffodil development will
>>>    continue with salaried developers, and are committed to growing the
>>>    community to include non-salaried developers as well.
>>>
>>>    === Relationship with other Apache Projects ===
>>>
>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>>    numerous other ASF projects that consume and manage large amounts of
>>>    data.
>>>
>>>    As a specific example, Daffodil developers have created a Daffodil
>>>    Apache NiFi Processor, currently in use in data transfer solutions,
>>>    which allows one to ingest non-native data into an Apache NiFi pipeline
>>>    as XML or JSON. This processor was well received by the Apache NiFi
>>>    developers, with positive comments about the concise API and how it
>>>    could handle non-native data. Daffodil developers have also successfully
>>>    prototyped integration with Apache Spark. We believe Daffodil could
>>>    provide a strong benefit to many other ASF projects that handle fixed
>>>    format data. We anticipate working closely with such ASF projects to
>>>    include Daffodil where applicable to increase their ability to support
>>>    new data formats with minimal effort.
>>>
>>>    Daffodil also depends on existing ASF projects, including Apache Commons
>>>    and Apache Xerces.
>>>
>>>    === An Excessive Fascination with the Apache Brand ===
>>>
>>>    Although the Apache brand may certainly help to attract more
>>>    contributors, publicity is not the reason for this proposal. We believe
>>>    Daffodil could provide a great benefit to the ASF and the numerous data
>>>    focused projects that comprise it, as described in the Rationale and
>>>    Alignment sections. We hope to build a strong and vibrant community
>>>    founded on The Apache Way and not dependent on a single company.
>>>
>>>    === Documentation ===
>>>
>>>    Daffodil documentation can be found at:
>>>
>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>>
>>>    Information about DFDL can be found at:
>>>
>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>>>
>>>    Public examples of DFDL Schemas can be found at:
>>>
>>>    * https://github.com/DFDLSchemas
>>>
>>>    == Initial Source ==
>>>
>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>>    different contributors and feedback from many users and developers. The
>>>    core codebase is written in Scala and includes both a Scala and Java
>>>    API, along with Javadocs and Scaladocs for API usage. The initial code
>>>    will come from the git repository currently hosted by NCSA at the
>>>    University of Illinois:
>>>
>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>>>
>>>    == Source and Intellectual Property Submission ==
>>>
>>>    The complete Daffodil code is licensed under the University of
>>>    Illinois/NCSA Open Source License. Much of the current codebase has been
>>>    developed by Tresys Technology, which is open to relicensing the code to
>>>    the Apache License version 2.0 and donating the source to the ASF.
>>>    Contacts at NCSA are also open to relicensing their contributions to
>>>    Apache v2. We plan to contact the other contributors and ask for
>>>    permission to relicense and donate their contributed code. Code from
>>>    contributors who decline, or whom we cannot contact, will be removed or
>>>    replaced. We will work closely with Apache Legal to ensure that all
>>>    relicensing issues are properly resolved.
>>>
>>>    == External Dependencies ==
>>>
>>>    We believe all current dependencies are compatible with the ASF
>>>    guidelines. Our dependencies use the following license styles: Apache
>>>    v2, BSD, MIT, and ICU. The list of current Daffodil dependencies and
>>>    their licenses is documented here:
>>>
>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>>>
>>>    == Cryptography ==
>>>
>>>    None
>>>
>>>    == Required Resources ==
>>>
>>>    === Mailing Lists ===
>>>
>>>    * comm...@daffodil.incubator.apache.org
>>>    * d...@daffodil.incubator.apache.org
>>>    * priv...@daffodil.incubator.apache.org
>>>    * u...@daffodil.incubator.apache.org
>>>
>>>    === Source Control ===
>>>
>>>    git://git.apache.org/incubator-daffodil.git
>>>
>>>    === Issue Tracking ===
>>>
>>>    JIRA Daffodil (DFDL)
>>>
>>>    === Initial Committers ===
>>>
>>>    * Beth Finnegan <efinnegan at tresys dot com>
>>>    * Dave Thompson <dthompson at tresys dot com>
>>>    * Josh Adams <jadams at tresys dot com>
>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>>    * Steve Lawrence <slawrence at tresys dot com>
>>>    * Taylor Wise <twise at tresys dot com>
>>>
>>>    === Affiliations ===
>>>
>>>    * Beth Finnegan (Tresys Technology)
>>>    * Dave Thompson (Tresys Technology)
>>>    * Josh Adams (Tresys Technology)
>>>    * Mike Beckerle (Tresys Technology)
>>>    * Steve Lawrence (Tresys Technology)
>>>    * Taylor Wise (Tresys Technology)
>>>
>>>    == Sponsors ==
>>>
>>>    === Champion ===
>>>
>>>    * TBD
>>>
>>>    === Nominated Mentors ===
>>>
>>>    * TBD
>>>
>>>    === Sponsoring Entity ===
>>>
>>>    We request the Apache Incubator to sponsor this project.
>>>
>>>    ---------------------------------------------------------------------
>>>    To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>>    For additional commands, e-mail: general-h...@incubator.apache.org
>>>
> 

