Re: [PR] DRILL-8474: Add Daffodil Format Plugin to Drill (drill)

via GitHub Fri, 13 Feb 2026 06:39:29 -0800


cgivre commented on PR #2989:
URL: https://github.com/apache/drill/pull/2989#issuecomment-3897574658


   Thanks Steve!  
   We’d really welcome feedback once this is merged.  Could I send something to 
the Daffodil list with demo and/or instructions so people could try it out?
   
   Regarding the review, unfortunately,  you’re not a committer to Drill, so we 
need Mike or someone else to approve.  But my goal is to merge this ASAP. 
   Best,
   — C
   
   
   
   
   > On Feb 13, 2026, at 09:34, Steve Lawrence ***@***.***> wrote:
   > 
   > 
   > @stevedlawrence approved this pull request.
   > 
   > Looks great to me! Just a couple tiny comments.
   > 
   > I agree with Mike it would be useful to test some more complex schemas 
that are built with multiple component and plugins jars, like the PCAP one 
mentioned. I think having users modify the classpath would work if Drill 
doesn't have a built-in way to do that. Though maybe the jar registry does 
that? It's not clear to me if that automatically puts things on the classpath 
or just some files available somehow else.
   > 
   > But even without that, it seems like the core functionality is here. And 
although many of the more interesting schemas require multiple jars or plugins, 
there are plenty of schemas that don't, so I think there is still lots of value 
even without it.
   > 
   > I would recommend merging this, and later versions of Drill can work to 
add and or test Daffodil component/plugin support if needed.
   > 
   > In 
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java
 <https://github.com/apache/drill/pull/2989#discussion_r2804134130>:
   > 
   > > +      dmp.setupDP(validationMode, null);
   > +    } else {
   > +      List<Diagnostic> pfDiags;
   > +      try {
   > +        pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
   > +      } catch (URISyntaxException | IOException e) {
   > +        throw new CompileFailure(e);
   > +      }
   > +      dmp.setupDP(validationMode, pfDiags);
   > +    }
   > +    return dmp.dp;
   > +  }
   > +
   > +  private void loadSchema(URI schemaFileURI) throws IOException, 
InvalidParserException {
   > +    Compiler c = Daffodil.compiler();
   > +    dp = 
c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));
   > Just to be clear, since I don't think Daffodil documents this very well, 
the way save/reload works in Daffodil is save uses Java serialization to 
serialize a DataProcessor, and reload just deserializes it. In all uses that 
I'm aware of, these .bin files are trusted and so are safe to 
reload/deserialize. But I know there are a number of security concerns with 
deserializing untrusted data. If the Drill security model doesn't necessarily 
trust users with files provided to CREATE DAFFODIL SCHEMA or SELECT * from ... 
type=daffodil, then it might be worth considering if reload is something you 
want to disallow. It's a potential loss since saved parsers are easier to 
distribute and can avoid long schema compilation times, but it does come with 
potential risk.
   > 
   > In 
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java
 <https://github.com/apache/drill/pull/2989#discussion_r2804303273>:
   > 
   > > +   *
   > +   * @return true if there are rows retrieved, false if no rows were 
retrieved, which means no more
   > +   *     will ever be retrieved (end of data).
   > +   * @throws RuntimeException
   > +   *     on parse errors.
   > +   */
   > +  @Override
   > +  public boolean next() {
   > +    // Check assumed invariants
   > +    // We don't know if there is data or not. This could be called on an 
empty data file.
   > +    // We DO know that this won't be called if there is no space in the 
batch for even 1
   > +    // row.
   > +    if (dafParser.isEOF()) {
   > +      return false; // return without even checking for more rows or 
trying to parse.
   > +    }
   > +    while (rowSetLoader.start() && !dafParser.isEOF()) { // we never 
zero-trip this loop.
   > One thing to consider is that it is possible for Daffodil to consume zero 
bits of data and it be considered a success. Schemas that allow that are kindof 
a pathological case and don't really exist in the real world, but might be 
something to consider since it is technically possible.
   > 
   > The way to detect this is to access ParseResult.location().bitPos1b() at 
the end of a parse. If that bit position is the same as before the parse then 
it means the parse successfully consumed zero bits, and it will continue to do 
so in an infinite loop if you keep attempting to parse. For example, this 
usually looks something like this:
   > 
   > int lastBitPosition = 1;
   > while (...)
   >   ParseResult pr = dp.parse();
   >   ... // check isError
   >   int currBitPosition = pr.location().bitPos1b();
   >   if (currBitPosition == lastBitPosition)) {
   >     // consumed no data, exit parsing to avoid infinite loop
   >     break;
   >   }
   >   lastBitPosition = currBitPosition;
   > }
   > Note that even if Daffodil consumes zero bits, it is still possible to 
create infoset events, so you really do need to check the bit position instead 
checking if infoset events were created.
   > 
   > —
   > Reply to this email directly, view it on GitHub 
<https://github.com/apache/drill/pull/2989#pullrequestreview-3797200685>, or 
unsubscribe 
<https://github.com/notifications/unsubscribe-auth/ABKB7PTGDQYOJRHOBQWSSZ34LXOILAVCNFSM6AAAAAB4XWHMGWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTOOJXGIYDANRYGU>.
   > You are receiving this because you were assigned.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] DRILL-8474: Add Daffodil Format Plugin to Drill (drill)

Reply via email to