Re: [PR] DRILL-8474: Add Daffodil Format Plugin to Drill (drill)

via GitHub Fri, 07 Nov 2025 07:48:53 -0800


cgivre commented on PR #2989:
URL: https://github.com/apache/drill/pull/2989#issuecomment-3503304818


   @mbeckerle 
   See inline.
   
   > The issue I'm seeing is that schemas are normally pre-compiled into a 
".bin" file which is fast to load, but in addition to this file, the schema may 
have a dependency on certain Daffodil plug-in code, which is compiled Java in 
jar files. This dependency can be on multiple different jar files. All these 
dependency jar files need to be on the classpath.
   > 
   > The Daffodil plugins are of 3 kinds: UDFs, "layers" (which compute 
checksums or decompress zip files, etc.), and charset definitions. All are 
dynamically loaded into the JVM when the DFDL schema requests them. They are 
found using the
   > 
   > All these different jar files need to be on the Java classpath so that 
their metadata allows dynamic loading.
   
   In the current implementation, any file that the user registers will be 
copied into the Daffodil Schema directory.  Would it be sufficient if the user 
added that directory to the classpath?   I'm not sure if this would be a 
security issue or not. 
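
To make the question concrete, here is a minimal sketch of what "put every jar in the schema directory on a classpath" could look like if done programmatically. It is only an illustration, not what this PR implements: the class name `SchemaDirClassLoader` and its methods are hypothetical, and it uses only the standard `java.net.URLClassLoader`. Jars are sorted by file name so the load order is at least deterministic. Whether Daffodil's plugin discovery would see classes loaded this way, rather than from the JVM-wide classpath, is an open assumption.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Drill or Daffodil.
class SchemaDirClassLoader {

    // Returns a URL for every *.jar directly under dir, sorted by path
    // so that the resulting search order is deterministic.
    static List<URL> jarUrls(Path dir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            List<Path> jars = entries
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls;
    }

    // Builds a class loader that searches those jars after delegating to parent.
    static URLClassLoader loaderFor(Path dir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(dir).toArray(new URL[0]), parent);
    }
}
```

A loader like this would execute any code in any jar a user registers, which is where the security concern comes from.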
   
   > 
   > So while a simple DFDL schema might be contained in one jar file, in 
general there can be a dependency on multiple jar files which must be placed 
onto the Java classpath in a specific order. The schema may be needed in source 
form also for validation of data.
   > 
   > As a case in point, on GitHub there are DFDL schema projects named:
   > 
   > * envelope-payload
   > * tcpMessage
   > * mil-std-2045
   > * PCAP
   > * ethernetIP
   > 
   > These are separate component DFDL schemas that are assembled to form an 
assembly schema by way of schema composition. The only jar file that needs to 
be on the classpath is the one from ethernetIP, since that defines a layer 
algorithm for computing IPv4 checksums.
   > 
   > The DFDL schema that combines all these components can be pre-compiled 
into an envelope-payload.bin file.
   > 
   
   If all of this can be combined into one file, that would be the easiest 
route.  Then a user could simply run a `CREATE DAFFODIL SCHEMA` query and that 
file would be copied to the schema directory, where it can be accessed in Drill 
queries. 
   
   > So in this case I need this ".bin" file to be distributed across the 
cluster and loaded by Daffodil in each Drillbit. The ethernetIP.jar file also 
needs to be distributed across the Drill cluster and placed on the classpath of 
the Drillbit Java process.
   
   If the classpath solution won't work, what would you suggest?  
Alternatively, we could simply require that the user manually add the JAR to 
the classpath of all Drill nodes. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
