Hi Mike, 
Let me answer this as best I can. First, to be clear on this point: the
phase 1 implementation isn’t the desired end state. It’s not really all that
workable, but it gets what you’ve already done merged. Since, as you
mentioned, DFDL needs multiple files, what if you were to put each schema’s
files in its own folder on the classpath? For example:

Classpath/schema1/
Classpath/schema2/ 

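A minimal sketch of how code could find a schema laid out this way. It assumes one folder per schema on the classpath, as above. The class `SchemaLocator`, the method `findSchema`, and the file name `main.dfdl.xsd` are hypothetical illustrations, not existing Drill or Daffodil APIs. The lookup returns a URI. Keeping all of a schema's files in the same folder means relative `schemaLocation` references between them can resolve against that URI.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper for illustration only: not part of Drill or Daffodil.
final class SchemaLocator {
    private SchemaLocator() {}

    /**
     * Resolves one file of a named schema folder (e.g. "schema1") on the classpath.
     * Throws IllegalArgumentException if the file is not on the classpath.
     */
    static URI findSchema(String schemaFolder, String fileName) {
        // ClassLoader lookups take '/'-separated names with no leading slash.
        String resource = schemaFolder + "/" + fileName;
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = SchemaLocator.class.getClassLoader();
        }
        URL url = loader.getResource(resource);
        if (url == null) {
            throw new IllegalArgumentException("Not found on classpath: " + resource);
        }
        try {
            return url.toURI();
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Cannot convert to URI: " + url, e);
        }
    }
}
```

In a Maven unit test, files placed under src/test/resources/schema1/ end up on the test classpath. A call such as `SchemaLocator.findSchema("schema1", "main.dfdl.xsd")` (the file name is again an assumption) would then find them without any environment variables being set.
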
For tests, I’d imagine all you have to do is copy the valid DFDL files into
the test/resources/ folder (which Maven puts on the test classpath) and then
run your queries. In real-life situations, a user would have to copy all the
files onto the classpath of every Drill node. This will be dealt with in
phase 2: the user will simply copy the files into a staging directory and
Drill will handle copying them to all nodes (I think).

Best,
— C


> On Oct 3, 2024, at 10:15, Mike Beckerle <mbecke...@apache.org> wrote:
> 
> I agree we can do the phase1 merge. It should not break anything.
> 
> Phase 2 ... Paul suggested "just throw everything into
> $DRILL_CONFIG_DIR", plugin jars, schema jars, everything, as
> apparently that gets automatically copied everywhere and put on the
> class path.
> 
> I left off right at that point for lack of knowledge.
> 
> How would a test work that way? I.e., a Maven test under
> src/test/java... how is it going to arrange for DRILL_CONFIG_DIR to be
> defined, and put things into that directory before Drill executes (and
> reads the env for DRILL_CONFIG_DIR's value)? I normally think of
> env-vars as frozen at the time the JVM starts, so tests can't change
> them unless they are forking a process, and in a complex system like
> Drill I have no idea what the implications of that would be.
> 
> The only logic change needed, I think, is to deal with "there is exactly
> 1 file to parse and query" vs. "there are numerous files to parse and
> query". These files could, I suppose, be distributed somehow, but they
> could also just be a bunch of files. My guess is Drill already has all
> of this, and we just have to reuse the pattern from some other
> extension.
> 
> 
> On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
>> 
>> Hi Mike,
>> I hope all is well.  I need to apologize as I grossly overestimated my 
>> available free time to assist with the DFDL / Drill integration.  I had a 
>> thought which I wanted to propose.
>> 
>> My thinking is that we should complete the integration in two phases:
>> 
>> Phase 1:
>> For phase 1, I propose that we merge the work that you’ve already done.  
>> We’d have to make sure that the DFDL files are accessible from the class 
>> path.  This isn’t really a great solution, but it is just to get the pieces 
>> in so we can work on phase 2.  I don’t like seeing good work languishing in 
>> the PR queue and getting stale.  To complete phase 1, all we’d really have 
>> to do is get the unit tests working.
>> 
>> Phase 2:
>> The remaining issue is making the DFDL files accessible to Drill in a
>> way that lets a user easily add or remove them. For this we have a
>> solution: DRILL-4726 [1], which provides dynamic UDF support.
>> Basically, what I’m proposing is that we duplicate the components of
>> this PR for Drill. The end result would be that a user could copy the
>> UDF files to a staging directory. Then the user would run a command like:
>> 
>> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
>> 
>> When the user does that, the file would be propagated to all the Drill
>> nodes. Implementing this feature would mostly involve duplicating code
>> from that pull request, with slight modifications. What do you think?
>> Best,
>> — C
>> 
>> 
>> 
>> [1]: https://github.com/apache/drill/pull/574
