Is there a chance we can get on a Webex call at some point so someone can help me out with an initial test run?

On Fri, May 31, 2019 at 19:38 Paul Rogers <[email protected]> wrote:

> Hi Nicolas,
>
> Regarding your point that plugins should be, well, plugins -- independent of Drill code: yes, that is true. But no one has invested the time to make it so. Doing so would require a clear, stable code API and an easy way to develop such code without the need for the "build jar, copy to DRILL_HOME, restart Drill" approach that Charles mentioned.
>
> There were some recent improvements around the bootstrap file, which is great. In the meantime, and since the MapR plugin code is already part of Drill, let's see if we can get the "work within Drill" approach to work for you. Then, perhaps you can use your experience to suggest changes that could be made to achieve the "true plugin" goal. All the Drill contributors who are not part of the core Drill team would likely very much appreciate a true plugin capability.
>
> I use Eclipse; perhaps others who use IntelliJ can comment on the specifics of that IDE.
>
> Drill is divided into modules: your code in the contrib module depends on Drill code in java-exec, vector and so on. When I run tests in java-exec in Eclipse, Eclipse automatically detects and rebuilds changes in dependent modules such as common or vector. This establishes that Eclipse, at least, understands Maven dependencies.
>
> I seem to recall that I also got this to work when writing the Drill book, when I created an example plugin in the contrib module. I don't recall having to change anything to get it to work. Perhaps others who have worked on other contrib modules can offer their experience.
>
> So, one thing to check is whether the Maven dependencies are configured correctly for the MapR plugin.
>
> One issue which I thought we solved is test-time dependencies. Tim did some work to ensure that code in src/test is visible to downstream modules. Which symbols/constructs are causing you problems? Perhaps there is more to fix?
>
> For now, perhaps you can target the goal of getting the existing MapR plugin code to work properly in the IDE. This is supposed to work, so it might just be a matter of resolving a few specific glitches.
>
> Has anyone worked on the MapR DB plugin previously and can offer advice?
>
> Thanks,
> - Paul
>
> On Friday, May 31, 2019, 10:10:14 AM PDT, Nicolas A Perez <[email protected]> wrote:
>
> One of the issues I have is that I haven't found a way to debug my tests from IntelliJ. It continues to say that some constructs from other modules are missing.
>
> Also, I haven't found *simple* examples of how to write *simple* tests. Every time I look at the existing code, the tests are done in a different way.
>
> Now, on the other hand, plugins should be independent from Drill core modules. If you think about it, I can easily write a library that can be injected into Spark without touching Spark code. For instance, the DataSource API will load the required parts from my code at run time. Drill does the same, but the problem is the coupling between Drill and its extension points.
>
> On the tests side, you have another problem: you cannot easily test your new modules unless they are within Drill core code. Maybe it is time to decouple the test framework from Drill itself, too.
>
> On Fri, May 31, 2019 at 18:38 Paul Rogers <[email protected]> wrote:
>
> > Hi Nicolas,
> >
> > Charles outlined the choices quite well.
> >
> > Let's talk about your observation that you find it annoying to deal with the full Drill code. There may be some tricks here that can help you.
> >
> > As you know, I've been revising the text reader and the "EVF" (row set framework). Doing so requires a series of pull requests. To move fast, I've found the following workflow to be helpful:
> >
> > * Use a machine with an SSD. A Mac is ideal. A Linux desktop also works (mine uses Linux Mint). The SSD makes rebuilds very fast.
> >
> > * Use unit tests for all your testing. For example, I created dozens of unit tests for CSV files to exercise the text reader, and many more to exercise the EVF. All development and testing consists of adding/changing code, adding/changing tests, and stepping through the unit test and underlying code to find bugs.
> >
> > * Use JUnit categories to run selected unit tests as a group.
> >
> > In most cases, you let your IDE do the build; you don't need Maven, nor do you need to build jar files. Edit a file, run a unit test from your IDE and step through code. My edit/compile/debug cycle tends to be seconds.
> >
> > If, however, you find yourself using Maven to build Drill, then running unit tests from Maven and attaching a debugger, your edit/compile/debug cycle will be 5+ minutes, which is going to be irritating.
> >
> > If you are doing a full build so you can use SqlLine to test, then this suggests it is time to write a unit test case for that issue so you can run it from the IDE. Using the RowSet stuff makes such tests easy. See TestCsvWithHeaders [1] for some examples.
> >
> > If you run from the IDE and find things don't work, then perhaps there is a config issue. Do we have code that looks for a file in $DRILL_HOME/whatever rather than using the class path? Is a required native library not on the LD_LIBRARY_PATH for the IDE?
> >
> > Most unit tests are designed to be stateless. They read a file stored in resources, or they write a test file, read the file, and discard the file when done.
> >
> > You are using MapRDB to insert data, which, of course, is stateful. So, perhaps your test can put the DB into a known start state, insert some records, read those records, compare them with the expected results, and clean up the state so you are ready for the next test run. Your target is that edit/compile/debug cycle of a few seconds.
> >
> > Overall, if you can master the art of running Drill, using unit tests, in your IDE, you can move forward very quickly.
> >
> > Use Maven builds, and run tests via Maven, only when getting ready to submit a PR. If you change, say, only the contrib module, you only need to build and test that module. If you also change exec, say, then you can just build those two modules.
> >
> > To use categories, tag your tests as follows:
> >
> > @Category(RowSetTests.class) class MyTest ...
> >
> > (I'll send the Maven command line separately; I'm not on that machine at the moment.)
> >
> > Thanks much to the team members who helped make this happen. I've since worked on other projects that don't have this power, and it is truly a grueling experience to wait for long builds and deploys after every change.
> >
> > Thanks,
> > - Paul
> >
> > [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvWithHeaders.java
> >
> > On Friday, May 31, 2019, 5:17:40 AM PDT, Charles Givre <[email protected]> wrote:
> >
> > Hi Nicolas,
> >
> > You have two options:
> > 1. You can develop format plugins and UDFs in Drill by adding them to the contrib/ folder and then test them with unit tests. Take a look at this PR as an example[1]. If you're intending to submit your work to Drill for inclusion, this would be my recommendation, as you can write the unit tests as you go, it doesn't take very long to build, and you can debug.
> > 2. Alternatively, you can package the code separately as shown here[2]. However, this option requires you to build it, then copy the jars over to DRILL_HOME/jars/3rd_party along with any dependencies, then run Drill. I'm not sure how you could write unit tests this way.
> >
> > I hope this helps.
> >
> > [1]: https://github.com/apache/drill/pull/1749
> > [2]: https://github.com/cgivre/drill-excel-plugin
> >
> > > On May 31, 2019, at 8:06 AM, Nicolas A Perez <[email protected]> wrote:
> > >
> > > Paul,
> > >
> > > Is it possible to develop my plugin outside of the Drill code, let's say in my own repository, and then package it and add it to the location where the plugins live? Does that work, too? I just find it annoying to deal with the full Drill code in order to develop a plugin. At the same time, I might want to detach the development of plugins from the Drill life cycle itself.
> > >
> > > Please advise.
> > >
> > > Best Regards,
> > >
> > > Nicolas A Perez
> > >
> > > On Thu, May 30, 2019 at 9:58 PM Paul Rogers <[email protected]> wrote:
> > >
> > >> Hi Nicolas,
> > >>
> > >> A quick check of the code suggests that AbstractWriter is a JSON-serialized description of the physical plan. It represents the information sent from the planner to the execution engine, and is interpreted by the scan operator. That is, it is the "physical plan."
> > >>
> > >> The question is, how does the execution engine create the actual writer based on the physical plan? The only good example seems to be for the FileSystemPlugin. That particular storage plugin is complicated by the additional layer of the format plugins.
> > >>
> > >> There is a bit of magic here. Briefly, Drill uses a BatchCreator to create your writer. It does so via some Java introspection magic. Drill looks for all subclasses of BatchCreator, then uses the type of the second argument to the getBatch() method to find the correct class. This may mean that you need to create one with MapRDBFormatPluginConfig as the type of the second argument.
> > >>
> > >> The getBatch() method then creates the CloseableRecordBatch implementation. This is a full Drill operator, meaning it must handle the Volcano iterator protocol. Looks like you can perhaps use WriterRecordBatch as the writer operator itself. (See EasyWriterBatchCreator and follow the code to understand the plumbing.)
> > >>
> > >> You create a RecordWriter to do the actual work. AFAIK, MapRDB supports a JSON data model (at least in some form). If this is the version you are working on, the fastest development path might just be to copy the JsonRecordWriter, and replace the writes to JSON with writes to MapRDB. At least this gives you a place to start looking.
> > >>
> > >> A more general solution would be to build the writer using some of the recent additions to Drill, such as the row set mechanisms for reading a record batch. But, since copying the JSON approach provides a quick & dirty solution, perhaps that is good enough for this particular use case.
> > >>
> > >> In our book, we recommend building each step one-by-one and doing a quick test to verify that each step works as you expect. If you create your BatchCreator, but not the writer, things won't actually work, but you can set a breakpoint in the getBatch() method to verify that Drill did find your class. And so on.
> > >>
> > >> Thanks,
> > >> - Paul
> > >>
> > >> On Thursday, May 30, 2019, 3:05:39 AM PDT, Nicolas A Perez <[email protected]> wrote:
> > >>
> > >> Can anyone give me an overview of how to implement AbstractRecordWriter?
> > >>
> > >> What are the mechanics it follows, what should I do, and so on? It would be very helpful.
> > >>
> > >> Best Regards,
> > >>
> > >> Nicolas A Perez
> > >> --
> > >> --------------------------------------------------------------------------------------------
> > >> Sent by Nicolas A Perez from my GMAIL account.
> > >> --------------------------------------------------------------------------------------------
> > >
> > > --
> > > --------------------------------------------------------------------------------------------
> > > Sent by Nicolas A Perez from my GMAIL account.
> > > --------------------------------------------------------------------------------------------
>
> --
> Nicolas A Perez from GMAIL MOBILE

--
Nicolas A Perez from GMAIL MOBILE
