Hello all,

I wanted to share something I’m working on and ask for feedback. I started converting the LTSV format plugin to EVF and was able to do that pretty quickly. LTSV is a relatively simple format in that it has one data type and no complex fields.
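For anyone unfamiliar with LTSV: each line is one record, fields are separated by tabs, and each field is a `label:value` pair. The minimal Java sketch below shows why the format is so easy to handle. It is illustrative only, not the Drill plugin's code, and the class name `LtsvLine` is hypothetical. Every value comes out as a plain string and nothing nests. Skipping malformed fields is also an assumption; a real reader might raise an error instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only; not taken from Drill's LTSV plugin. */
final class LtsvLine {
    private LtsvLine() {}

    /**
     * Parses one LTSV record: tab-separated fields, each of the form "label:value".
     * Values may themselves contain ':' because only the first colon separates
     * label from value. Every value is returned as a String (one data type, no nesting).
     */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps column order stable
        for (String field : line.split("\t")) {
            int colon = field.indexOf(':');
            if (colon <= 0) {
                // Assumption: empty or malformed fields are silently skipped.
                continue;
            }
            fields.put(field.substring(0, colon), field.substring(colon + 1));
        }
        return fields;
    }
}
```

For example, `LtsvLine.parse("host:127.0.0.1\tmethod:GET")` yields the two string columns `host` and `method`.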
Instead of just doing the conversion, I wanted to see if we could add more abstraction to the format plugin architecture, so that people could build format plugins without having to learn the various Drill internals. I’m still working on the code and will share it once it is more presentable.

Basically, I realized that at a high level every format plugin is the same. It has to:

1. Open the input source.
2. Read the data in.
3. Parse the data into rows.
4. Parse the rows into fields.
5. Map the fields into Drill structures.
6. Stop when it runs out of data.

Steps 1 and 2 are virtually identical for every format plugin, so they were the low-hanging fruit 🍎. Steps 3-5 sounded like an iterator to me, and step 6 was again something that could be hidden. So I wrote an abstract class called EasyEVFReader that abstracts away virtually all of the file operations. It also includes utility functions for schema definition (more on that later) and column mapping. All the developer has to do is:

1. Create an iterator class that reads the data and maps it to rows.
2. Extend the EasyEVFReader class and assign the iterator to a variable.

I’ll share the code tonight or tomorrow, but I wanted to ask what people think of the general approach. My goal was to get rid of the cut-and-paste code that exists in so many plugins and to greatly simplify the process.

Thanks!
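Here is a rough sketch of the split described above, in plain Java with no Drill dependencies. Everything in it is hypothetical: `EasyEvfReaderSketch`, `RowSink` and `LinesReader` are illustrative names, not the actual EasyEVFReader API (which hasn't been shared yet), and `RowSink` merely stands in for Drill's EVF row writer. The base class owns steps 1, 2 and 6. The subclass only supplies an iterator that covers steps 3-5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for Drill's row writer: one row = column name -> value. */
interface RowSink {
    void writeRow(Map<String, Object> row);
}

/**
 * Hypothetical base class in the spirit of EasyEVFReader. It handles opening and
 * reading the source and stopping at end of data (steps 1, 2 and 6).
 */
abstract class EasyEvfReaderSketch {

    /** The only thing a plugin author writes: rows parsed and mapped (steps 3-5). */
    protected abstract Iterator<Map<String, Object>> rowIterator(BufferedReader input);

    /** Opens the source, drains the iterator into the sink, closes; returns the row count. */
    public final int readAll(Reader source, RowSink sink) {
        int rows = 0;
        try (BufferedReader input = new BufferedReader(source)) {
            Iterator<Map<String, Object>> it = rowIterator(input);
            while (it.hasNext()) {          // step 6: stop when the data runs out
                sink.writeRow(it.next());
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}

/** Hypothetical example plugin: each non-empty line becomes a row with one "line" column. */
final class LinesReader extends EasyEvfReaderSketch {
    @Override
    protected Iterator<Map<String, Object>> rowIterator(BufferedReader input) {
        return input.lines()                       // lazy; I/O errors surface as UncheckedIOException
                .filter(line -> !line.isEmpty())   // skip blank lines
                .map(line -> {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("line", line);         // map the single field to a column
                    return row;
                })
                .iterator();
    }
}
```

A real EVF reader would also have to declare its schema and stop when the current batch is full, then resume on the next call. The sketch leaves both out to keep the division of labor visible.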
