[DISCUSS] Format Plugin Interface

Charles Givre Sun, 26 Jan 2020 08:42:25 -0800

Hello all,
I wanted to share something that I’m working on and ask for feedback.  I 
started working on converting the LTSV format plugin to EVF and basically was 
able to do that pretty quickly.  This is a relatively simple format in that it 
has one data type and no complex fields.


Instead of just doing the conversion, I wanted to see if we could put some more 
abstraction on the format plugin architecture that would make it easier for 
people to build format plugins without having to learn the various Drill 
internals.  I’m still working on the code and will share it once it is more 
presentable.  Basically, I realized that every format plugin is, at a high 
level, the same.  It has to:
1. Open the input source
2. Read the data in
3. Parse that data into rows
4. Parse the rows into fields
5. Map the fields into Drill structures
6. Stop when it runs out of data

Steps 1 and 2 are virtually identical for every format plugin, and hence that 
was the low-hanging fruit 🍎.  Steps 3-5 sounded like an iterator to me, and 
step 6 was again something that could be hidden from the plugin author.
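
To make the iterator idea concrete, here is a rough sketch of steps 3, 4 and 6 
for LTSV, where each line is a row of tab-separated label:value fields.  This 
is only an illustration, not the plugin code: the class name LtsvRowIterator is 
made up, it uses plain Java collections rather than Drill structures, and step 
5 (mapping into Drill vectors) is left out entirely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is not Drill code.
class LtsvRowIterator implements Iterator<Map<String, String>> {
  private final BufferedReader reader;
  private String nextLine;  // line read ahead; null once input is exhausted

  LtsvRowIterator(BufferedReader reader) {
    this.reader = reader;
    advance();
  }

  // Step 3: one non-blank line of input is one row.
  private void advance() {
    try {
      do {
        nextLine = reader.readLine();
      } while (nextLine != null && nextLine.trim().isEmpty());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Step 6: the caller simply stops when hasNext() turns false.
  @Override
  public boolean hasNext() {
    return nextLine != null;
  }

  @Override
  public Map<String, String> next() {
    if (nextLine == null) {
      throw new NoSuchElementException();
    }
    Map<String, String> row = new LinkedHashMap<>();
    // Step 4: split the row on tabs, then each field at its first colon.
    // Fields without a label are skipped in this sketch.
    for (String field : nextLine.split("\t")) {
      int colon = field.indexOf(':');
      if (colon > 0) {
        row.put(field.substring(0, colon), field.substring(colon + 1));
      }
    }
    advance();
    return row;
  }
}

class LtsvIteratorDemo {
  public static void main(String[] args) {
    String data = "host:127.0.0.1\tstatus:200\nhost:10.0.0.5\tstatus:404\n";
    Iterator<Map<String, String>> rows =
        new LtsvRowIterator(new BufferedReader(new StringReader(data)));
    while (rows.hasNext()) {
      System.out.println(rows.next());
    }
  }
}
```

Because the iterator reads one line ahead, the caller never has to check for 
end of input itself, which is what lets step 6 disappear from plugin code.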

So what I did was write an abstract class called EasyEVFReader which abstracts 
virtually all of the file operations.  It also includes utility functions for 
schema definition (more on that later) and column mapping.  Basically, all the 
developer has to do is:
1. Create an iterator class that reads the data and maps it to rows
2. Extend the EasyEVFReader class and assign the iterator to a variable
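
Until the real code is shared, the general shape of that split might look 
something like the sketch below.  To be clear, this is a simplified stand-in 
and not the EasyEVFReader API: SketchFormatReader, rowIterator, readAll and 
CommaLineReader are all hypothetical names, and rows go to a plain Consumer 
rather than to Drill's row writers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the base class; NOT the real EasyEVFReader API.
abstract class SketchFormatReader<R> {

  // The only hook a plugin author has to implement.
  protected abstract Iterator<R> rowIterator(BufferedReader input);

  // Shared plumbing: wrap the input, drain the iterator, close the input.
  // Returns the number of rows handed to the sink.
  final int readAll(Reader source, Consumer<R> sink) {
    int count = 0;
    try (BufferedReader input = new BufferedReader(source)) {
      Iterator<R> rows = rowIterator(input);
      while (rows.hasNext()) {
        sink.accept(rows.next());
        count++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }
}

// Example "plugin": one row per line, fields separated by commas.
class CommaLineReader extends SketchFormatReader<List<String>> {
  @Override
  protected Iterator<List<String>> rowIterator(BufferedReader input) {
    Iterator<String> lines = input.lines().iterator();
    return new Iterator<List<String>>() {
      @Override
      public boolean hasNext() {
        return lines.hasNext();
      }

      @Override
      public List<String> next() {
        return Arrays.asList(lines.next().split(","));
      }
    };
  }
}

class SketchReaderDemo {
  public static void main(String[] args) {
    int n = new CommaLineReader()
        .readAll(new StringReader("a,b\nc,d,e\n"), System.out::println);
    System.out.println(n + " rows read");
  }
}
```

The subclass contains no open, loop, or close logic at all, which is the 
cut/paste code this approach is meant to remove.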

I’ll share the code tonight or tomorrow but I wanted to ask what people think 
about the general approach.  My goal was to get rid of the cut/paste code that 
exists in so many plugins and greatly simplify the process. 
Thanks!

