Hi Charles,
Excellent question. The short answer is "no", but the longer answer is "we
should fix this so the answer is yes."
The reason the answer is "no" today is that Drill uses some odd magic to match
up scan batch creators with plugins. In particular, the class of the second
argument to the scan batch creator constructor is used to match the creator to
the serialized SubScan object. Doing so saved having to do any configuration:
Drill just looks at the code to figure it out. The drawback is that there must
be a separate scan batch creator for each sub scan.
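To make the matching concrete, here is a minimal sketch of the idea in plain
Java. All the class names (SubScan as a bare interface, ExecContext,
KafkaSubScan, MongoSubScan, the two creators, CreatorRegistry) are stand-ins
invented for illustration; this is not Drill's actual registry code, only the
general technique of keying creators by the type of their second constructor
argument.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; not Drill's real classes.
interface SubScan {}
class ExecContext {}
class KafkaSubScan implements SubScan {}
class MongoSubScan implements SubScan {}

// One creator per sub scan: the second constructor argument is the "key".
class KafkaScanCreator {
  public KafkaScanCreator(ExecContext context, KafkaSubScan subScan) {}
}

class MongoScanCreator {
  public MongoScanCreator(ExecContext context, MongoSubScan subScan) {}
}

class CreatorRegistry {
  private final Map<Class<?>, Class<?>> creatorBySubScan = new HashMap<>();

  CreatorRegistry(List<Class<?>> creatorClasses) {
    for (Class<?> creator : creatorClasses) {
      for (Constructor<?> ctor : creator.getConstructors()) {
        Class<?>[] params = ctor.getParameterTypes();
        // No configuration: the association is read off the signature.
        if (params.length == 2 && SubScan.class.isAssignableFrom(params[1])) {
          creatorBySubScan.put(params[1], creator);
        }
      }
    }
  }

  // Look up the creator by the runtime class of the deserialized sub scan.
  Class<?> creatorFor(SubScan subScan) {
    return creatorBySubScan.get(subScan.getClass());
  }
}
```

Because the map is keyed by the sub scan class, two plugins cannot share one
creator unless they also share one sub scan class, which is the limitation
described above.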
One solution is to do what the Easy framework does: have a single SubScan for
all formats. That works OK for Easy, but not so well for storage plugins.
A better solution is to make the association explicit through some form of API,
configuration, etc. Presto, for example, has a set of interfaces that create
the objects required for a connector. No magic; just implement a method.
At your prompting, I'll go back and look at the "Base" framework to see if we
can apply some of these ideas to Drill. For example, we could replace the scan
batch creator with a method call that says, "Here is your SubScan. Give me
back a Scan operator." Since most of the scan setup is generic, we could
standardize this with another call that says, "Here is your SubScan. Give me
back an iterator over readers."
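As a rough sketch of what that second call might look like (every name here,
including Reader, ReaderFactory, FileListSubScan, GenericScan and DemoFactory,
is hypothetical and made up for this example, not an existing Drill
interface): the plugin implements one method, and the generic scan code does
the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; none of these are Drill's actual interfaces.
interface Reader {
  List<String> readAll();
}

// Stand-in for a plugin-specific sub scan: just a list of things to read.
class FileListSubScan {
  final List<String> files;

  FileListSubScan(List<String> files) {
    this.files = files;
  }
}

// The explicit API: "here is your sub scan, give me back an iterator over
// readers." No reflection; the plugin just implements this method.
interface ReaderFactory<S> {
  Iterator<Reader> readers(S subScan);
}

// Generic scan setup, written once and shared by all plugins.
class GenericScan {
  static <S> List<String> run(ReaderFactory<S> factory, S subScan) {
    List<String> rows = new ArrayList<>();
    Iterator<Reader> it = factory.readers(subScan);
    while (it.hasNext()) {
      rows.addAll(it.next().readAll());
    }
    return rows;
  }
}

// The only plugin-specific part: how to build a reader from the sub scan.
class DemoFactory implements ReaderFactory<FileListSubScan> {
  @Override
  public Iterator<Reader> readers(FileListSubScan subScan) {
    List<Reader> readers = new ArrayList<>();
    for (String file : subScan.files) {
      readers.add(() -> Arrays.asList(file + ":row1", file + ":row2"));
    }
    return readers.iterator();
  }
}
```

With something like this, the per-plugin code shrinks to the reader
construction, which is the part that actually differs between plugins.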
Thanks,
- Paul
On Friday, January 17, 2020, 12:07:20 PM PST, Charles Givre
<[email protected]> wrote:
Hey Paul,
In looking through the storage plugins, it seems as if the scan batch creator
is virtually identical EXCEPT for arguments passed to the RecordReader class.
I'm wondering if that could be abstracted in the Base Storage PR as well.
-- C