Hi Ryan,

There is an obscure but very handy feature of Drill called table functions 
[1]. These allow you to set parameters of your format plugin as part of a query.

You mentioned a storage plugin. I've not tried a table function with a storage 
plugin. I have tested table functions with a format plugin.

Your format or storage plugin has a Jackson-serializable Java config class. 
Normally you set the properties for your plugin in the Drill web console, but 
they can also be set in the table function.

I had a use case something like yours. I defined an example "regex" plugin 
where the user can specify a regular expression to apply to a text file to 
parse columns. The user can then provide a list of column names. Using the table 
function, I could specify the regex and column names per query.

This exercise did, however, point out two current limitations of table 
functions. First, they work only with simple data types such as strings and 
ints (DRILL-6169). So my list of columns has to be a single string containing a 
comma-delimited list of columns; I could not use the more natural list of 
strings. Second, table functions do not retain the configured values of 
parameters: you have to include all parameters in the function, not just the 
ones you want to change (DRILL-6168).
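
As a rough illustration of the comma-delimited workaround, here is a minimal, 
standalone sketch of how a reader might split the single "columns" string and 
map regex capture groups onto those names. This is not the actual plugin code: 
the class name, method names, and sample pattern are all made up for the 
example, and it uses only java.util.regex rather than any Drill API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names here are illustrative and not part of Drill.
public class RegexColumnsSketch {

    // Split the single comma-delimited string (the only form a table
    // function parameter can carry) into individual column names.
    static List<String> parseColumns(String columns) {
        List<String> names = new ArrayList<>();
        for (String name : columns.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Apply the regex to one line of text and map capture group i+1 to the
    // i-th column name. Returns an empty map if the line does not match.
    static Map<String, String> parseLine(Pattern pattern, List<String> names, String line) {
        Map<String, String> row = new LinkedHashMap<>();
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return row;
        }
        int n = Math.min(m.groupCount(), names.size());
        for (int i = 0; i < n; i++) {
            row.put(names.get(i), m.group(i + 1));
        }
        return row;
    }

    public static void main(String[] args) {
        // Both values would arrive as plain strings in the table function.
        List<String> names = parseColumns("year, month, day");
        Pattern pattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2}) .*");
        System.out.println(parseLine(pattern, names, "2018-04-07 some log message"));
    }
}
```

Splitting inside the plugin keeps the table function parameter a simple 
string, which is all that works until DRILL-6169 is addressed.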

Yet another option is to set a session option. However, unless you do a bit of 
clever coding, format plugins don't have visibility into session options 
(DRILL-5181).

Perhaps your use case provides a compelling reason to fix some of these 
limitations...

Thanks,

- Paul

[1] https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters
    (see the section "Using the Formats Attributes as Table Function Parameters")


    On Saturday, April 7, 2018, 10:37:05 PM PDT, Aman Sinha 
<amansi...@apache.org> wrote:  
 
 A better option would be to have a user-defined function that takes 2
parameters and evaluates to a boolean value.
 e.g.  select * from myTable where MyUDF(notColumn, 'value')  IS TRUE;

The Storage Plugin that you are developing would need to implement a
pushdown rule that looks
at the filter condition and, if it contains 'MyUDF()', pushes it down to
the scan/reader corresponding to your plugin.


On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri <hanu....@gmail.com>
wrote:

> Hello Ryan,
>
> Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be
> supplied by the underlying scan.
> However, I expect that this column will be present in the scan but not past
> the filter (notColumn = 'value') in the plan.
> In that case you may need to push down the filter to the groupScan and then
> remove the column projections from your custom groupScan.
>
> It would be easier for us to guess what the issue could be if you can post
> the logical and physical query plans for this query.
>
> Hope this helps. Please do let us know if you have any further issues.
>
> Thanks,
>
>
> On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks <ryan.sha...@softcomcorp.com>
> wrote:
>
> > Hi Drill Dev Team!
> >
> > I am writing a custom storage plugin and I am curious if it is possible in
> > Drill to pass a filter value, in the form of a where clause, that is not
> > related to a column. What I would like to accomplish is something like:
> >
> > select * from myTable where notColumn = 'value';
> >
> > In the example, notColumn is not a column in myTable, or any other table,
> > it is just a specific parameter that the storage plugin will use in the
> > filtering process. Additionally, notColumn would not be returned as a
> > column so Drill needs to not expect it as a part of the 'select *'. I
> > created a rule that will push down and remove these non-column filter
> > calls, but I need to somehow tell drill/calcite that the filter name is
> > valid, without actually registering it as a column. The following error
> > occurs prior to submitting any rules:
> >
> > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found
> > in any table
> >
> >
> > Alternatively, can I manipulate star queries to only return a subset of
> > all the columns for a table?
> >
> > Any insight would be greatly appreciated!
> >
> > Thanks,
> > Ryan
> >
>
  
