Thanks for all the suggestions everyone! It looks like table functions are the closest to what I am looking for.

- Ryan


On 2018-04-08 02:25 AM, Paul Rogers wrote:
Hi Ryan,

There is an obscure, but very handy feature of Drill called table functions. 
[1] These allow you to set parameters of your format plugin as part of a query.

You mentioned a storage plugin. I've not tried a table function with a storage 
plugin. I have tested table functions with a format plugin.

Your format or storage plugin has a Jackson-serializable Java class. Normally 
you set the properties for your plugin in the Drill web console. But, these can 
also be set in the table function.

I had a use case something like yours. I defined an example "regex" plugin 
where the user can specify a regular expression to apply to to a text file to parse 
columns. The use can then provide a list of column names. Using the table function, I 
could specify the regex and column names per-query.

This exercise did, however, point out two current limitations of table 
functions. First, they work only with simple data types (strings, ints). 
(DRILL-6169) So, my list of columns has to be a single string with a comma 
delimited list of columns. I could not use the more natural list of strings. 
Second, table functions do not retain the configured value of parameters: you 
have to include all parameters in the function, not just the ones you want to 
change. (DRILL-6168)

Yet another option is to set a session option. However, unless you do a bit of 
clever coding, format plugins don't have visibility to session options 
(DRILL-5181).

Perhaps your use case provides a compelling reason to fix some of these 
limitations...

Thanks,

- Paul

[1] 
https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters,
 see the section "Using the Formats Attributes as Table Function Parameters".


     On Saturday, April 7, 2018, 10:37:05 PM PDT, Aman Sinha 
<amansi...@apache.org> wrote:
A better option would be to have a user-defined function that takes 2
parameters and evaluates to a boolean value.
  e.g  select * from myTable where MyUDF(notColumn, 'value')  IS TRUE;

The Storage Plugin that you are developing would need to implement a
pushdown rule that  looks
at the filter condition and if it contains 'MyUDF()', it would pushdown to
the scan/reader corresponding to your plugin.


On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri <hanu....@gmail.com>
wrote:

Hello Ryan,

Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be
supplied by the underlying scan.
However, I expect that this column will be present in the scan but not past
the filter (notColumn = 'value') in the plan.
In that case you may need to pushdown the filter to the groupScan and then
remove the column projections from your custom groupscan.

It would be easy for us to guess what could be the issue, if you can post
the logical and physical query plan's for this query.

Hope this helps. Please do let us know if you have any further issues.

Thanks,


On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks <ryan.sha...@softcomcorp.com>
wrote:

Hi Drill Dev Team!

I am writing a custom storage plugin and I am curious if it is possible
in
Drill to pass a filter value, in the form of a where clause, that is not
related to a column. What I would like to accomplish is something like:

select * from myTable where notColumn = 'value';

In the example, notColumn is not a column in myTable, or any other table,
it is just a specific parameter that the storage plugin will use in the
filtering process. Additionally, notColumn would not be returned as a
column so Drill needs to not expect it as a part of the 'select *'. I
created a rule that will push down and remove these non-column filter
calls, but I need to somehow tell drill/calcite that the filter name is
valid, without actually registering it as a column. The following error
occurs prior to submitting any rules:

org.apache.drill.common.exceptions.UserRemoteException: VALIDATION
ERROR:
 From line 1, column 35 to line 1, column 39: Column 'notColumn' not found
in any table


Alternatively, can I manipulate star queries to only return a subset of
all the columns for a table?

Any insight would be greatly appreciated!

Thanks,
Ryan


Reply via email to