+1 to use table functions

In Calcite (and I presume Drill) a “table function” may actually function more 
like a (Lisp) macro. The function gets called at prepare time to yield a 
RelNode (say a TableScan). So a table function is every bit as efficient as 
using a table, but it allows extra parameters.
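
To make the macro idea concrete, here is a rough sketch in plain Java. It is not Calcite's actual API: PlanNode, FileScan, TableMacro and prepare are hypothetical stand-ins (PlanNode plays the role of a RelNode), and the point is only that the function runs once at prepare time and leaves an ordinary scan node in the plan.

```java
import java.util.List;

public class TableMacroSketch {
    /** Hypothetical stand-in for a planner node such as a RelNode. */
    interface PlanNode {
        String explain();
    }

    /** A scan of one file; the extra parameters are baked into the node. */
    static final class FileScan implements PlanNode {
        private final String path;
        private final String delimiter;

        FileScan(String path, String delimiter) {
            this.path = path;
            this.delimiter = delimiter;
        }

        @Override
        public String explain() {
            return "FileScan(path=" + path + ", delimiter=" + delimiter + ")";
        }
    }

    /** Hypothetical macro-style table function, invoked while the query is prepared. */
    interface TableMacro {
        PlanNode expand(List<Object> arguments);
    }

    /** Illustrative macro: turns (path, delimiter) into a scan node. */
    static final TableMacro DELIMITED_FILE = arguments ->
        new FileScan((String) arguments.get(0), (String) arguments.get(1));

    /** Prepare-time step: the function call is replaced by the node it returns. */
    static PlanNode prepare(TableMacro macro, List<Object> arguments) {
        return macro.expand(arguments);
    }

    public static void main(String[] args) {
        PlanNode plan = prepare(DELIMITED_FILE, List.of("/path/to/something.psv", "|"));
        System.out.println(plan.explain());
    }
}
```

After prepare() returns, nothing of the function call remains in the plan, which is why there is no per-row cost compared with scanning a table directly.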

If the table function has a lot of parameters it might be nice to support named 
parameters:

select * from table(distributedFile(path => '/path/to/something.psv',
delimiter => '|'));
 
Named parameters are in the SQL standard but are not currently supported by
Calcite’s parser. With them, arguments can be specified in any order, and any
that are omitted take a default value.
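
A sketch of how such binding might work, again in plain Java and not taken from Calcite: bind() and delimitedFileParams() are hypothetical names, the "header" parameter and the "," default are made up for illustration, and a null default is used here to mark a required parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedArgsSketch {
    /**
     * Maps named arguments onto the declared parameter order.
     * Omitted parameters take their declared default; a null default
     * marks a parameter as required (an assumption of this sketch).
     */
    static List<Object> bind(Map<String, Object> declaredDefaults, Map<String, ?> namedArgs) {
        for (String name : namedArgs.keySet()) {
            if (!declaredDefaults.containsKey(name)) {
                throw new IllegalArgumentException("Unknown parameter: " + name);
            }
        }
        List<Object> positional = new ArrayList<>();
        for (Map.Entry<String, Object> param : declaredDefaults.entrySet()) {
            Object value = namedArgs.containsKey(param.getKey())
                ? namedArgs.get(param.getKey())
                : param.getValue();
            if (value == null) {
                throw new IllegalArgumentException("Missing required parameter: " + param.getKey());
            }
            positional.add(value);
        }
        return positional;
    }

    /** Hypothetical declaration: parameter names in order, with defaults. */
    static Map<String, Object> delimitedFileParams() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("path", null);       // required
        params.put("delimiter", ",");   // illustrative default
        params.put("header", false);    // illustrative default
        return params;
    }

    public static void main(String[] args) {
        // Arguments given out of order; "header" is omitted and falls back to its default.
        Map<String, Object> call = new LinkedHashMap<>();
        call.put("delimiter", "|");
        call.put("path", "/path/to/something.psv");
        System.out.println(bind(delimitedFileParams(), call));
    }
}
```

The result is an ordinary positional argument list, so the function itself would not need to know whether the caller used names or positions.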

Julian


> On Oct 19, 2015, at 5:18 PM, Ted Dunning wrote:
> 
> Wouldn't a table function be a better option?
> 
> Something like this perhaps?
> 
> select * from
> delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
> 
> ?
> 
> Or how about fake-o parameters that the delimited record scanner knows how
> to push down into the scanning of the data? That would look like this:
> 
> select * from
> dfs.`default`.`/path/to/file/something.psv`
> where magicFieldDelimiter = '|';
> 
> 
> 
> On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem wrote:
> 
>> I'm looking into passing information on how to interpret a file through the
>> select clause in Drill.
>> Something along the lines of:
>> *select * from
>> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
>> (In this example, we want to specify a specific delimiter, but that would
>> apply to any *type* of format)
>> 
>> Which would allow to read a file without having to centrally configure
>> formats: https://drill.apache.org/docs/querying-plain-text-files/
>> Which makes it easier to try to read an existing file.
>> Typically once the user has found the proper settings, they would update
>> the central configuration.
>> 
>> thoughts?
>> 
>> --
>> Julien
>> 
