+1 to using table functions.

In Calcite (and I presume Drill) a "table function" may actually behave more like a (Lisp) macro: the function is called at prepare time to yield a RelNode (say, a TableScan). So a table function is every bit as efficient as using a table, but it allows extra parameters.
If the table function has a lot of parameters it might be nice to support named parameters:

  select * from table(distributedFile(path => '/path/to/something.psv', delimiter => '|'));

Named parameters are in the SQL standard but are not currently supported by Calcite's parser. Parameters can be specified in any order, and those not specified take a default value.

Julian

> On Oct 19, 2015, at 5:18 PM, Ted Dunning <[email protected]> wrote:
>
> Wouldn't a table function be a better option?
>
> Something like this perhaps?
>
> select * from
> delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
>
> ?
>
> Or how about fake-o parameters that the delimited record scanner knows how
> to push down into the scanning of the data? That would look like this:
>
> select * from
> dfs.`default`.`/path/to/file/something.psv`
> where magicFieldDelimiter = '|';
>
>
>
> On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <[email protected]> wrote:
>
>> I'm looking into passing information on how to interpret a file through the
>> select clause in Drill.
>> Something along the lines of:
>> *select * from
>> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
>> (In this example, we want to specify a specific delimiter, but that would
>> apply to any *type* of format)
>>
>> Which would allow to read a file without having to centrally configure
>> formats: https://drill.apache.org/docs/querying-plain-text-files/
>> Which makes it easier to try to read an existing file.
>> Typically once the user has found the proper settings, they would update
>> the central configuration.
>>
>> thoughts?
>>
>> --
>> Julien
>>
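P.S. For anyone curious what the named-parameter semantics described above (any order, defaults for unspecified parameters) amount to, here is a minimal stdlib-only Java sketch. The `distributedFile` parameter names are just illustrative; this is not a real Calcite or Drill API, only a model of how a parser/validator might resolve named arguments before handing them to the table function.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedParams {
    // Resolve named arguments against the declared parameter list,
    // keeping the declared default for any parameter the caller omits.
    static Map<String, String> resolve(Map<String, String> declaredDefaults,
                                       Map<String, String> namedArgs) {
        Map<String, String> resolved = new LinkedHashMap<>(declaredDefaults);
        for (Map.Entry<String, String> e : namedArgs.entrySet()) {
            if (!resolved.containsKey(e.getKey())) {
                throw new IllegalArgumentException(
                    "unknown parameter: " + e.getKey());
            }
            resolved.put(e.getKey(), e.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Declared parameters of a hypothetical distributedFile function.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("path", null);        // required; no useful default
        defaults.put("delimiter", ",");    // defaults to comma
        defaults.put("type", "text");

        // Caller names parameters in any order, omitting "type".
        Map<String, String> call = new LinkedHashMap<>();
        call.put("delimiter", "|");
        call.put("path", "/path/to/something.psv");

        // "type" keeps its declared default value of "text".
        System.out.println(resolve(defaults, call));
    }
}
```

Note that argument order never matters here because resolution is keyed by parameter name, which is exactly why the SQL-standard `name => value` syntax is convenient for functions with many optional parameters.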
