To be clear, it should be possible to use a table function with all of the options -- EXTENDS clause, OVER clause, AS with alias and column aliases, TABLESAMPLE.
I'm surprised that the parser didn't need more lookahead to choose between 't (x, y)' and 't (x INTEGER, y DATE)'. On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <[email protected]> wrote: > In the patch I just sent, probably not. > I will adjust it and add the corresponding test. > > On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <[email protected]> wrote: > >> Can you use both together? Say >> >> select columns >> from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND >> (foo INTEGER) >> >> Julian >> >> >> >> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <[email protected]> wrote: >> > >> > I took a stab at adding the TableFunction syntax without table(...) in >> > Calcite. >> > I have verified that both the table function and extend (with or without >> > keyword) work >> > >> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34 >> > >> > These work: >> > >> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter >> => >> > '|') >> > >> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT', >> > fieldDelimiter => '|')) >> > >> > select columns from table(dfs.`/path/to/myfile`('JSON')) >> > >> > select columns from dfs.`/path/to/myfile`('JSON') >> > >> > select columns from dfs.`/path/to/myfile`(type => 'JSON') >> > >> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <[email protected]> >> wrote: >> > >> >> Drill does implicitly what Phoenix does explicitly so I don't think we >> >> should constrain ourselves to having a union of the two syntaxes. >> >> >> >> >> >> That being said, I think we could make these work together... maybe. >> >> >> >> Remove the EXTENDS without keyword syntax from the grammar. >> >> >> >> Create a new sub block in the table block that requires no keyword. >> There >> >> would be two paths (and would probably require some lookahead) >> >> >> >> option 1> unnamed parameters (1,2,3) >> >> option 2> named parameters (a => 1, b=>2, c=> 3) >> >> option 3> create table field pattern (favoriteBand VARCHAR(100), >> >> golfHandicap INTEGER) >> >> >> >> Then we create a table function with options 1 & 2, an EXTENDS clause >> for >> >> option 3. >> >> >> >> Best of both worlds? >> >> >> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <[email protected]> >> >> wrote: >> >> >> >>> Phoenix already supports columns at read-time using the syntax without >> >> the >> >>> EXTENDS keyword as Julian indicated: >> >>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER) >> >>> WHERE goldHandicap < 10; >> >>> >> >>> Changing this by requiring the EXTENDS keyword would create a backward >> >>> compatibility problem. >> >>> >> >>> I think it'd be good if both of these extensions worked in Drill & >> >> Phoenix >> >>> given our Drillix initiative. >> >>> >> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <[email protected]> >> >> wrote: >> >>> >> >>>> My proposal was an a or b using the freemarker template in the >> grammar, >> >>>> not something later. >> >>>> >> >>>> Actually, put another way: we may want to consider stating that we >> only >> >>>> incorporate SQL standards in our primary grammar. Any extensions >> should >> >>> be >> >>>> optional grammar. We could simply have grammar plugins in Calcite (the >> >>> same >> >>>> way we plug in external things in Drill). >> >>>> >> >>>> Trying to get every project to agree on extensions seems like it may >> be >> >>>> hard. >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Jacques Nadeau >> >>>> CTO and Co-Founder, Dremio >> >>>> >> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <[email protected]> wrote: >> >>>> >> >>>>> I can see why Jacques wants this syntax. >> >>>>> >> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be >> >>>>> predictable. Any variation should happen at validation time, or >> later. >> >>>>> >> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding >> a >> >>>>> tough design discussion. >> >>>>> >> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and >> >> they >> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix >> >> (by >> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL >> >>> syntax >> >>>>> should be. >> >>>>> >> >>>>> Julian >> >>>>> >> >>>>> >> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <[email protected]> wrote: >> >>>>>> >> >>>>>> Looking at those two examples I agree with Jacques. The first >> >> appears >> >>>>> more >> >>>>>> like a hint from the syntactic sugar point of view. >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <[email protected] >> >>> >> >>>>> wrote: >> >>>>>> >> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we >> >>> could >> >>>>>>> have a switch. Given that SQL Server and Postgres support it seems >> >>>>>>> reasonable to support the table functions without the TABLE syntax. >> >>>>>>> >> >>>>>>> I for one definitely think the TABLE syntax is much more confusing >> >> to >> >>>>> use, >> >>>>>>> especially in the example that we're looking to support, such as: >> >>>>>>> >> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV', >> >> fieldDelimiter >> >>> => >> >>>>>>> '|', skipFirstRow => true) >> >>>>>>> >> >>>>>>> This seems much clearer than: >> >>>>>>> >> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', >> >>>>> fieldDelimiter >> >>>>>>> => '|', skipFirstRow => true)) >> >>>>>>> >> >>>>>>> It also looks much more like a hint to the table (which is our >> >> goal). >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> Jacques Nadeau >> >>>>>>> CTO and Co-Founder, Dremio >> >>>>>>> >> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <[email protected]> >> >>> wrote: >> >>>>>>> >> >>>>>>>> Thanks for doing the legwork and finding what the other vendors >> >> do. >> >>>>> It is >> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the >> >>> standard >> >>>>> an >> >>>>>>>> make the TABLE keyword optional. >> >>>>>>>> >> >>>>>>>> I tried that syntax in Calcite and discovered that there is a >> >> clash >> >>>>> with >> >>>>>>>> one of our own (few) extensions. In >> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the >> >>>>> EXTENDS >> >>>>>>>> clause. You can write >> >>>>>>>> >> >>>>>>>> SELECT * >> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER) >> >>>>>>>> WHERE goldHandicap < 10; >> >>>>>>>> >> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp >> >>> table >> >>>>>>> but >> >>>>>>>> you would like to use them in this particular query. We chose to >> >>> make >> >>>>> the >> >>>>>>>> EXTEND keyword optional, so you could instead write >> >>>>>>>> >> >>>>>>>> SELECT * >> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER) >> >>>>>>>> WHERE goldHandicap < 10; >> >>>>>>>> >> >>>>>>>> That is uncomfortably close to >> >>>>>>>> >> >>>>>>>> SELECT * >> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap); >> >>>>>>>> >> >>>>>>>> so we would require >> >>>>>>>> >> >>>>>>>> SELECT * >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)); >> >>>>>>>> >> >>>>>>>> if EmpFunction was a table-function. You could combine the two >> >> forms >> >>>>> like >> >>>>>>>> this: >> >>>>>>>> >> >>>>>>>> SELECT * >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND >> >>>>>>>> (anotherAttribute INTEGER); >> >>>>>>>> >> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we >> >>> should >> >>>>>>> also >> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship. >> >>>>>>>> >> >>>>>>>> Julian >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <[email protected]> >> >>> wrote: >> >>>>>>>>> >> >>>>>>>>> - Table function syntax: I did a quick search and it seems >> >> there's >> >>> no >> >>>>>>>>> consensus about this. >> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling >> >>> table >> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 >> >>> [4] >> >>>>>>>>> expect it. >> >>>>>>>>> MySQL does not have table functions [5] >> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :) >> >>>>>>>>> Would it be reasonable to allow a switch in the grammar >> >> generation >> >>> to >> >>>>>>>> have >> >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL >> >>> like >> >>>>>>>>> syntax (back ticks for identifiers etc) >> >>>>>>>>> >> >>>>>>>>> [1] >> >>>>>>> >> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html >> >>>>>>>>> [2] >> >>>>>>> >> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx >> >>>>>>>>> [3] >> >>> https://oracle-base.com/articles/misc/pipelined-table-functions >> >>>>>>>>> [4] >> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/ >> >>>>>>>>> [5] >> >>>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>> >> >>> >> >> >> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table >> >>>>>>>>> >> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function >> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files >> >>>>>>>>> But that seems too easy to be true. Possibly this method is >> >> called >> >>>>> more >> >>>>>>>>> than once (before and after the function has been resolved?) >> >>>>>>>>> >> >>>>>>>>> FYI this would happen only when using named parameter. We do want >> >>> to >> >>>>>>>>> overload in this case, which is why I'm looking into it. >> >>>>>>>>> >> >>>>>>>>> I'll fill a JIRA for my other branch >> >>>>>>>>> >> >>>>>>>>> Julien >> >>>>>>>>> >> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <[email protected]> >> >>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <[email protected]> >> >>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in >> >> Calcite >> >>>>> when >> >>>>>>>>>> there's more than 1 function with the same name. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Yes; see below. >> >>>>>>>>>> >> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now >> >>> just >> >>>>>>>> being >> >>>>>>>>>> able to specify the delimiter for csv files. >> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are >> >>> the >> >>>>>>> way >> >>>>>>>> to >> >>>>>>>>>> go. >> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax >> >>>>>>>>>> necessary?* >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Consider: >> >>>>>>>>>> >> >>>>>>>>>> select * from myTable as f(x, y) >> >>>>>>>>>> select * from myTable f(x, y) >> >>>>>>>>>> select * from myFunction(x, y) >> >>>>>>>>>> >> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. >> >>> Also, >> >>>>>>> if >> >>>>>>>> f >> >>>>>>>>>> is a function with zero arguments, could you invoke it like >> >> this?: >> >>>>>>>>>> >> >>>>>>>>>> select * from f >> >>>>>>>>>> >> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL >> >>> standards >> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate. >> >>>>>>>>>> >> >>>>>>>>>> I had to fix some things in Calcite to enable this: >> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files >> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used >> >>> in >> >>>>>>>>>> Calcite for the Maze example. >> >>>>>>>>>> Which is why some hooks were missing. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Can you log a jira case to track this bug? >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it. >> >>>>>>>>>> Here is a test that reproduces the problem: >> >>>>>>>>>> https://github.com/apache/calcite/pull/166 >> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we >> >> get >> >>> a >> >>>>>>> NPE >> >>>>>>>>>> later on. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a >> >>> JIRA >> >>>>>>> case >> >>>>>>>>>> on resolution of overloaded functions when invoked with named >> >>>>>>> arguments. >> >>>>>>>>>> (It probably applies to all functions, not just table >> >> functions.) >> >>>>> The >> >>>>>>>> fix >> >>>>>>>>>> will take a while (if you wait for me to write it). >> >>>>>>>>>> >> >>>>>>>>>> For now please tell your users not to overload. :) >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> Julian >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -- >> >>>>>>>>> Julien >> >>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> -- >> >>>>>> *Jim Scott* >> >>>>>> Director, Enterprise Strategy & Architecture >> >>>>>> +1 (347) 746-9281 >> >>>>>> @kingmesal <https://twitter.com/kingmesal> >> >>>>>> >> >>>>>> <http://www.mapr.com/> >> >>>>>> [image: MapR Technologies] <http://www.mapr.com> >> >>>>>> >> >>>>>> Now Available - Free Hadoop On-Demand Training >> >>>>>> < >> >>>>> >> >>> >> >> >> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available >> >>>>>> >> >>>>> >> >>>>> >> >>>> >> >>> >> >> >> > >> > >> > >> > -- >> > Julien >> >> > > > -- > Julien
