Can you use both together? Say select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND (foo INTEGER)
Julian > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <[email protected]> wrote: > > I took a stab at adding the TableFunction syntax without table(...) in > Calcite. > I have verified that both the table function and extend (with or without > keyword) work > https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34 > > These work: > > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => > '|') > > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT', > fieldDelimiter => '|')) > > select columns from table(dfs.`/path/to/myfile`('JSON')) > > select columns from dfs.`/path/to/myfile`('JSON') > > select columns from dfs.`/path/to/myfile`(type => 'JSON') > > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <[email protected]> wrote: > >> Drill does implicitly what Phoenix does explicitly so I don't think we >> should constrain ourselves to having a union of the two syntaxes. >> >> >> That being said, I think we could make these work together... maybe. >> >> Remove the EXTENDS without keyword syntax from the grammar. >> >> Create a new sub block in the table block that requires no keyword. There >> would be two paths (and would probably require some lookahead) >> >> option 1> unnamed parameters (1,2,3) >> option 2> named parameters (a => 1, b=>2, c=> 3) >> option 3> create table field pattern (favoriteBand VARCHAR(100), >> golfHandicap INTEGER) >> >> Then we create a table function with options 1 & 2, an EXTENDS clause for >> option 3. >> >> Best of both worlds? >> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <[email protected]> >> wrote: >> >>> Phoenix already supports columns at read-time using the syntax without >> the >>> EXTENDS keyword as Julian indicated: >>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER) >>> WHERE goldHandicap < 10; >>> >>> Changing this by requiring the EXTENDS keyword would create a backward >>> compatibility problem. >>> >>> I think it'd be good if both of these extensions worked in Drill & >> Phoenix >>> given our Drillix initiative. >>> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <[email protected]> >> wrote: >>> >>>> My proposal was an a or b using the freemarker template in the grammar, >>>> not something later. >>>> >>>> Actually, put another way: we may want to consider stating that we only >>>> incorporate SQL standards in our primary grammar. Any extensions should >>> be >>>> optional grammar. We could simply have grammar plugins in Calcite (the >>> same >>>> way we plug in external things in Drill). >>>> >>>> Trying to get every project to agree on extensions seems like it may be >>>> hard. >>>> >>>> >>>> >>>> -- >>>> Jacques Nadeau >>>> CTO and Co-Founder, Dremio >>>> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <[email protected]> wrote: >>>> >>>>> I can see why Jacques wants this syntax. >>>>> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be >>>>> predictable. Any variation should happen at validation time, or later. >>>>> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a >>>>> tough design discussion. >>>>> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and >> they >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix >> (by >>>>> which I mean Jacques and James, I guess) need to agree what the SQL >>> syntax >>>>> should be. >>>>> >>>>> Julian >>>>> >>>>> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <[email protected]> wrote: >>>>>> >>>>>> Looking at those two examples I agree with Jacques. The first >> appears >>>>> more >>>>>> like a hint from the syntactic sugar point of view. >>>>>> >>>>>> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <[email protected] >>> >>>>> wrote: >>>>>> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we >>> could >>>>>>> have a switch. Given that SQL Server and Postgres support it seems >>>>>>> reasonable to support the table functions without the TABLE syntax. >>>>>>> >>>>>>> I for one definitely think the TABLE syntax is much more confusing >> to >>>>> use, >>>>>>> especially in the example that we're looking to support, such as: >>>>>>> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV', >> fieldDelimiter >>> => >>>>>>> '|', skipFirstRow => true) >>>>>>> >>>>>>> This seems much clearer than: >>>>>>> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', >>>>> fieldDelimiter >>>>>>> => '|', skipFirstRow => true)) >>>>>>> >>>>>>> It also looks much more like a hint to the table (which is our >> goal). >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jacques Nadeau >>>>>>> CTO and Co-Founder, Dremio >>>>>>> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <[email protected]> >>> wrote: >>>>>>> >>>>>>>> Thanks for doing the legwork and finding what the other vendors >> do. >>>>> It is >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the >>> standard >>>>> an >>>>>>>> make the TABLE keyword optional. >>>>>>>> >>>>>>>> I tried that syntax in Calcite and discovered that there is a >> clash >>>>> with >>>>>>>> one of our own (few) extensions. In >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the >>>>> EXTENDS >>>>>>>> clause. You can write >>>>>>>> >>>>>>>> SELECT * >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER) >>>>>>>> WHERE goldHandicap < 10; >>>>>>>> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp >>> table >>>>>>> but >>>>>>>> you would like to use them in this particular query. We chose to >>> make >>>>> the >>>>>>>> EXTEND keyword optional, so you could instead write >>>>>>>> >>>>>>>> SELECT * >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER) >>>>>>>> WHERE goldHandicap < 10; >>>>>>>> >>>>>>>> That is uncomfortably close to >>>>>>>> >>>>>>>> SELECT * >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap); >>>>>>>> >>>>>>>> so we would require >>>>>>>> >>>>>>>> SELECT * >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)); >>>>>>>> >>>>>>>> if EmpFunction was a table-function. You could combine the two >> forms >>>>> like >>>>>>>> this: >>>>>>>> >>>>>>>> SELECT * >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND >>>>>>>> (anotherAttribute INTEGER); >>>>>>>> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we >>> should >>>>>>> also >>>>>>>> ask whether requiring folks to type TABLE is such a hardship. >>>>>>>> >>>>>>>> Julian >>>>>>>> >>>>>>>> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <[email protected]> >>> wrote: >>>>>>>>> >>>>>>>>> - Table function syntax: I did a quick search and it seems >> there's >>> no >>>>>>>>> consensus about this. >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling >>> table >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 >>> [4] >>>>>>>>> expect it. >>>>>>>>> MySQL does not have table functions [5] >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :) >>>>>>>>> Would it be reasonable to allow a switch in the grammar >> generation >>> to >>>>>>>> have >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL >>> like >>>>>>>>> syntax (back ticks for identifiers etc) >>>>>>>>> >>>>>>>>> [1] >>>>>>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html >>>>>>>>> [2] >>>>>>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx >>>>>>>>> [3] >>> https://oracle-base.com/articles/misc/pipelined-table-functions >>>>>>>>> [4] >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/ >>>>>>>>> [5] >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> >> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table >>>>>>>>> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files >>>>>>>>> But that seems too easy to be true. Possibly this method is >> called >>>>> more >>>>>>>>> than once (before and after the function has been resolved?) >>>>>>>>> >>>>>>>>> FYI this would happen only when using named parameter. We do want >>> to >>>>>>>>> overload in this case, which is why I'm looking into it. >>>>>>>>> >>>>>>>>> I'll fill a JIRA for my other branch >>>>>>>>> >>>>>>>>> Julien >>>>>>>>> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <[email protected]> >>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <[email protected]> >>>>> wrote: >>>>>>>>>> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in >> Calcite >>>>> when >>>>>>>>>> there's more than 1 function with the same name. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes; see below. >>>>>>>>>> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now >>> just >>>>>>>> being >>>>>>>>>> able to specify the delimiter for csv files. >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are >>> the >>>>>>> way >>>>>>>> to >>>>>>>>>> go. >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax >>>>>>>>>> necessary?* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Consider: >>>>>>>>>> >>>>>>>>>> select * from myTable as f(x, y) >>>>>>>>>> select * from myTable f(x, y) >>>>>>>>>> select * from myFunction(x, y) >>>>>>>>>> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. >>> Also, >>>>>>> if >>>>>>>> f >>>>>>>>>> is a function with zero arguments, could you invoke it like >> this?: >>>>>>>>>> >>>>>>>>>> select * from f >>>>>>>>>> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL >>> standards >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate. >>>>>>>>>> >>>>>>>>>> I had to fix some things in Calcite to enable this: >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used >>> in >>>>>>>>>> Calcite for the Maze example. >>>>>>>>>> Which is why some hooks were missing. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Can you log a jira case to track this bug? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it. >>>>>>>>>> Here is a test that reproduces the problem: >>>>>>>>>> https://github.com/apache/calcite/pull/166 >>>>>>>>>> If we return more than 1 TableFunction with the same name, we >> get >>> a >>>>>>> NPE >>>>>>>>>> later on. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a >>> JIRA >>>>>>> case >>>>>>>>>> on resolution of overloaded functions when invoked with named >>>>>>> arguments. >>>>>>>>>> (It probably applies to all functions, not just table >> functions.) >>>>> The >>>>>>>> fix >>>>>>>>>> will take a while (if you wait for me to write it). >>>>>>>>>> >>>>>>>>>> For now please tell your users not to overload. :) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Julian >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Julien >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Jim Scott* >>>>>> Director, Enterprise Strategy & Architecture >>>>>> +1 (347) 746-9281 >>>>>> @kingmesal <https://twitter.com/kingmesal> >>>>>> >>>>>> <http://www.mapr.com/> >>>>>> [image: MapR Technologies] <http://www.mapr.com> >>>>>> >>>>>> Now Available - Free Hadoop On-Demand Training >>>>>> < >>>>> >>> >> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available >>>>>> >>>>> >>>>> >>>> >>> >> > > > > -- > Julien
