Hi J,

- The goal was to come up with manually generated tpc-h logical queries. We'll use these to validate the output of sql parser.
I am doing the latter: feeding the TPC-H queries to the SQL parser to produce a logical plan, then verifying it manually or by feeding it into the execution engine.

- DrQL parser is not currently being used.
I realized that later.

- Why are you creating pojos for anything?
The TPC-H data set is in PSV files. POJOs make this workflow easy: PSV -> POJOs -> JSON for now, and any other format later.

In the end, we can give out a data set and the SQL queries, with their respective logical and physical plans, for Drill users to play with and refer to.

V

________________________________
From: Jacques Nadeau <[email protected]>
To: [email protected]; Sree V <[email protected]>
Sent: Friday, July 26, 2013 10:59 AM
Subject: Re: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries

Some thoughts (not in any particular order):
- The goal was to come up with manually generated tpc-h logical queries. We'll use these to validate the output of sql parser.
- DrQL parser is not currently being used.
- Why are you creating pojos for anything?

J

> [Sree Vaddi:] Seems, I should be using 'sqlparser' project. Any
> sample/thought ?
>
> 3.
> How to apply the parsed sql from 2. above to the data in 1. above, to output
> the Logical Plan ?
>
> Please advise.
>
> Thanking you.
> With Regards
> Sree
>
> Supporting code for 2.
> above and debug info:
>
> @Test
> public void testTPCHSql1() {
>   String drqlQueryText = "select " +
>     "l_returnflag, l_linestatus, " +
>     "sum(l_quantity) as sum_qty, " +
>     "sum(l_extendedprice) as sum_base_price, " +
>     "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, " +
>     "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, " +
>     "avg(l_quantity) as avg_qty, " +
>     "avg(l_extendedprice) as avg_price, " +
>     "avg(l_discount) as avg_disc, " +
>     "count(*) as count_order " +
>     "from " +
>     "lineitem " +
>     "where " +
>     "l_shipdate <= date '1998-12-01' - interval ':1' day (3) " +
>     "group by " +
>     "l_returnflag, " +
>     "l_linestatus " +
>     "order by " +
>     "l_returnflag, " +
>     "l_linestatus;";
>
>   DrqlParser parser = new AntlrParser();
>   SemanticModelReader query = parser.parse(drqlQueryText);
>
>   System.out.println(query.getFromClause());
>   System.out.println(query.getGroupByClause());
>   System.out.println(query.getJoinOnClause());
>   System.out.println(query.getjustATable());
>   System.out.println(query.getLimitClause());
>   System.out.println(query.getOrderByClause());
>   System.out.println(query.getResultColumnList().size());
>   System.out.println(query.getWhereClause());
>   /*
>   setup debug info:
>   line#2299 DrqlAntlrParser
>   2320
>   3682
>   4884
>   5363
>   392
>   6664
>
>   #1207 DrqlAntlrLexer.mDiv()
>   part of the sql parsing:
>   // l_shipdate <= date '1998-12-01' - interval ':1' day (3)
>   variable value: (parsing location in sql i.e the location of letter 'd' in date)
>   [@125,378:379='<=',<52>,1:378]
>
>   looks like the 'date' is interpreted as 'div' ?!
>
>   test method console output:
>   line 1:382 mismatched character 'A' expecting ' '
>   line 1:416 mismatched character 'A' expecting ' '
>
>   [org.apache.drill.parsers.impl.drqlantlr.SemanticModel@3a86edfe]
>   []
>   null
>   null
>   null
>   []
>   10
>   null
>
>   */
> }
>
> ________________________________
> From: Sree Vaddi (JIRA) <[email protected]>
> To: [email protected]
> Sent: Thursday, July 25, 2013 7:07 AM
> Subject: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries
>
> [ https://issues.apache.org/jira/browse/DRILL-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Work on DRILL-47 started by Sree Vaddi.
>
>> Generate Logical Plans for TPC-H Queries
>> ----------------------------------------
>>
>>          Key: DRILL-47
>>          URL: https://issues.apache.org/jira/browse/DRILL-47
>>      Project: Apache Drill
>>   Issue Type: New Feature
>>     Reporter: Jacques Nadeau
>>     Assignee: Sree Vaddi
>>
>> Creating example logical plans should help in many ways. Among those are
>> validation cases for the sql parser, logical plan completeness, execution
>> engine performance, etc. It would be great if someone could generate
>> logical plans for each of the TPC-H queries.
>> The data is in PSV (pipe separated value) files.
>> Converting these to JSON files.
>> There are 20 sql files and 5 variant sql files.
>> Creating one sub-task jira for each of the sql files. Everything related to
>> that sql will be in it, i.e. sql parser, logical plan, execution stats ...
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
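
For reference, the PSV -> POJOs -> JSON flow mentioned at the top of this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the class names (PsvToJson, Region) are hypothetical and not part of Drill, it uses the small TPC-H region table (r_regionkey, r_name, r_comment) to stay short, the sample row is made up, and the JSON is written by hand so that no external library is needed.

```java
public class PsvToJson {

    /** Hypothetical POJO for one row of the TPC-H "region" table. */
    public static final class Region {
        public final int regionKey;
        public final String name;
        public final String comment;

        public Region(int regionKey, String name, String comment) {
            this.regionKey = regionKey;
            this.name = name;
            this.comment = comment;
        }
    }

    /** Parses one pipe-separated line; a trailing '|' at end of line is tolerated. */
    public static Region parse(String line) {
        // limit -1 keeps trailing empty fields, so the field count is predictable
        String[] fields = line.split("\\|", -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + line);
        }
        return new Region(Integer.parseInt(fields[0].trim()), fields[1], fields[2]);
    }

    /** Escapes a string as a JSON string literal (quotes, backslashes, control chars). */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the POJO as a single-line JSON object. */
    public static String toJson(Region r) {
        return "{\"r_regionkey\":" + r.regionKey
                + ",\"r_name\":" + quote(r.name)
                + ",\"r_comment\":" + quote(r.comment) + "}";
    }

    public static void main(String[] args) {
        // Made-up sample row, not real TPC-H data.
        String line = "0|AFRICA|sample comment|";
        System.out.println(toJson(parse(line)));
    }
}
```

Keeping the POJO between the two formats is what would let another output format be added later by writing one more serializer next to toJson, without touching the PSV parsing.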
