Hi All, I hope I didn't keep you waiting on this for long.
Downloaded: http://www.tpc.org/tpch/spec/tpch_2_16_0.zip

I set up the VC++ project, as mentioned at the last standup, before Tuesday. I found out later that it is not required: it is for generating the data and the SQLs, and the ZIP above already has the data and SQLs bundled. That VC++ project didn't compile well on my machine, anyway.

1. I created the POJOs for the schema (page 13) defined in tpch.2.16.0.pdf, as a separate simple Java project. Then I used the super-csv and gson libraries to convert the data files from PSV (pipe-separated values) into JSON files. Now the data is in JSON and the schema is in POJOs.

2. In the query-parser project, in DrqlParserTest.java, I added one test method, testTPCHSql1(), to run the query parser on Sql1 (out of the 22 + 5 SQLs from TPC-H). I copied the entire method to the end of this e-mail for your reference, along with the info I gathered while debugging and the console output. The test method printed an error on the console along with the output that I am printing.
Is the error valid? Does it need to be fixed? Should I create another JIRA for it? Is the output a good enough result of parsing a SQL statement? Should I go ahead with the rest of the SQLs?
[Sree Vaddi:] It seems I should be using the 'sqlparser' project. Any sample/thought?

3. How do I apply the parsed SQL from 2. above to the data from 1. above, to produce the Logical Plan? Please advise.

Thanking you.

With Regards
Sree

Supporting code for item 2
above and debug info:

    @Test
    public void testTPCHSql1() {
        String drqlQueryText = "select "
                + "l_returnflag, l_linestatus, "
                + "sum(l_quantity) as sum_qty, "
                + "sum(l_extendedprice) as sum_base_price, "
                + "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, "
                + "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, "
                + "avg(l_quantity) as avg_qty, "
                + "avg(l_extendedprice) as avg_price, "
                + "avg(l_discount) as avg_disc, "
                + "count(*) as count_order "
                + "from "
                + "lineitem "
                + "where "
                + "l_shipdate <= date '1998-12-01' - interval ':1' day (3) "
                + "group by "
                + "l_returnflag, "
                + "l_linestatus "
                + "order by "
                + "l_returnflag, "
                + "l_linestatus;";

        DrqlParser parser = new AntlrParser();
        SemanticModelReader query = parser.parse(drqlQueryText);

        System.out.println(query.getFromClause());
        System.out.println(query.getGroupByClause());
        System.out.println(query.getJoinOnClause());
        System.out.println(query.getjustATable());
        System.out.println(query.getLimitClause());
        System.out.println(query.getOrderByClause());
        System.out.println(query.getResultColumnList().size());
        System.out.println(query.getWhereClause());

        /*
         * Debug setup: DrqlAntlrParser line# 2299, 2320, 3682, 4884, 5363,
         * 392, 6664; DrqlAntlrLexer.mDiv() line# 1207.
         *
         * Part of the SQL being parsed:
         *   l_shipdate <= date '1998-12-01' - interval ':1' day (3)
         *
         * Variable value (the current token: '<=' at chars 378-379,
         * immediately before 'date'):
         *   [@125,378:379='<=',<52>,1:378]
         *
         * Looks like 'date' is interpreted as 'div'?! Column 382 in the first
         * error below is the 'a' of 'date', and column 416 in the second is
         * the 'a' of 'day'. Both follow a 'd', which is consistent with the
         * lexer trying the 'div' rule (mDiv()) on 'd'.
         *
         * Test method console output:
         *   line 1:382 mismatched character 'A' expecting ' '
         *   line 1:416 mismatched character 'A' expecting ' '
         *   [org.apache.drill.parsers.impl.drqlantlr.SemanticModel@3a86edfe]
         *   []
         *   null
         *   null
         *   null
         *   []
         *   10
         *   null
         */
    }

________________________________
From: Sree Vaddi (JIRA) <[email protected]>
To: [email protected]
Sent: Thursday, July 25, 2013 7:07 AM
Subject: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H Queries

     [ https://issues.apache.org/jira/browse/DRILL-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on DRILL-47 started by Sree Vaddi.

> Generate Logical Plans for TPC-H Queries
> ----------------------------------------
>
>                 Key: DRILL-47
>                 URL: https://issues.apache.org/jira/browse/DRILL-47
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Jacques Nadeau
>            Assignee: Sree Vaddi
>
> Creating example logical plans should help in many ways. Among those are
> validation cases for the sql parser, logical plan completeness, execution
> engine performance, etc. It would be great if someone could generate logical
> plans for each of the TPC-H queries.
> The data is in PSV (pipe separated value) files.
> Converting these to JSON files.
> There are 20 sql files and 5 variant sql files.
> Creating one sub-task jira for each of the sql files. Everything related to
> that sql will be in it, i.e. sql parser, logical plan, execution stats ...
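For item 1, here is a minimal sketch of the PSV-to-JSON step for the lineitem table, in case it helps others reproduce it. It is not the project described above: it deliberately swaps super-csv and gson for String.split and a small hand-written JSON writer so that it runs with only the JDK (Java 16+ for the record). The names PsvToJson, LineItem, parse, quote, field and toJson are hypothetical, and the sample row is made up rather than taken from the TPC-H data. It assumes the dbgen .tbl layout: one row per line, fields separated by '|', with a trailing '|'. The JSON keys reuse the lowercase column names so that the SQL (l_returnflag, l_shipdate, ...) can refer to them directly, and decimals are kept as BigDecimal so prices keep their exact decimal text.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

public class PsvToJson {

    // Hypothetical POJO for the TPC-H LINEITEM table: 16 columns in spec order.
    record LineItem(long orderKey, long partKey, long suppKey, int lineNumber,
                    BigDecimal quantity, BigDecimal extendedPrice,
                    BigDecimal discount, BigDecimal tax,
                    String returnFlag, String lineStatus,
                    LocalDate shipDate, LocalDate commitDate, LocalDate receiptDate,
                    String shipInstruct, String shipMode, String comment) {}

    // Parses one '|'-separated row. The -1 limit keeps empty fields; the
    // trailing '|' only adds one extra empty element at index 16, which is ignored.
    static LineItem parse(String line) {
        String[] f = line.split("\\|", -1);
        return new LineItem(
                Long.parseLong(f[0]), Long.parseLong(f[1]), Long.parseLong(f[2]),
                Integer.parseInt(f[3]),
                new BigDecimal(f[4]), new BigDecimal(f[5]),
                new BigDecimal(f[6]), new BigDecimal(f[7]),
                f[8], f[9],
                LocalDate.parse(f[10]), LocalDate.parse(f[11]), LocalDate.parse(f[12]),
                f[13], f[14], f[15]);
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Appends "key":value to an object that starts with "{"; value must already be valid JSON.
    private static void field(StringBuilder sb, String key, String jsonValue) {
        if (sb.length() > 1) {
            sb.append(',');
        }
        sb.append(quote(key)).append(':').append(jsonValue);
    }

    // Keys reuse the lowercase TPC-H column names referenced by the queries.
    static String toJson(LineItem li) {
        StringBuilder sb = new StringBuilder("{");
        field(sb, "l_orderkey", Long.toString(li.orderKey()));
        field(sb, "l_partkey", Long.toString(li.partKey()));
        field(sb, "l_suppkey", Long.toString(li.suppKey()));
        field(sb, "l_linenumber", Integer.toString(li.lineNumber()));
        field(sb, "l_quantity", li.quantity().toPlainString());
        field(sb, "l_extendedprice", li.extendedPrice().toPlainString());
        field(sb, "l_discount", li.discount().toPlainString());
        field(sb, "l_tax", li.tax().toPlainString());
        field(sb, "l_returnflag", quote(li.returnFlag()));
        field(sb, "l_linestatus", quote(li.lineStatus()));
        field(sb, "l_shipdate", quote(li.shipDate().toString()));
        field(sb, "l_commitdate", quote(li.commitDate().toString()));
        field(sb, "l_receiptdate", quote(li.receiptDate().toString()));
        field(sb, "l_shipinstruct", quote(li.shipInstruct()));
        field(sb, "l_shipmode", quote(li.shipMode()));
        field(sb, "l_comment", quote(li.comment()));
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Made-up row in the dbgen lineitem.tbl layout (not real TPC-H data).
        List<String> rows = List.of(
                "1|1001|2001|1|17|1700.00|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|sample comment|");
        for (String row : rows) {
            System.out.println(toJson(parse(row)));  // one JSON object per line
        }
    }
}
```

To convert a real file, the same parse/toJson pair can be applied to each line from java.nio.file.Files.lines(path), writing one JSON object per line; the other seven TPC-H tables follow the same pattern with their own POJOs.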

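For item 3, it may help to have expected results to validate a Q1 logical plan against before the plan itself exists. The sketch below is not Drill's logical plan format or API. It is a hedged, in-memory reference evaluation of TPC-H Q1 with one step per SQL clause: scan lineitem, filter on l_shipdate, group by (l_returnflag, l_linestatus) with the sum/avg/count aggregates, then order by the two group columns. Those are the operations a logical plan for Q1 would have to express. The names Q1Reference, Row, Agg and q1 are hypothetical, doubles are used for brevity where exact validation would need DECIMAL semantics, and only a few of the aggregate columns are printed. The deltaDays parameter stands for the ':1' in the test query: in the TPC-H query templates, ':1' is the DELTA substitution parameter that qgen fills in, and the spec's Q1 validation run uses DELTA = 90, which makes the predicate l_shipdate <= date '1998-09-02'.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class Q1Reference {

    // Hypothetical minimal row type: only the LINEITEM columns that Q1 reads.
    record Row(double quantity, double extendedPrice, double discount, double tax,
               String returnFlag, String lineStatus, LocalDate shipDate) {}

    // Running aggregates for one (l_returnflag, l_linestatus) group.
    static final class Agg {
        double sumQty, sumBasePrice, sumDiscPrice, sumCharge, sumDiscount;
        long count;

        void add(Row r) {
            double discPrice = r.extendedPrice() * (1 - r.discount());
            sumQty += r.quantity();                  // sum_qty
            sumBasePrice += r.extendedPrice();       // sum_base_price
            sumDiscPrice += discPrice;               // sum_disc_price
            sumCharge += discPrice * (1 + r.tax());  // sum_charge
            sumDiscount += r.discount();             // feeds avg_disc
            count++;                                 // count_order
        }

        double avgQty() { return sumQty / count; }
        double avgPrice() { return sumBasePrice / count; }
        double avgDisc() { return sumDiscount / count; }
    }

    // scan -> filter -> group by + aggregate -> order by, one step per SQL clause.
    // deltaDays plays the role of the ':1' (DELTA) placeholder in the Q1 text.
    static Map<String, Agg> q1(List<Row> lineitem, int deltaDays) {
        LocalDate cutoff = LocalDate.of(1998, 12, 1).minusDays(deltaDays);
        // Both group columns are single characters, so a TreeMap keyed on
        // flag + status iterates in ORDER BY l_returnflag, l_linestatus order.
        Map<String, Agg> groups = new TreeMap<>();
        for (Row r : lineitem) {
            if (!r.shipDate().isAfter(cutoff)) {     // l_shipdate <= cutoff
                groups.computeIfAbsent(r.returnFlag() + r.lineStatus(), k -> new Agg()).add(r);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up rows, not TPC-H data.
        List<Row> rows = List.of(
                new Row(10, 1000.0, 0.10, 0.05, "N", "O", LocalDate.of(1998, 8, 1)),
                new Row(20, 2000.0, 0.00, 0.00, "A", "F", LocalDate.of(1995, 1, 1)),
                new Row(5, 500.0, 0.00, 0.00, "N", "O", LocalDate.of(1998, 11, 30)));
        // DELTA = 90 gives a cutoff of 1998-09-02, so the last row is filtered out.
        q1(rows, 90).forEach((key, a) -> System.out.printf(Locale.ROOT,
                "%c %c sum_qty=%.2f sum_disc_price=%.2f avg_qty=%.2f count_order=%d%n",
                key.charAt(0), key.charAt(1), a.sumQty, a.sumDiscPrice, a.avgQty(), a.count));
        // Prints:
        // A F sum_qty=20.00 sum_disc_price=2000.00 avg_qty=20.00 count_order=1
        // N O sum_qty=10.00 sum_disc_price=900.00 avg_qty=10.00 count_order=1
    }
}
```

Keeping each clause as a separate step makes it easy to compare against the corresponding operator in a generated plan, and running the same code over the converted JSON rows from item 1 would give the reference output for the full Q1.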