Hi All,

I hope I didn't keep you waiting on this for long.

Downloaded:
http://www.tpc.org/tpch/spec/tpch_2_16_0.zip

Set up the VC++ project, as mentioned in the standup before Tuesday's.
I found out later that it is not required.  It is only for generating the data and SQL files.
But the ZIP above already has the data and SQL files bundled.
That VC++ project didn't compile cleanly on my machine anyway.

1.
I created the POJOs for the schema (page 13) defined in tpch.2.16.0.pdf,
as a separate, simple Java project.

Then I used the super-csv and gson libraries to convert the PSV
(pipe-separated value) data files into JSON files.

Now the data is in JSON and the schema is in POJOs.
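
For reference, here is a minimal sketch of the PSV-to-JSON step. It is not
the code I actually ran: to keep it self-contained it uses plain
String.split and a StringBuilder instead of super-csv and gson. The class
name PsvToJson, the method names, and the sample row are made up for
illustration. It assumes each line ends with a trailing '|', emits every
value as a JSON string (no numeric typing), and escapes only backslashes
and double quotes.

```java
/** Converts pipe-separated rows into JSON object strings (stdlib-only sketch). */
final class PsvToJson {

    /** Escapes backslashes and double quotes; control characters are not handled. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /**
     * Builds one JSON object from the column names and one PSV line.
     * Every value is written as a JSON string; extra fields beyond the
     * known columns (such as the empty one after a trailing '|') are ignored.
     */
    static String toJson(String[] columns, String line) {
        // limit -1 keeps empty fields instead of silently dropping them
        String[] fields = line.split("\\|", -1);
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < columns.length && i < fields.length; i++) {
            if (i > 0) {
                json.append(",");
            }
            json.append('"').append(escape(columns[i])).append("\":\"")
                .append(escape(fields[i])).append('"');
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        // Column names of the TPC-H REGION table; the row itself is a made-up sample.
        String[] columns = {"r_regionkey", "r_name", "r_comment"};
        System.out.println(toJson(columns, "0|AFRICA|sample comment|"));
    }
}
```

In the real project, super-csv does the field splitting and gson serializes
the POJOs, so numeric columns can keep their types.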


2.

In the query-parser project, in DrqlParserTest.java, I added one test method,
testTPCHSql1(), to run the query parser on Sql1 (out of the 20+5 SQLs from TPC-H).

I copied the entire method to the end of this e-mail for your reference,
along with the debugging info I gathered and the console output.


The test method printed an error on the console, along with the output that
I am printing.
Is the error valid?  Does it need to be fixed?  Should I create another JIRA
for it?
Is the output a good enough result for parsing a SQL statement?  Should I go
ahead with the rest of the SQLs?


[Sree Vaddi:] It seems I should be using the 'sqlparser' project.  Any samples
or thoughts?


3.
How do I apply the parsed SQL from 2. above to the data from 1. above, to
output the Logical Plan?


Please advise.


Thanking you.
With Regards
Sree



Supporting code for 2. above and debug info:

    @Test
    public void testTPCHSql1() {
        String drqlQueryText = "select " +
            "l_returnflag, l_linestatus, " +
            "sum(l_quantity) as sum_qty, " +
            "sum(l_extendedprice) as sum_base_price, " +
            "sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, " +
            "sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, " +
            "avg(l_quantity) as avg_qty, " +
            "avg(l_extendedprice) as avg_price, " +
            "avg(l_discount) as avg_disc, " +
            "count(*) as count_order " +
        "from " +
            "lineitem " +
        "where " +
            "l_shipdate <= date '1998-12-01' - interval ':1' day (3) " +
        "group by " +
            "l_returnflag, " +
            "l_linestatus " +
        "order by " +
            "l_returnflag, " +
            "l_linestatus;";
        
        DrqlParser parser = new AntlrParser();
        SemanticModelReader query = parser.parse(drqlQueryText);
        
        System.out.println(query.getFromClause());
        System.out.println(query.getGroupByClause());
        System.out.println(query.getJoinOnClause());
        System.out.println(query.getjustATable());
        System.out.println(query.getLimitClause());
        System.out.println(query.getOrderByClause());
        System.out.println(query.getResultColumnList().size());
        System.out.println(query.getWhereClause());
        /*
setup debug info:
line#2299 DrqlAntlrParser
2320
3682
4884
5363

392
6664

#1207 DrqlAntlrLexer.mDiv()
part of the sql parsing:
// l_shipdate <= date '1998-12-01' - interval ':1' day (3)
variable value: (parsing location in sql i.e the location of letter 'd' in date)
[@125,378:379='<=',<52>,1:378]

looks like the 'date' is interpreted as 'div' ?!

test method console output:
line 1:382 mismatched character 'A' expecting ' '
line 1:416 mismatched character 'A' expecting ' '

[org.apache.drill.parsers.impl.drqlantlr.SemanticModel@3a86edfe]
[]
null
null
null
[]
10
null

         */
    }

________________________________
 From: Sree Vaddi (JIRA) <[email protected]>
To: [email protected] 
Sent: Thursday, July 25, 2013 7:07 AM
Subject: [jira] [Work started] (DRILL-47) Generate Logical Plans for TPC-H 
Queries
 


     [ 
https://issues.apache.org/jira/browse/DRILL-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on DRILL-47 started by Sree Vaddi.

> Generate Logical Plans for TPC-H Queries
> ----------------------------------------
>
>                 Key: DRILL-47
>                 URL: https://issues.apache.org/jira/browse/DRILL-47
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Jacques Nadeau
>            Assignee: Sree Vaddi
>
> Creating example logical plans should help in many ways.  Among those are 
> validation cases for the sql parser, logical plan completeness, execution 
> engine performance, etc.  It would be great if someone could generate logical 
> plans for each of the TPC-H queries.
> The data is in PSV (pipe separated value) files.
> Converting these to JSON files.
> There are 20 sql files and 5 variant sql files.
> Creating one sub-task jira for each of the sql files.  Everything related to 
> that sql will be in it, i.e. sql parser, logical plan, execution stats ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
