[jira] [Commented] (DERBY-6921) How good is the Derby Query Optimizer, really

Harshvardhan Gupta (JIRA) Tue, 28 Feb 2017 09:44:13 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888536#comment-15888536
 ]


Harshvardhan Gupta commented on DERBY-6921:
-------------------------------------------

Hi Mr Bryan,

I am interested in this project and would like to contribute. As a starting
point, I am going through the VLDB paper and the Derby project, and I will
follow up with you soon.

A little background about myself: I am a 4th year Computer Science
undergraduate student at Birla Institute of Technology and Science, Pilani,
India. Relevant experience for this project includes two internships, one
past and one current:

1) Last summer I worked with the co-inventor of Apache Hive at Qubole Inc. to 
build a SQL autocomplete engine in JS on top of Qubole's managed Hive offering.

2) I am currently in the middle of a semester-long internship at Amazon, where
I have been analyzing the query plans of Aurora and Redshift queries in order
to tune them for performance.

I am interested in database and search technologies in general and would love 
to get a chance to contribute to this project through GSoC 2017.

> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Priority: Minor
>              Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>      For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>    analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>    for Derby to improve the query plans that it chooses for
>    various classes of queries (this is explained in detail in the
>    VLDB paper and other information available at Dr. Leis's site)
>    For each such improvement, we need to isolate the issue,
>    report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>    it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>    the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>    appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>    with DBMS systems, and with SQL, and in particular with
>    cost-based query optimization. If terms such as "cardinality
>    estimation", "correlated query predicates", or "bushy trees"
>    aren't comfortable terms for you, this probably isn't the
>    project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>    of the extensive body of introductory material on Derby to
>    become familiar with it: read the Derby Getting Started manual,
>    download the software and follow the tutorials, read the documentation,
>    download the source code and learn how to build and run the
>    test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>    need to read the paper(s), study the benchmark queries, and
>    propose a detailed plan for how to use this benchmark as a tool
>    for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
