[ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943436#comment-15943436 ]
Harshvardhan Gupta commented on DERBY-6921:
-------------------------------------------

Thanks for reviewing the proposal. I'll make sure to go through the above-mentioned resources and past GSoC proposals thoroughly to further refine my proposal.

> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Priority: Minor
>              Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>     For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>     analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>     for Derby to improve the query plans that it chooses for
>     various classes of queries (this is explained in detail in the
>     VLDB paper and other information available at Dr. Leis's site)
>     For each such improvement, we need to isolate the issue,
>     report it as a separable improvement, and fix it (if we can)
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>     it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>     the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>     appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>     with DBMS systems, and with SQL, and in particular with
>     cost-based query optimization. If terms such as "cardinality
>     estimation", "correlated query predicates", or "bushy trees"
>     aren't comfortable terms for you, this probably isn't the
>     project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>     of the extensive body of introductory material on Derby to
>     become familiar with it: read the Derby Getting Started manual,
>     download the software and follow the tutorials, read the documentation,
>     download the source code and learn how to build and run the
>     test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>     need to read the paper(s), study the benchmark queries, and
>     propose a detailed plan for how to use this benchmark as a tool
>     for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!