[
https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888536#comment-15888536
]
Harshvardhan Gupta commented on DERBY-6921:
-------------------------------------------
Hi Mr Bryan,
I am interested in this project and would like to contribute. As a starting
point, I am going through the VLDB paper and the Derby project, and I will
follow up with you soon.
A little background about myself: I am a 4th-year Computer Science
undergraduate student at Birla Institute of Technology & Sciences, Pilani,
India. Relevant experience for this project includes two internships, one
former and one current:
1) Last summer I worked with the co-inventor of Apache Hive at Qubole Inc. to
build a SQL autocomplete engine in JS on top of Qubole's managed Hive offering.
2) I am currently in the middle of a semester-long internship at Amazon, where
I have been analyzing the query plans of Aurora and Redshift queries in order
to tune them for performance.
I am interested in database and search technologies in general and would love
to get a chance to contribute to this project through GSoC 2017.
> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
> Key: DERBY-6921
> URL: https://issues.apache.org/jira/browse/DERBY-6921
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Reporter: Bryan Pendleton
> Priority: Minor
> Labels: database, gsoc2017, java, optimizer
> Original Estimate: 2,016h
> Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools and the data set.
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
> For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
> analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
> for Derby to improve the query plans that it chooses for
> various classes of queries (this is explained in detail in the
> VLDB paper and other information available at Dr. Leis's site).
> For each such improvement, we need to isolate the issue,
> report it as a separable improvement, and fix it (if we can).
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
> it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
> the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
> appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
> with DBMS systems, and with SQL, and in particular with
> cost-based query optimization. If terms such as "cardinality
> estimation", "correlated query predicates", or "bushy trees"
> aren't comfortable terms for you, this probably isn't the
> project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
> of the extensive body of introductory material on Derby to
> become familiar with it: read the Derby Getting Started manual,
> download the software and follow the tutorials, read the documentation,
> download the source code and learn how to build and run the
> test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
> need to read the paper(s), study the benchmark queries, and
> propose a detailed plan for how to use this benchmark as a tool
> for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)