Felix Beyer wrote:
Hi Derby community,

Hello Felix! Good to see you back on the lists. While you were "gone" there was some discussion about the current state of Derby's query plan logging and I think it was generally agreed that it could use some improvement:

http://thread.gmane.org/gmane.comp.apache.db.derby.devel/30818/focus=30818

So it's great to hear back from you again! Hopefully you will find the answers, resources, support, and encouragement you need to achieve your goals from those here on the derby-dev list...

a while ago I´ve posted some mails, which shortly introduced my extensions, which I`m developing during my thesis and which I want to contribute to the community. These extensions will extend Derby with kind of a persistent workload repository and a well-designed query plan extraction extension.

I am not sure what is meant by "persistent workload repository", but the idea of "well-designed query plan extraction" certainly sounds promising to me :)

1. More specifically I mean, should this explain functionality include the plans generated during compilation phase?

I guess my first reaction is "start with whatever is easiest". In the world of open source development you do not have to have a "perfect" solution that does "everything" before you contribute it. It's usually better to start small and add functionality piece-by-piece. This allows the community to see and "play" with the code early on, which means you will get feedback much earlier. It also means that members of the community who are interested in what you are doing can pick up pieces and do additional development on their own, which may save you time.

Okay, so back to your question.

When I think of "query plan extraction" I think about functionality similar to the current "logQueryPlan" behavior in Derby, except (hopefully) better. So would you consider the current output that we get when we set

  derby.language.logQueryPlan=true

to be "compilation plans"? Or would you say that such output is "execution plans"? If logQueryPlan output is considered "compilation plans" then Yes, I think it would be great to have this functionality. Otherwise I think this is functionality that could be useful for future debugging, but it is probably not as immediately helpful as a better version of the logQueryPlan would be...

Of course, this is just my own opinion; you and anyone else reading this should certainly feel free to difer. If you want to work on plans generated during compilation, then please do! You are not required to work on any specific thing just because I or anyone else say(s) it's "better". Find what interests you and take it as far as you'd like to. I just hope that whatever you do eventually gets contributed in some form or another :)

If yes, should the user have the ability to specifiy the exact position when during compilation (ParseTree, BindTree, OptimizedTree)?

This sounds like a cool idea. My guess is that this functionality would be great as "follow-up" work after you have an initial, "base" extension on which to build. I certainly do not think this would have to be part of the first round...

2. Should the scheme support compilation and execution plans (This would mean there must be a mapping from the node tree to the resultset tree and vice versa and the scheme has to be more generic)?

See my comments for question #1. Based on this question I am guessing that the current output from "logQueryPlan" falls into the category of "execution plans", so if I had to pick one I would vote for that.

2b. Should the approach be oriented on DB2, where the user has the ability to switch between, only explain, execute and explain, just execute without explain?

I am assuming that when you say "explain" you are talking about "explaining" a query plan, is that correct? So I could, for example, "explain" the query plan for "select * from t1"? Or is that not what you mean?

In any event, I think we would want to have some way to "disable" the explain functionality so that people who currently use Derby do not see a performance slow-down caused by the extra "explain" work. Of course, I do not know if there will even *be* a slow-down--I'm just guessing that there will be...?

3. Another approach would be to develop two decoupled solutions, one for the compilation plans and one for the execution plans. The first one stores NodeTrees and the second one ResultSetTrees? What do you think, does this make sense for you?

This kind of separation certainly seems like it would allow for earlier contribution and thus earlier feedback. That is, you could first work on either execution plans or compilation plans until you have something working to your satisfaction. Then you could contribute that piece so that those of us who are interested can "play" with it--and while we are "playing", you or someone else in the community can start working on the other kind of plan. Obviously it would be great if the two types of plans shared a common set of functionality, but again, you do not have to make everything ideal before contributing...

4. Should the extension follow general derby architecture (FactoryInterface and Implementation) and should it therefore be so generic, that for example the extraction of the plans into xml files will also be possible with the suggested approach?

The notion of extracting query plans into XML is one that sounds particularly interesting to me. As of Derby 10.2 we have a builtin XML datatype that allows simple querying of XML values. So if we could extract Derby's query plans into an XML format, we would (theoretically) be able to query the plans for the specific pieces in which we are interested. That sounds like an excellent feature to me.

That said, I will again repeat myself and say "start with whatever is easiest". As cool as it would be to have an XML formatted query plan, maybe that is going to require more effort. In that case you could start with something more basic and then add an XML "piece" later on.

And speaking of XML, in an email several months back you wrote:

<begin quote>

By the way, in a former project I managed to extend Derby to extract the generated optimized query plans in form of XML files for visualizing them in an external application. I used the GXL file format for export and visualized the plans with the JGraph Framework. Internal changes affected the current Derby structure in two ways: First of all a new system function was added to toggle query extraction on or off and second a visitor pattern was used to collect the required information through a traverse of the query tree after the optimization step.

<end quote>

Is that work related to what you are proposing to work on in the next couple of months? If not, do you have any plans/interest in contributing what you did for that project?

5. Should the solution extend, replace or coexist together with the ResultSetStatistics facility?

In the interest of "backward compatibility" I think the ideal situation would be one in which the "default" behavior is to do what we do currently--i.e. the logQueryPlan behavior should remain as it is. So I do not think we would want to replace the existing functionality. Co-existence and/or some kind of optionally-enabled extension to the current logQueryPlan functionality is probably preferable.

6. Is there some information, which is available, or easily derivable from current information, which is interesting for some of you and is currently not printed out with the current implementation of the ResultSetStatistics?

One thing that came up in recent months was the fact that there are certain queries for which the Derby optimizer cost estimates are WAY too high. See DERBY-1905, for example. So one piece of information that would be nice to have in a query plan is an indication of just how bad the optimizer's row and cost estimates are for a given query. This would be very valuable to those developers who are interested in improving the optimizer's cost estimates (such as me, for example).

But again, start with what is easy and build on it...(have I said that enough yet? ;)

Have you got further ideas regarding this or some similiar extension?

I think what you have talked about sounds excellent. I would be very happy if we had the "query extraction" extension that you describe. Further ideas and extensions could serve as follow-up development for you and anyone else who is inclined to participate...

My current timetable is the following:
In some weeks (before end of november) I want to post a detailed concept, describing my extensions in full detail. After the feedback for this, I want to start with the development. At the end of the year, I want to have a working solution. In the new year, I want to run the derby test suites and a couple of performance impact measurements to test my solution and to improve the stability. After that I`ll post a new thread, providing my results.

Wow, that is an impressive timeline :)

My one comment here is that you should consider developing and contributing your work in incremental fashion. You do not have to have a complete "working solution" in order to post to derby-dev or to ask for feedback. Feel free to post partial or unrefined code and to ask for feedback at any stage during the development. I for one would rather have incomplete or "rough draft" code that I can play with early on. And as I mentioned above, the earlier you contribute, the more feedback you will get from the community...

Thanks again for volunteering to be a part of the Apache Derby community! We look forward to hearing more from you as start and complete your project(s). As a developer who spends a lot of my time working in the optimizer, I am certainly excited to see what comes of your work...

Army

Reply via email to