Re: Query plan extraction

Army Thu, 09 Nov 2006 17:11:34 -0800

Felix Beyer wrote:

Hi Derby community,

Hello Felix! Good to see you back on the lists. While you were "gone" there was some discussion about the current state of Derby's query plan logging and I think it was generally agreed that it could use some improvement:


http://thread.gmane.org/gmane.comp.apache.db.derby.devel/30818/focus=30818

So it's great to hear back from you again! Hopefully you will find the answers, resources, support, and encouragement you need to achieve your goals from those here on the derby-dev list...

A while ago I've posted some mails, which shortly introduced my extensions, which I'm developing during my thesis and which I want to contribute to the community. These extensions will extend Derby with kind of a persistent workload repository and a well-designed query plan extraction extension.

I am not sure what is meant by "persistent workload repository", but the idea of "well-designed query plan extraction" certainly sounds promising to me :)

1. More specifically I mean, should this explain functionality include the plans generated during compilation phase?

I guess my first reaction is "start with whatever is easiest". In the world of open source development you do not have to have a "perfect" solution that does "everything" before you contribute it. It's usually better to start small and add functionality piece-by-piece. This allows the community to see and "play" with the code early on, which means you will get feedback much earlier. It also means that members of the community who are interested in what you are doing can pick up pieces and do additional development on their own, which may save you time.


Okay, so back to your question.

When I think of "query plan extraction" I think about functionality similar to the current "logQueryPlan" behavior in Derby, except (hopefully) better. So would you consider the current output that we get when we set


  derby.language.logQueryPlan=true

to be "compilation plans"? Or would you say that such output is "execution plans"? If logQueryPlan output is considered "compilation plans" then yes, I think it would be great to have this functionality. Otherwise I think this is functionality that could be useful for future debugging, but it is probably not as immediately helpful as a better version of the logQueryPlan would be...

Of course, this is just my own opinion; you and anyone else reading this should certainly feel free to differ. If you want to work on plans generated during compilation, then please do! You are not required to work on any specific thing just because I or anyone else say(s) it's "better". Find what interests you and take it as far as you'd like to. I just hope that whatever you do eventually gets contributed in some form or another :)

If yes, should the user have the ability to specify the exact point during compilation (ParseTree, BindTree, OptimizedTree)?

This sounds like a cool idea. My guess is that this functionality would be great as "follow-up" work after you have an initial, "base" extension on which to build. I certainly do not think this would have to be part of the first round...

2. Should the scheme support compilation and execution plans (This would mean there must be a mapping from the node tree to the resultset tree and vice versa and the scheme has to be more generic)?

See my comments for question #1. Based on this question I am guessing that the current output from "logQueryPlan" falls into the category of "execution plans", so if I had to pick one I would vote for that.

2b. Should the approach be oriented on DB2, where the user has the ability to switch between only explain, execute and explain, and just execute without explain?

I am assuming that when you say "explain" you are talking about "explaining" a query plan, is that correct? So I could, for example, "explain" the query plan for "select * from t1"? Or is that not what you mean?

In any event, I think we would want to have some way to "disable" the explain functionality so that people who currently use Derby do not see a performance slow-down caused by the extra "explain" work. Of course, I do not know if there will even *be* a slow-down--I'm just guessing that there will be...?
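
As a rough illustration of the three modes from question 2b combined with an "off by default" setting, here is a minimal sketch. Everything in it is hypothetical: the enum, the parse helper, and the "example.explainMode" property name are invented for this example and are not part of Derby or DB2.

```java
import java.util.Locale;

/** Hypothetical sketch of the three explain modes; not an existing Derby API. */
public class ExplainModeSketch {

    enum ExplainMode {
        EXECUTE_ONLY(false, true),        // today's behavior: no explain overhead
        EXPLAIN_ONLY(true, false),        // capture the plan, skip execution
        EXPLAIN_AND_EXECUTE(true, true);  // capture the plan and run the query

        final boolean explains;
        final boolean executes;

        ExplainMode(boolean explains, boolean executes) {
            this.explains = explains;
            this.executes = executes;
        }
    }

    /** Unset or unrecognized values fall back to EXECUTE_ONLY, so existing users pay nothing. */
    static ExplainMode parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return ExplainMode.EXECUTE_ONLY;
        }
        try {
            return ExplainMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return ExplainMode.EXECUTE_ONLY;
        }
    }

    public static void main(String[] args) {
        // "example.explainMode" is an invented property name used only for this sketch.
        ExplainMode mode = parse(System.getProperty("example.explainMode"));
        System.out.println(mode + " explains=" + mode.explains + " executes=" + mode.executes);
    }
}
```

The point of the default is that a user who never touches the setting stays on the existing code path.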

3. Another approach would be to develop two decoupled solutions, one for the compilation plans and one for the execution plans. The first one stores NodeTrees and the second one ResultSetTrees? What do you think, does this make sense for you?

This kind of separation certainly seems like it would allow for earlier contribution and thus earlier feedback. That is, you could first work on either execution plans or compilation plans until you have something working to your satisfaction. Then you could contribute that piece so that those of us who are interested can "play" with it--and while we are "playing", you or someone else in the community can start working on the other kind of plan. Obviously it would be great if the two types of plans shared a common set of functionality, but again, you do not have to make everything ideal before contributing...

4. Should the extension follow general derby architecture (FactoryInterface and Implementation) and should it therefore be so generic, that for example the extraction of the plans into xml files will also be possible with the suggested approach?

The notion of extracting query plans into XML is one that sounds particularly interesting to me. As of Derby 10.2 we have a builtin XML datatype that allows simple querying of XML values. So if we could extract Derby's query plans into an XML format, we would (theoretically) be able to query the plans for the specific pieces in which we are interested. That sounds like an excellent feature to me.

That said, I will again repeat myself and say "start with whatever is easiest". As cool as it would be to have an XML formatted query plan, maybe that is going to require more effort. In that case you could start with something more basic and then add an XML "piece" later on.
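
To make the "query the plans for specific pieces" idea a little more concrete, here is a small sketch that runs XPath expressions over a made-up XML plan using only the JDK's javax.xml.xpath API. The XML layout (element and attribute names such as "node", "type", and "estimatedRows") is purely an assumption for illustration; it is not an existing Derby output format, and inside Derby itself the querying would presumably go through the SQL/XML operators rather than this Java API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: querying a hypothetical XML-formatted query plan with XPath. */
public class PlanXmlQuerySketch {

    // Invented plan format; the element and attribute names are assumptions.
    static final String PLAN_XML =
        "<plan statement=\"select * from t1, t2 where t1.id = t2.id\">"
      + "<node type=\"HashJoin\" estimatedRows=\"100\">"
      + "<node type=\"TableScan\" table=\"T1\" estimatedRows=\"1000\"/>"
      + "<node type=\"IndexScan\" table=\"T2\" estimatedRows=\"10\"/>"
      + "</node>"
      + "</plan>";

    /** Returns how many nodes in the XML text match the given XPath expression. */
    static int countNodes(String xml, String xpathExpr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            xpathExpr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Which parts of the plan are full table scans?
        System.out.println(countNodes(PLAN_XML, "//node[@type='TableScan']"));
        // Which plan nodes are expected to produce more than 50 rows?
        System.out.println(countNodes(PLAN_XML, "//node[@estimatedRows > 50]"));
    }
}
```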


And speaking of XML, in an email several months back you wrote:

<begin quote>

By the way, in a former project I managed to extend Derby to extract the generated optimized query plans in form of XML files for visualizing them in an external application. I used the GXL file format for export and visualized the plans with the JGraph Framework. Internal changes affected the current Derby structure in two ways: First of all a new system function was added to toggle query extraction on or off and second a visitor pattern was used to collect the required information through a traverse of the query tree after the optimization step.


<end quote>

Is that work related to what you are proposing to work on in the next couple of months? If not, do you have any plans/interest in contributing what you did for that project?
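
For readers unfamiliar with the approach described in the quote, a visitor that walks a plan tree and collects information can be sketched roughly as follows. The PlanNode, PlanVisitor, and CollectingVisitor types are invented for this example; they are not Derby's query tree classes and not the code from the project mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of collecting plan information with a visitor; all types are hypothetical. */
public class PlanVisitorSketch {

    interface PlanVisitor {
        void visit(PlanNode node, int depth);
    }

    /** Minimal stand-in for a node of an optimized query tree. */
    static final class PlanNode {
        final String name;
        final List<PlanNode> children;

        PlanNode(String name, PlanNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        /** Pre-order traversal: visit this node, then each child one level deeper. */
        void accept(PlanVisitor visitor, int depth) {
            visitor.visit(this, depth);
            for (PlanNode child : children) {
                child.accept(visitor, depth + 1);
            }
        }
    }

    /** Records one indented line per node, so the tree itself needs no printing logic. */
    static final class CollectingVisitor implements PlanVisitor {
        final List<String> lines = new ArrayList<>();

        @Override
        public void visit(PlanNode node, int depth) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            lines.add(line.append(node.name).toString());
        }
    }

    static List<String> collect(PlanNode root) {
        CollectingVisitor visitor = new CollectingVisitor();
        root.accept(visitor, 0);
        return visitor.lines;
    }

    public static void main(String[] args) {
        PlanNode plan = new PlanNode("Project",
            new PlanNode("Join", new PlanNode("ScanT1"), new PlanNode("ScanT2")));
        for (String line : collect(plan)) {
            System.out.println(line);
        }
    }
}
```

A different visitor (one that emits XML or GXL, say) could reuse the same traversal without any change to the node classes, which is the main attraction of the pattern here.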

5. Should the solution extend, replace or coexist together with the ResultSetStatistics facility?

In the interest of "backward compatibility" I think the ideal situation would be one in which the "default" behavior is to do what we do currently--i.e. the logQueryPlan behavior should remain as it is. So I do not think we would want to replace the existing functionality. Co-existence and/or some kind of optionally-enabled extension to the current logQueryPlan functionality is probably preferable.

6. Is there some information, which is available, or easily derivable from current information, which is interesting for some of you and is currently not printed out with the current implementation of the ResultSetStatistics?

One thing that came up in recent months was the fact that there are certain queries for which the Derby optimizer cost estimates are WAY too high. See DERBY-1905, for example. So one piece of information that would be nice to have in a query plan is an indication of just how bad the optimizer's row and cost estimates are for a given query. This would be very valuable to those developers who are interested in improving the optimizer's cost estimates (such as me, for example).

But again, start with what is easy and build on it...(have I said that enough yet? ;)
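
One simple way such an "how bad is the estimate" indicator could be computed is as the factor by which the estimated row count is off from the actual one, in either direction. This is only a sketch of one possible metric; the class and method names are invented, and flooring both counts at one row is an assumption made to avoid dividing by zero.

```java
/** Sketch of an estimate-quality indicator; not an existing Derby statistic. */
public class EstimateErrorSketch {

    /**
     * Returns how many times off the estimate is, in whichever direction is worse.
     * A value of 1.0 means the estimate matched; 100.0 means off by a factor of 100.
     * Both inputs are floored at 1 row (an assumption) so empty results stay finite.
     */
    static double errorFactor(double estimatedRows, double actualRows) {
        double estimated = Math.max(estimatedRows, 1.0);
        double actual = Math.max(actualRows, 1.0);
        return Math.max(estimated / actual, actual / estimated);
    }

    public static void main(String[] args) {
        // Made-up numbers: the optimizer expected a million rows, ten came back.
        System.out.println(errorFactor(1_000_000, 10));
        // An estimate that matched exactly.
        System.out.println(errorFactor(250, 250));
    }
}
```

Printing a factor like this next to each node's estimated and actual row counts would make badly estimated parts of a plan stand out at a glance.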

Have you got further ideas regarding this or some similar extension?

I think what you have talked about sounds excellent. I would be very happy if we had the "query extraction" extension that you describe. Further ideas and extensions could serve as follow-up development for you and anyone else who is inclined to participate...

My current timetable is the following:
In some weeks (before end of November) I want to post a detailed concept, describing my extensions in full detail. After the feedback for this, I want to start with the development. At the end of the year, I want to have a working solution. In the new year, I want to run the derby test suites and a couple of performance impact measurements to test my solution and to improve the stability. After that I'll post a new thread, providing my results.


Wow, that is an impressive timeline :)

My one comment here is that you should consider developing and contributing your work in incremental fashion. You do not have to have a complete "working solution" in order to post to derby-dev or to ask for feedback. Feel free to post partial or unrefined code and to ask for feedback at any stage during the development. I for one would rather have incomplete or "rough draft" code that I can play with early on. And as I mentioned above, the earlier you contribute, the more feedback you will get from the community...

Thanks again for volunteering to be a part of the Apache Derby community! We look forward to hearing more from you as you start and complete your project(s). As a developer who spends a lot of my time working in the optimizer, I am certainly excited to see what comes of your work...


Army
