> Army wrote: > > > In order to get Derby to recognize the equijoin predicate, the subquery > > has to appear in the FROM list. So I created a view for the subquery: > > <snip view> > > > and then I changed the query to select and join with that view: > > <snip> > > > Working off the modified query above I used "optimizer overrides", which > > were added as part of Derby 10.2, to 1) fix the join order so that the > > subquery becomes the "inner" table, and 2) to force ("guarantee") a hash > > join using the equijoin predicate, thereby allowing materialization of > > the inner table (i.e. of the subquery). The query to do that is as > > follows: > > > > select count(*) from --DERBY-PROPERTIES joinOrder=FIXED > > time, v_hash --DERBY-PROPERTIES joinStrategy=HASH > > where time.id = v_hash.id; > > Just one thing to mention as an afterthought: you do not have to create a > separate view for this to work. I did that for the sake of clarity, but > you > could just as easily write: > > select count(*) from --DERBY-PROPERTIES joinOrder=FIXED > time, > (select distinct > time.id from time, double_sample, band > where band.id = double_sample.fk_band_id > and double_sample.fk_time_id = time.id and > band.id in (11, 12) > ) v_hash --DERBY-PROPERTIES joinStrategy=HASH > where time.id = v_hash.id; > > Of course after writing all of that I realized that you said your queries > are > generated (as opposed to hand-written) and that they "are only > representative of > my problem. [...] While the above sample query can be easily modified to > run > well, I'm not sure how to do it in the general case." > > So maybe this isn't going to work for you, after all...
Hi Army, Thanks for the equijoin/subquery example, it makes sense to me now. Regarding the optimizer override, I should have no problem getting those hints into my sql statement as I'm generating the sql statements myself, and they all follow a similar structure. So on the contrary, your above idea seems very feasible. I have three possible workaround ideas, but I'm going to pursue this one first, as it seems likely to work reasonably well while not taking too much development time, which is a critical issue now. (The other ideas are to simulate a subquery by iterating over two sorted queries in parallel; and to use a temporary table for inner query results, using an equijoin with the outer query). You previously mentioned a Derby property called derby.language.maxMemoryPerTable. I don't see it mentioned in the current documentation, but I found a jira issue which contains some proposed documentation (http://issues.apache.org/jira/browse/DERBY-1397). But it's still not entirely clear to me. Is this property only used when deciding whether to perform a hash join or nested loop join? Or does it also influence how much of the hash join is kept in memory, or when that hash join is spilled to disk? Essentially, I'd like to know, if using the above query with optimizer overrides which force a hash join, what is the threshold for spilling a hash to disk, and is this a tunable parameter? Thanks, Jim