OK, so the last change was successful: The +0 as suggested by Set works. The query now uses the faster query plan through the entire batch (at least it still does after 1769 chunks.
Many thanks for all your input! I still think it would be interesting with some theories about why the engine behaves like this, perhaps from the dev team...? Mvh, Kjell Den 2020-05-01 kl. 09:53, skrev Kjell Rilbe [email protected] [firebird-support]: > > Well, the changed subselect didn't do the trick, but I think it speeds > up the query slightly, overall. > > Now, it's this query that's causing problems: > > insert into "TmpFKExportUhant" ("ECO_ID", "Bärartyp") > select FtgOmsar."Uppgiftshanterare", 'FöretagOmsättningsår' "Bärartyp" > from "Företag" F > inner join "TmpFKExportId" L on L."ECO_ID" = F."ECO_ID" > inner join "FöretagOmsättningsår" FtgOmsar on FtgOmsar."Företag" = > F."ECO_ID" + 0 > inner join "År" Ar on Ar."ECO_ID" = FtgOmsar."Omsättningsår" + 0 > where not exists ( > select 1 > from "FöretagOmsättningsår" FtgOmsar2 > inner join "År" Ar2 on Ar2."ECO_ID" = FtgOmsar2.."Omsättningsår" > where FtgOmsar2."Företag" = F."ECO_ID" > and Ar2."Årtal" > Ar."Årtal" > ); > > The "breaking point" has now moved to 361 executions. Before that point, > the "fast" execution plan looks like this (both styles): > > PLAN JOIN ( > L NATURAL, > F INDEX ("IX_PK_Företag"), > FTGOMSAR INDEX ("IX_FöretagOmsättningsåDBN"), > AR INDEX ("IX_PK_År") > ) > > Select Expression > -> Nested Loop Join (inner) > -> Filter > -> Table "FöretagOmsättningsår" as "FTGOMSAR2" Access By ID > -> Bitmap > -> Index "IX_FöretagOmsättningsåDBN" Range Scan > (full match) > -> Filter > -> Table "År" as "AR2" Access By ID > -> Bitmap > -> Index "IX_PK_År" Unique Scan > Select Expression > -> Nested Loop Join (inner) > -> Table "TmpFKExportId" as "L" Full Scan > -> Filter > -> Table "Företag" as "F" Access By ID > -> Bitmap > -> Index "IX_PK_Företag" Unique Scan > -> Filter > -> Table "FöretagOmsättningsår" as "FTGOMSAR" Access By ID > -> Bitmap > -> Index "IX_FöretagOmsättningsåDBN" Range Scan > (full match) > -> Filter > -> Table "År" as "AR" Access By ID > -> Bitmap > -> Index "IX_PK_År" Unique Scan > > After the slowdown, the "bad" query plan looks like this: > > PLAN JOIN ( > L NATURAL, > F INDEX (IX_PK_Företag), > FTGOMSAR INDEX (IX_FöretagOmsättningsåDBN), > AR INDEX (IX_PK_År) > ) > > Select Expression > -> Nested Loop Join (inner) > -> Filter > -> Table "FöretagOmsättningsår" as "FTGOMSAR2" Access By ID > -> Bitmap > -> Index "IX_FöretagOmsättningsåDBN" Range Scan > (full match) > -> Filter > -> Table "År" as "AR2" Access By ID > -> Bitmap > -> Index "IX_PK_År" Unique Scan > Select Expression > -> Nested Loop Join (inner) > -> Table "År" as "AR" Full Scan > -> Filter > -> Table "FöretagOmsättningsår" as "FTGOMSAR" Access By ID > -> Bitmap > -> Index "IX_FöretagOmsättningså9OF" Range Scan > (full match) > -> Filter > -> Table "TmpFKExportId" as "L" Access By ID > -> Bitmap > -> Index "PK_TmpFKExportId" Unique Scan > -> Filter > -> Table "Företag" as "F" Access By ID > -> Bitmap > -> Index "IX_PK_Företag" Unique Scan > > It's worth noting that the old-style query plans are identical, but the > new-style ones do have a difference, which is probably causing the > problem. > > I am trying now with the +0 that Set suggested. > > But why does Firebird cosistently change it's execution strategy for a > query that's being executed for the Nth time without change and on > essentially unmodified data? During these batches, the tables "År" and > "FöretagOmsättningår" are never changed in any way.. The table "Företag" > may be changed in a few records, but not in the columns or indices used > in the query. The table "TmpFKExportId" will have a new set of 250 id:s > for each execution of the query, but its index selectivity would be > constant (1/250). So, as far as I can see, there's absolutely no reason > to be found in the actual data. It must be something else: cache, > workload, garbage collector...? > > But during the batch, FBMonitor doesn't show any change in the garbage > collector's behavior at the time of slowdown/change of query strategy. > > The only thing I can see in FBMonitor's graphs is that about 35 minutes > before the slowdown, the I/O activity graph shows an increase in cache > hits, but no apparent change in disk reads or disk writes. Studying the > actual statements around that time, I fail to see any significant change > in the statistics. The statistics do change significantly at the time of > slowdown though, when the bad query plans starts being used. > > I think Firebird's behavior is really really weird here. It consistently > changes to a worse query plan for the same query on essentially the same > data, just because it's already been executed N times, where N = 316-320 > with the old query and apparently N = ~361 with the current query. > > This must be caused by some internal transient state, associated with > the app's activity history. Some buffer filling up or some other > resource being depleted or something like that. Is it abug? If so, how > on earth could it possibly be investigated? Ugh... > > Mvh, > Kjell > > Den 2020-05-01 kl. 01:12, skrev Kjell Rilbe > [email protected] [firebird-support]: > > > > Hi Set! > > > > Understood and I thank you for pointing it out. My last attempt was to > > put all the Uhant."ECO_ID" in a temp table and then use that to run the > > last part of the query. That avoids the unions, but adds some extra work > > to store the records in the temp table. All in all I suppose it's > > slightly more work for the engine. > > > > But as it turns out, it's that part of the query that suddenly takes a > > lot longer to execute. > > > > Trying now with the alternative subselect you suggested earlier, i.e. > > not exists (...). If that doesn't work out, I'll try the +0 you suggest > > below. Thumbs up! > > > > Regards, > > Kjell > > > > Den 2020-04-30 kl. 23:14, skrev Svein Erling Tysvær [email protected] > > [firebird-support]: > > > Hej Kjell, > > > > > > I'm not used to seeing plans this way, but if I'm reading the plan > > > correctly, then adding +0 or || '' (depending on the field type) > > > immediately after your first union like this: > > > > > > select 'FöretagOmsättningsår' "Bärartyp", FtgOmsar."ECO_ID" > > > "Bärare", Uhant."ECO_ID" > > > from "Företag" F > > > inner join "TmpFKExportId" L on L."ECO_ID" = F."ECO_ID" > > > inner join "FöretagOmsättningsår" FtgOmsar on FtgOmsar."Företag" > > > = F."ECO_ID" +0 > > > inner join "År" Ar on Ar."ECO_ID" = FtgOmsar."Omsättningsår" +0 > > > inner join "Uppgiftshanterare" Uhant on Uhant."ECO_ID" = > > > FtgOmsar."Uppgiftshanterare" > > > > > > ought to speed up your query. The point being that your slow plan only > > > is an option if indices can be used for FtgOmsar."Omsättningsår" and > > > F."ECO_ID" and that this kind of addition prevents those indices from > > > being used. > > > > > > HTH, > > > Set > > > > > > tor. 30. apr. 2020 kl. 17:43 skrev Kjell Rilbe > > > [email protected] > > > <mailto:[email protected]> [firebird-support] > > > <[email protected] > > > <mailto:[email protected]>>: > > > > > > Thanks Karol! I will consider explicit plan if my current rewrite > > > of the > > > query doesn't pan out. > > > > > > The table "TmpFKExportId" will always have the same number of > > > records, > > > but a different set (each chunk of the batch will load the same > > > number > > > of id:s, but of course different id values). Statistics should be > > > constant. > > > > > > The other tables will vary slightly over time, but the database > > > contains > > > 2+ million companies and related data. During the batch, the only > > > updates to this data is from our TM staff who phone companies and > > > enter > > > data manually, one company at a time. So overall, the change rate is > > > minute. And in particular, there's no big change exactly 318 > > > chunks into > > > the batch job, every time. > > > > > > Yes, the query is hardcoded as a string literal into the app's source > > > code. It can't get more "same" than that. :-) > > > > > > Mvh, > > > Kjell > > > > > > Den 2020-04-30 kl. 17:06, skrev Karol Bieniaszewski > > > [email protected] <mailto:[email protected]> > > > [firebird-support]: > > > > > > > > I suppose you have two different queries – one with where clause > > > and > > > > one without on one of the tables involved in the query. > > > > Are you sure that the queries are the same? > > > > > > > > But also it is quite usual that after new data changes the plan is > > > > about to change. > > > > Is this true for your case that some table got more records > > > after fill > > > > then previously second table? > > > > Especially this one?: > > > > > > > > You can try also modify your query to have static plan (some +0) or > > > > update index statistics after some count of operations. > > > > > > > > Regards, > > > > Karol Bieniaszewski > > > > > > > > [Non-text portions of this message have been removed] > > > > > > > > > > > > > > > > > [Non-text portions of this message have been removed] > > > > > > > > > > > > ------------------------------------ > > > Posted by: Kjell Rilbe <[email protected] > > > <mailto:[email protected]>> > > > ------------------------------------ > > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > Visit http://www.firebirdsql.org and click the Documentation item > > > on the main (top) menu. Try FAQ and other links from the > > > left-side menu there. > > > > > > Also search the knowledgebases at > > > http://www.ibphoenix.com/resources/documents/ > > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > ------------------------------------ > > > > > > Yahoo Groups Links > > > > > > > > > [email protected] > > > <mailto:[email protected]> > > > > > > > > > > > > > [Non-text portions of this message have been removed] > > > > > > [Non-text portions of this message have been removed] > > [Non-text portions of this message have been removed]
