Re: Could use some thoughts on a problem with a query in production
Thanks Bryan. I am looking at all things :) The fact that it is not reproducible with a copy of the database is a hinderance but also can indicate something external. The interesting thing is that the same query with the same parameters was run 40 times over about an 8 hour period with about 30 minutes apart. Each of these hit the exactly the same problem. At the same time, the same query requesting different time periods worked perfectly. Insertion rates into the database also were not affected. Does anyone know anything about the “cache cleaning”? I have started to peak and poke at what it does but I am wondering if there is something in there that might be triggering the issue. The reason that it is peaking my interest is that within the code where the BLOCKED threads are occurring are checked for the “cleaning”. These are the areas that the code is waiting. it is almost such that something gets lock stepped in which the code cannot progress because of something holding a monitor and the monitor is is being released and acquired almost once per result. Also the fact that the CPU utilization of derby went from about 5% to about 45% during this time indicates something is being worked on or a busy wait being performed. On Jun 6, 2017, at 9:28 AM, Bryan Pendleton> wrote: Hi Brett? What's the underlying storage system like? In particular, is it an SSD storage system? Is there any chance that you happened to hit a spot in time where there was some non-Derby event which was causing the storage system to misbehave? I understand that certain SSD implementations, and certain file systems, have to perform automatic maintenance activity on the SSD media, such as "wear leveling", which can impose significant overhead on the file system and cause normal use of the file system to be VERY slow for a significant period. Here's a relatively old article on the subject; you might try some web searching to see if you can get more recent information. http://www.computerworld.com/article/2524589/data-center/analysis--ssd-performanceis-a-slowdown-inevitable-.html Anyway, I guess what I'm suggesting is that you consider investigating whether a non-Derby external factor is at work on the machine which is suffering the performance problems. thanks, bryan Canoga Perkins 20600 Prairie Street Chatsworth, CA 91311 (818) 718-6300 This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).
Re: Could use some thoughts on a problem with a query in production
Hi Brett? What's the underlying storage system like? In particular, is it an SSD storage system? Is there any chance that you happened to hit a spot in time where there was some non-Derby event which was causing the storage system to misbehave? I understand that certain SSD implementations, and certain file systems, have to perform automatic maintenance activity on the SSD media, such as "wear leveling", which can impose significant overhead on the file system and cause normal use of the file system to be VERY slow for a significant period. Here's a relatively old article on the subject; you might try some web searching to see if you can get more recent information. http://www.computerworld.com/article/2524589/data-center/analysis--ssd-performanceis-a-slowdown-inevitable-.html Anyway, I guess what I'm suggesting is that you consider investigating whether a non-Derby external factor is at work on the machine which is suffering the performance problems. thanks, bryan
RE: Could use some thoughts on a problem with a query in production
Thanks Katherine, I will look into that. It is interesting that this one query had the issue repeatedly over about a 12 hour period whereas others querying performing the same query with different time periods during the same time did not experience any issue. One would think if the statistics were bad or being updated, they also would have a similar issue. The stack traces seem to show that the indexes are being used; I don’t see any table scans being performed which is usually a good indicator that the indexes are not being use. From: Katherine Marsden [mailto:kmarsdende...@sbcglobal.net] Sent: Monday, June 05, 2017 3:39 PM To: derby-dev@db.apache.org Subject: Re: Could use some thoughts on a problem with a query in production Hi Brett, I just happened to see your post on my phone and what came to mind was I wonder how the automatic statistics update is implemented and if that might somehow be at play here. I may be totally off base but remember at one point this was implemented and there was discussion of how and when to do it and concern that at some point, some query was going to get hit with the cost (for the team so to speak) . It might be worthwhile looking at what triggers the statistics update and if it might be at play here. Thanks Katherine Sent from my iPhone On Jun 5, 2017, at 12:04 PM, Bergquist, Brett> wrote: Background: · A large database approximately 750G · derby.storage.pageCacheSize=64000 · Inserts going into the database about 125/second · Other database updates and deletes are being performed at a lower rate. A query is run by the customer that does not complete in a reasonable amount of time. Unfortunately the customer’s program retries this query up to 40 times compounding the problem. The query retrieves a set of records for a period of time. The query is constructed using the indexes of the table such that the rows should be in sorted order and an external sort not required. Basically the query fires off periodically and requests the records for a specific hour of the day. The next time it fires off and retrieves the next hour, etc. The query has run successfully with other parameters repeatedly through out the week with no issues. Only one set of query parameters caused the issue. The same problem occurred exactly one week later again for the same time period of the day. I was able to retrieve the database locally from the customer and I configured my server to time and time zone and ran the query with no issue. So it does not appear to be an optimizer problem with it choosing the wrong path. There is a script that is run at the customer’s site to retrieve some data that I use for debugging these types of problems as access to the customer site is not possible due to security reasons. One of the things that is captures is stack traces every 5 seconds for 1 minute. The stack traces show no deadlocks and also seem to show that the query path is not doing a large external sort or anything like that. Many of the threads are blocked however and they seem to be blocked at a low level.For example many threads look like this: "DRDAConnThread_98" #123 prio=5 os_prio=64 tid=0x0001027b3800 nid=0x94 waiting for monitor entry [0xfffd37afe000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.derby.impl.store.raw.data.FileContainer.pageValid(Unknown Source) - waiting to lock <0xfffd70d5d598> (a org.apache.derby.impl.store.raw.data.AllocationCache) at org.apache.derby.impl.store.raw.data.FileContainer.getUserPage(Unknown Source) at org.apache.derby.impl.store.raw.data.FileContainer.getPage(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getPage(Unknown Source) at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.latchPage(Unknown Source) at org.apache.derby.impl.store.access.conglomerate.GenericConglomerateController.fetch(Unknown Source) at org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.NestedLoopJoinResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown Source) at org.apache.derby.impl.sql.execute.SortResultSet.getRowFromResultSet(Unknown Source) at