I knew if we "poked the bear" long enough, he'd respond and remind us all that it's still his forest.
Much of what I thought I knew came from a session I was in with a BMC Consultant some years ago. Thanks for updating and clarifying so much, Doug - you are appreciated!

On Mon, Jan 18, 2010 at 1:31 PM, Mueller, Doug <[email protected]> wrote:

> Folks,
>
> This thread has actually turned into several different conversations -- all related to threads and queues and how the AR System server functions, but several different topics. To help share a bit of information about how the system works, clarify what the purpose/intention around things are, and offer some comment on the original question, I thought I would try and address the various topics with some comments.
>
> 1) What is the difference between a queue and a thread?
>
> Before going into the fast/list discussion, I want to make sure that everyone is clear about a couple of terms: queue and thread. The fast and list are actually queues in the system, which in turn have one or more threads defined for them.
>
> In the AR System server, there are a set of queues. There is the Admin, Fast, List, and Escalation by default (OK, there are also some used for services like the plugin server and such). These are essentially connection points that you can go to. You can also have private queues that offer additional connection points to the system.
>
> Each of these queues has one or more threads -- each thread being a database connection and a processing lane for an API call.
>
> Think of it the following way. If you were going to a futbol game in Spain (that's soccer for those of us in the backwards United States), the stadium generally has multiple queues you can enter. Say in a simple case, they may be on the N, E, S, and W of the stadium. Now, there may be a special queue in the NE for the "skybox" owners.
>
> Each of the queues, the entrances to the stadium, has multiple lanes where people can enter. These are the threads.
> Threads are local to the individual queues. You cannot be in the queue on the N of the stadium and go through an entry lane that is on the S, for example.
>
> However, if one of the queues is closed, people get routed to one of the queues that is open, so you are not blocked out of the game just because the queue you were targeting is not available; you just get routed to another one that is available. (In the AR System case, the fast/list pair of queues is the one that you get routed to if your specific queue is not available.)
>
> So, the system has a set of queues -- some pre-defined, some private and defined per site -- and each of them has processing threads as configured.
>
> No, this was not a topic brought up, but it is important to understand this clearly for some of the topics that were.
>
> 2) Why fast/list? Are they relevant?
>
> One of the topics that was discussed is the fast vs. list queue and the reasoning behind it.
>
> As was noted, any queue in the system can perform any operation. OK, almost... The exception is that any operation that restructures definitions or changes the database MUST go through the Admin queue and will be routed to that queue. No queue other than the Admin queue will process restructure operations.
>
> Anyway, other than that distinction, any queue in the system can perform any non-restructuring API operation.
>
> BMC has optimized the system to two different queues by default:
>
> Fast (just a name, without intending to indicate performance)
> List (just a name as well, but aimed at things that search/scan/find and return lists of things)
>
> List calls may be faster than Fast calls. Fast calls may be faster than List calls.
>
> The "fast" queue gets all calls that are controlled by the developer and that have discrete operations and activity. This includes operations that create, modify, and delete.
> It includes retrieving details of a single item given the ID of that item. It includes a lot of miscellaneous calls that have definitive, discrete operations, where the end user is making a call/performing an operation and doesn't really have control over what the operation is going to do at the end of the day.
>
> The "list" queue gets all the calls that often are (not always, but often can be) affected by the end user, or where the speed of operation is not always controllable. It includes the search calls and operations like export and running processes on the server from an active link. These operations are often fast, but they have the potential to become long. There is high variability to the performance or throughput of the calls. Depending on how well qualified the query is or what you are trying to retrieve, they are calls that can return little or large amounts of data. The user often has an influence on overall throughput or performance because they often have some level of control over the qualifications or the amount of data they can request. (Yes, that is the reason the Admin has been given lots of different ways to control how much data and the way that the query can be constructed -- to control the performance and impact of these calls.)
>
> It is still as relevant as it has always been to have the queues. It allows for the adjustment of the threads that are needed to focus on the two different classes of operations. Very often, system administrators will find that adjusting the number of threads in one of these queues has a significant impact on performance. If there were no difference in the queues, or in the way the load of the system is split between them by default, this wouldn't really be the case.
>
> Also, in general, it is found that a higher number of threads in the list queue than in the fast queue is an appropriate configuration of the system.
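As a rough illustration of the fast/list split Doug describes, here is a sketch of a routing rule. The call names follow the AR System C API, but the mapping itself is a simplification for illustration, not the server's actual routing table:

```python
# Hypothetical sketch of the fast/list split described above.
# The mapping is illustrative only, not the server's real routing table.

FAST_CALLS = {
    "ARCreateEntry",    # create a record: discrete, developer-controlled
    "ARSetEntry",       # modify a record
    "ARDeleteEntry",    # delete a record
    "ARGetEntry",       # fetch one record given its ID
}

LIST_CALLS = {
    "ARGetListEntry",   # searches: cost depends on the user's qualification
    "ARExport",         # export: can return little or large amounts of data
    "ARExecuteProcess", # run a process on the server from an active link
}

def route(call_name: str) -> str:
    """Pick a default queue for an API call, per the split above."""
    if call_name in LIST_CALLS:
        return "list"
    # Discrete, miscellaneous operations default to the fast queue.
    return "fast"

print(route("ARGetListEntry"))  # -> list
print(route("ARCreateEntry"))   # -> fast
```

The point of the split is visible in the sketch: everything whose cost the end user can inflate (searches, exports) lands in one pool whose thread count can be tuned separately from the discrete create/modify/delete traffic.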
> The vast majority of the time, the variability of the interaction on the search calls, and the overall time spent on searching vs. creating/updating, dictates that more database connections and processing threads related to searching will give the system better throughput.
>
> 3) Dispatcher model for queue processing
>
> Now, to the strategy for how queue processing works. When a call comes into the AR System server, it is targeted for some queue on the system. If that queue is not available, the call is redirected to a queue that is available. If the target queue is available, the dispatcher for that queue accepts the operation and then places it on a single list for the queue. All items that arrive in a queue are processed by that queue in the order received. Once arrived at a queue, that queue is responsible for processing -- even if there is another queue somewhere else with fewer items to process.
>
> Once things are in that internal list, the operation at the front of the list is handed to the next available thread within that queue. If there is a thread immediately available, the request never sits in the list at all. If all the threads are already busy processing requests, the item will sit in the internal list until one of the threads finishes, and then the next item will start processing. There is no inherent limit of 5 items in the list, and it is important to note that the list of pending requests is at the queue level, not at the thread level.
>
> 4) The original topic -- how to deal with high volume blasts of concurrency
>
> Now, we are back to the original topic of blasts of concurrent operations.
>
> First, is it really the only way things work that there is no work, then a blast of xxx simultaneous operations, and then no work? Could the other system have a more steady stream of work rather than the periodic blasts of work?
> Even if there was some differentiation, it would balance the load -- of both systems. Now, this all depends on the other system being able to spread things in some way. If it can, this would be a big step for any system dealing with volume. If not, then we have to look at other things on the AR System side.
>
> Assuming you cannot spread the load....
>
> First, your note didn't indicate you were losing operations or that they were being rejected. That is because of the dispatcher model: all the operations are being accepted and put on the list for processing, but they are not able to all be processed simultaneously. So, to start with, we are not losing operations, we just have a tuning issue to deal with.
>
> With the queue model of the system, the first thing I would recommend is to configure a private queue for the use of this system. I assume your system is not only for the use of this one automated processor. So, the first thing I would do is to define a private queue just for that automated system, to isolate the load of that system from other users and especially from interactive users. You don't want to affect other users of the system when the "flood" hits from this automated system.
>
> Using a private queue will isolate the load -- just like in the stadium example where you have a special entrance for the "unruly crowd".
>
> Then, we can look at the number of threads that are appropriate for this private queue. That can be looked at independently of the number of threads for other queues in the system for other purposes.
>
> In theory, there is no reason you cannot have 10s or even 100s of threads in a queue. It is just a number to us. We will start up that many if needed. What you need to be aware of is that each thread will open a database connection, so you need to have a database allowing that many connections, AND each thread will take some amount of memory.
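The dispatcher model Doug describes (one pending list per queue, each item handed to the next free thread, nothing rejected) maps closely onto a standard producer/consumer thread pool. A minimal sketch in Python, with hypothetical names, where each worker thread stands in for one database connection:

```python
import queue
import threading

class ARQueueSketch:
    """Toy model of one AR System queue: a single FIFO of pending
    requests plus a fixed pool of worker threads. Names and structure
    are hypothetical, for illustration only."""

    def __init__(self, name: str, num_threads: int):
        self.name = name
        self.pending = queue.Queue()  # the queue-level list of pending requests
        self.results = []
        self.lock = threading.Lock()
        for _ in range(num_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def dispatch(self, request):
        """The dispatcher accepts every request immediately; nothing is
        rejected -- requests simply wait if all threads are busy."""
        self.pending.put(request)

    def _worker(self):
        while True:
            request = self.pending.get()  # take the item at the front
            with self.lock:
                self.results.append(f"{self.name}: processed {request}")
            self.pending.task_done()

# A "blast" of 100 simultaneous requests against a 4-thread private queue:
q = ARQueueSketch("private", num_threads=4)
for i in range(100):
    q.dispatch(f"create #{i}")
q.pending.join()       # all 100 eventually complete; none are lost
print(len(q.results))  # -> 100
```

This is the tuning picture in miniature: the blast is absorbed by the pending list rather than dropped, and throughput during the blast is governed by `num_threads`, which is exactly the knob a private queue lets you set independently of the interactive users' queues.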
> With the 7.5 release on UNIX and Linux supporting a 64-bit address space, the memory can be grown more if needed (still 32-bit on Windows, but 64-bit is coming). You have to make the call about how much memory you have, the performance cost of swapping, and the overall overhead of processes on the system, for the number of threads you want to configure. You also have to worry about system configuration of per-process memory and file descriptors (open connections count as file descriptors) and other such things that you may encounter if configuring very large numbers of threads.
>
> BUT, there is no inherent restriction in the AR System on the number of threads you could configure if you wanted to.
>
> You already have seen some of the issues that high simultaneous operations on a single table can cause.
>
> There are some settings that can help with that, for example:
>
> Next ID blocks -- limits contention on the next ID database column
> No status history -- If you don't need it for your table, you can set this option and eliminate the creation/management of the status history table and entries (definitely available in 7.5; maybe in 7.1, since I cannot remember the release this option was added in)
>
> If your logic has workflow that does pushes to other tables that perform create operations there, you should really look at these settings on those forms as well, because you are indirectly issuing creates against them too.
>
> There are options to control where your DB indexes are stored vs. the data -- putting indexes and data on different disk areas -- which helps throughput on heavy-scale create/modify operations.
>
> You should investigate the operations your workflow is performing to make sure that they are tuned as well as possible and that there are no inefficient searches or steps in the process, so that the total processing time for each create can be minimized.
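To illustrate why Next-ID blocks reduce contention, here is a sketch of the general idea (not BMC's implementation): instead of locking and incrementing the shared next-ID value once per create, each thread reserves a whole block of IDs at a time and hands them out locally, so the shared counter (a database row in the real server) is touched far less often.

```python
import threading

class NextIdAllocator:
    """Sketch of the Next-ID-block idea; names are hypothetical and
    this is not BMC's implementation. Reserving IDs in blocks means
    the shared counter is locked once per block instead of once per
    create, cutting contention when many threads create entries."""

    def __init__(self, block_size: int = 100):
        self.block_size = block_size
        self.shared_next_id = 1          # stands in for the next-ID column
        self.shared_lock = threading.Lock()
        self.local = threading.local()   # per-thread block of reserved IDs

    def _reserve_block(self):
        with self.shared_lock:           # one lock per block, not per ID
            start = self.shared_next_id
            self.shared_next_id += self.block_size
        self.local.ids = iter(range(start, start + self.block_size))

    def next_id(self) -> int:
        if not hasattr(self.local, "ids"):
            self._reserve_block()
        try:
            return next(self.local.ids)
        except StopIteration:            # block exhausted: reserve another
            self._reserve_block()
            return next(self.local.ids)

alloc = NextIdAllocator(block_size=100)
ids = [alloc.next_id() for _ in range(5)]
print(ids)  # -> [1, 2, 3, 4, 5]
```

The trade-off is the usual one for block allocation: IDs left unused in a reserved block when a thread goes away create gaps in the sequence, which is why this is a setting you enable per table rather than a universal default.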
> If you don't have to do something during the interactive processing of the create, don't. Don't do it at all if it is not needed, or perform the processing later if it is not necessary for an interactive response to the user.
>
> Hopefully, this gives some ideas to look at. It is not a definitive answer to your inquiry, but hopefully some useful thoughts that lead you toward the best answer for your situation. And, maybe some ideas for others to consider.
>
> Hopefully, this note has been useful in terms of providing some information about how the system functions -- and maybe some reasoning behind why choices were made. I hope it also provided some ideas for dealing with high simultaneous operations situations.
>
> Doug Mueller

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
Platinum Sponsor: [email protected]
ARSlist: "Where the Answers Are"

