I knew if we "poked the bear" long enough, he'd respond and remind us all
that it's still his forest.

Much of what I thought I knew came from a session I was in with a BMC
Consultant some years ago.  Thanks for updating and clarifying so much, Doug
- you are appreciated!

On Mon, Jan 18, 2010 at 1:31 PM, Mueller, Doug <[email protected]> wrote:

> Folks,
>
> This thread has actually turned into several different conversations, all
> related to threads and queues and how the AR System server functions.  To
> help share a bit of information about how the system works, clarify the
> purpose and intention behind things, and offer some comment on the original
> question, I thought I would try to address the various topics with some
> comments.
>
> 1) What is the difference between a queue and a thread?
>
> Before going into the fast/list discussion, I want to make sure that
> everyone is clear about a couple of terms: queue and thread.  Fast and list
> are actually queues in the system, each of which in turn has one or more
> threads defined for it.
>
> In the AR System server, there is a set of queues: Admin, Fast, List, and
> Escalation by default (OK, there are also some used for services like the
> plugin server and such).  These are essentially connection points that you
> can go to.  You can also have private queues that offer additional
> connection points to the system.
>
> Each of these queues has one or more threads -- each thread being a
> database connection and a processing lane for an API call.
>
> Think of it the following way.  If you were going to a futbol game in Spain
> (that's soccer for those of us in the backwards United States), the stadium
> generally has multiple queues you can enter.  Say in a simple case, they
> may be on the N, E, S, and W of the stadium.  Now, there may be a special
> queue in the NE for the "skybox" owners.
>
> Each of the queues, the entrances to the stadium, has multiple lanes where
> people can enter.  These are the threads.  Threads are local to the
> individual queues.  You cannot be in the queue on the N of the stadium and
> go through an entry lane that is on the S, for example.
>
> However, if one of the queues is closed, people get routed to one of the
> queues that is open, so you are not blocked out of the game just because
> the queue you were targeting is not available; you just get routed to
> another one that is available.  (In the AR System case, the fast/list pair
> of queues is where you get routed if your specific queue is not available.)
>
>
> So, the system has a set of queues -- some pre-defined, some private and
> defined per site -- and each of them has processing threads as configured.
>
> No, this was not a topic that was brought up, but it is important to
> understand it clearly for some of the topics that were.
>
>
> 2) Why fast/list?  Are they relevant?
>
> One of the topics that was discussed is the fast vs. list queue and the
> reasoning behind it.
>
> As was noted, any queue in the system can perform any operation.  OK,
> almost...  The exception is that any operation that restructures
> definitions or changes the database structure MUST go through the Admin
> queue and will be routed to that queue.  No queue other than the Admin
> queue will process restructure operations.
>
> Anyway, other than that distinction, any queue in the system can perform
> any non-restructuring API operation.
>
> BMC has optimized the system to two different queues by default.
>
>   Fast (just a name, without intending to indicate performance)
>   List (just a name as well, but aimed at things that search/scan/find
>         and return lists of things)
>
> List calls may be faster than Fast calls.  Fast calls may be faster than
> List calls.
>
> The "fast" queue gets all calls that are controlled by the developer and
> that have discrete operations and activity.  This includes operations that
> create, modify, and delete.  It includes retrieving details of a single
> item given the ID of that item.  It includes a lot of miscellaneous calls
> with definitive, discrete operations where the end user is making a call or
> performing an operation but doesn't really have control over what the
> operation is going to do at the end of the day.
>
> The "list" queue gets all the calls that often are (not always, but often
> can be) affected by the end user, or where the speed of the operation is
> not always controllable.  It includes the search calls and operations like
> export and running processes on the server from an active link.  These
> operations are often fast, but they have the potential to become long;
> there is high variability in the performance and throughput of these calls.
> Depending on how well qualified the query is or what you are trying to
> retrieve, they can return small or large amounts of data.  The user often
> has an influence on overall throughput or performance because they often
> have some level of control over the qualifications or the amount of data
> they request.  (Yes, that is the reason the Admin has been given lots of
> different ways to control how much data can be returned and the way that
> the query can be constructed -- to control the performance and impact of
> these calls.)
>
> It is still as relevant as it has always been to have the two queues.  It
> allows you to adjust the number of threads to focus on the two different
> classes of operations.  Very often, system administrators will find that
> adjusting the number of threads in one of these queues has a significant
> impact on performance.  If there were no difference in the queues, or in
> the way the load of the system is split between them by default, this
> wouldn't really be the case.
>
> Also, in general, it is found that a higher number of threads in the list
> queue than in the fast queue is an appropriate configuration of the system.
> The vast majority of the time, the variability of the interaction on the
> search calls, and the overall time spent on searching vs. creating and
> updating, dictate that more database connections and processing threads
> devoted to searching will give the system better throughput.
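>
> To make that concrete: in the versions I have used, the thread counts for
> these queues are set in ar.cfg (ar.conf on UNIX) with options like the
> following.  The option names and values here are from memory and meant
> only as an illustration of "more list threads than fast threads"; verify
> them against the Configuration Guide for your release:

```
# ar.cfg (ar.conf on UNIX) -- example values only
Fast-Thread-Min: 2
Fast-Thread-Max: 6
List-Thread-Min: 4
List-Thread-Max: 10
```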
>
>
> 3) Dispatcher model for queue processing
>
> Now, to the strategy for how queue processing works.  When a call comes
> into the AR System server, it is targeted at some queue on the system.  If
> that queue is not available, the call is redirected to a queue that is
> available.  If the target queue is available, the dispatcher for that
> queue accepts the operation and places it on a single list for the queue.
> All items that arrive in a queue are processed by that queue in the order
> received.  Once an item has arrived at a queue, that queue is responsible
> for processing it -- even if there is another queue somewhere else with
> fewer items to process.
>
> Once things are in that internal list, the operation at the front of the
> list is handed to the next available thread within that queue.  If there
> is a thread immediately available, the request never sits in the list at
> all.  If all the threads are already busy processing requests, the item
> will sit in the internal list until one of the threads finishes, and then
> the next item will start processing.  There is no inherent limit of 5
> items in the list, and it is important to note that the list of pending
> requests is at the queue level, not at the thread level.
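>
> The dispatcher model above -- a single queue-level list of pending
> requests, served by a pool of worker threads -- can be sketched in a few
> lines of Python.  This is only an illustration of the model, not AR System
> code; all the names are made up:

```python
import queue
import threading

class ArQueueSketch:
    """Rough sketch of the per-queue dispatcher model: one shared FIFO
    list of pending requests, served by a fixed pool of worker threads
    (each of which would also hold its own database connection)."""

    def __init__(self, name, num_threads):
        self.name = name
        self.pending = queue.Queue()  # the queue-level list of pending requests
        self.workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_threads)
        ]
        for w in self.workers:
            w.start()

    def dispatch(self, request):
        # The dispatcher always accepts the call; nothing is rejected.
        # If a thread is free it picks the item up immediately; otherwise
        # the item waits in the pending list until a thread finishes.
        self.pending.put(request)

    def _worker(self):
        while True:
            request = self.pending.get()  # next item, in arrival order
            request()                     # process the API call
            self.pending.task_done()

# Usage: a "fast" queue with 3 threads absorbing a burst of 10 calls.
results = []
fast = ArQueueSketch("fast", num_threads=3)
for i in range(10):
    fast.dispatch(lambda i=i: results.append(i))
fast.pending.join()  # block until the whole burst has been processed
```

> Note that the pending list belongs to the queue, not to any thread, and
> that nothing is rejected: a burst larger than the thread count simply
> waits its turn in arrival order.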
>
>
> 4) The original topic -- how to deal with high volume blasts of concurrency
>
> Now, we are back to the original topic of blasts of concurrent operations.
>
> First, is it really the case that there is no work, then a blast of xxx
> simultaneous operations, and then no work again?  Could the other system
> send a more steady stream of work rather than the periodic blasts?  Even
> some spreading of the work would help balance the load -- on both systems.
> Now, this all depends on the other system being able to spread things in
> some way.  If it can, this would be a big step for any system dealing with
> volume.  If not, then we have to look at other things on the AR System
> side.
>
> Assuming you cannot spread the load....
>
> First, your note didn't indicate you were losing operations or that they
> were being rejected.  That is because of the dispatcher model: all the
> operations are being accepted and put on the list for processing, but they
> cannot all be processed simultaneously.  So, to start with, we are not
> losing operations; we just have a tuning issue to deal with.
>
> With the queue model of the system, the first thing I would recommend is
> to configure a private queue for the use of this system.  I assume your
> server is not only for the use of this one automated processor.  So, the
> first thing I would do is define a private queue just for that automated
> system, to isolate its load from other users and especially from
> interactive users.  You don't want to affect other users of the system
> when the "flood" hits from this automated system.
>
> Using a private queue will isolate the load -- just like in the stadium
> example where you have a special entrance for the "unruly crowd".
>
> Then, we can look at the number of threads that are appropriate for this
> private queue.  That can be considered independently of the number of
> threads for the other queues in the system.
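>
> As a rough sketch of what that looks like: in the versions I have used, a
> private queue is defined with a Private-RPC-Socket line in ar.cfg (ar.conf
> on UNIX).  The RPC program number and thread counts below are only
> examples; check the Configuration Guide for your release for the valid
> number ranges and exact syntax:

```
# ar.cfg (ar.conf on UNIX) -- example only
# <rpc-program-number> <min-threads> <max-threads>
Private-RPC-Socket: 390621 4 12
```

> The automated client would then be configured to target that RPC program
> number so its calls land on the private queue, leaving the fast and list
> queues free for interactive users.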
>
> In theory, there is no reason you cannot have 10s or even 100s of threads
> in a queue.  It is just a number to us; we will start up that many if
> needed.  What you need to be aware of is that each thread will open a
> database connection, so you need a database that allows that many
> connections, AND each thread will take some amount of memory.  With the
> 7.5 release on UNIX and Linux supporting a 64-bit address space, memory
> can be grown further if needed (still 32-bit on Windows, but 64-bit is
> coming).  You have to make the call about how much memory you have, the
> performance cost of swapping, and the overall overhead of processes on the
> system for the number of threads you want to configure.  You also have to
> worry about system configuration of per-process memory and file
> descriptors (open connections count as file descriptors) and other such
> things that you may encounter when configuring very large numbers of
> threads.
>
> BUT, there is no inherent restriction in the AR System about the number of
> threads you could configure if you wanted to.
>
> You already have seen some of the issues that high simultaneous operations
> on a single table can cause.
>
> There are some settings that can help with that, for example:
>
> Next ID blocks -- limits contention on the next-ID database column
> No status history -- If you don't need it for your table, you can set this
>   option and eliminate the creation and management of the status history
>   table and its entries  (definitely available in 7.5; maybe in 7.1, since
>   I cannot remember the release this option was added in)
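>
> As a concrete (hedged) illustration: in the versions I have worked with,
> the Next-ID block setting is a server option in ar.cfg (ar.conf on UNIX),
> while disabling status history is set per form in the form properties.
> Treat the option name and value below as examples and verify them against
> the Configuration Guide for your release:

```
# ar.cfg (ar.conf on UNIX) -- illustrative value only
# Allocate entry IDs in blocks of 100 to reduce contention on the
# next-ID column during heavy concurrent creates.
Next-ID-Block-Size: 100
```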
>
> If your logic has workflow that does pushes to other tables that perform
> create operations there, you should really look at these settings on those
> forms as well because you are indirectly issuing creates against them too.
>
> There are options to control where your DB indexes are stored vs. the
> data, to put indexes and data on different disk areas -- which helps
> throughput on heavy-scale create/modify operations.
>
> You should investigate the operations your workflow is performing to make
> sure that they are tuned as well as possible and that there are no
> inefficient searches or steps in the process, so that the total processing
> time for each create can be minimized.  If you don't have to do something
> during the interactive processing of the create, don't.  Don't do it at
> all if it is not needed, or perform the processing later if it is not
> necessary for interactive response to the user.
>
> Hopefully, this gives some ideas to look at.  It is not a definitive
> answer to your inquiry, but hopefully some useful thoughts that lead you
> toward the best answer for your situation.  And, maybe some ideas for
> others to consider.
>
>
>
> Hopefully, this note has been useful in terms of providing some
> information about how the system functions -- and maybe some of the
> reasoning behind why choices were made.  I hope it also provided some
> ideas for dealing with situations involving high volumes of simultaneous
> operations.
>
> Doug Mueller
>

_______________________________________________________________________________
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
Platinum Sponsor:[email protected] ARSlist: "Where the Answers Are"
