>Let me try to explain locking once more (since there's confusion on that and
>the db pool issue).
Thank you! That explains it. I also like how that works; it's similar to how
I've implemented data pools/queues before in Java.
<snip>
>Everything with this approach is very scalable (locks work great, multiple
>threads can work on multiple messages simultaneously, use multiple db
>connections, etc...) except for how TownSpoolRepository implements accept().
>Right now what it does is list() of all the keys, and then walks through the
>list to try to find an unlocked message. This is unscalable because as the
>list of messages grows, this gets slower and slower...
*nods*
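Just to check that I'm picturing the problem right, the unscalable part
sounds roughly like this (a sketch of my mental model only, not the actual
TownSpoolRepository code; the class and field names are made up):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// How I picture the current accept() behaving -- a guess, not real code.
public class LinearScanSpool {
    private final List keys = new ArrayList();    // every key in the spool (what list() returns)
    private final Set lockedKeys = new HashSet(); // keys some other thread already locked

    public synchronized String accept() throws InterruptedException {
        while (true) {
            // Walk the full key list looking for anything unlocked, so the
            // scan gets slower and slower as the spool grows.
            for (Iterator it = keys.iterator(); it.hasNext(); ) {
                String key = (String) it.next();
                if (!lockedKeys.contains(key)) {
                    lockedKeys.add(key);
                    return key;
                }
            }
            wait(); // nothing free right now; rescan when something changes
        }
    }
}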
What happens to messages that have been sent? How does James know not to send
them again? I understand the thread must call remove(), but what does this
actually do?
Also, when do the messages get loaded into memory from the DB? Is that
done every time list() is called?
This may not make sense, because I may not understand *all* of this
completely yet, but couldn't the messages that accept() returns a key for
be put in another queue? Then, when the thread that holds the lock calls
retrieve(), the message could be moved to yet another queue. So you would
end up with three pools/queues/whatever you want to call them. The first
would be all messages that have been retrieved from the DB but haven't
been accept()ed yet. The second would be accept()ed messages. The third
would be retrieve()d messages. I'm assuming remove() removes the message
and that the thread calls this when it successfully sends the message;
remove() could just act on the third queue.
Now you don't even have to search the first queue for a message; you can
just pull the first one off the array/queue/whatever and give it to a
thread calling accept(). You do have to search the second queue to find
the message associated with a key, but theoretically this is a much
smaller number of messages than the first queue (if the piece responsible
for keeping the spool filled is doing its job and a lot of messages are
being processed). Messages *should* only be in the second queue for a
very short time, too. The third queue represents all the messages
currently being sent. remove() will have to search this queue, but again,
it should be a smaller queue than the first.
I'm not sure how messages are searched by accept() to see if they have a
lock, but that wouldn't be required if the virgin messages are kept
separate from the ones that are being worked on.
Does this make sense, or doesn't it fit at all to try to do this with how
it currently works?
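To make it a little more concrete, here's a very rough sketch of the
three-pool idea (the ThreeQueueSpool name and the LinkedList/HashMap
choices are just mine, not anything that exists in James today):

import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Sketch only -- not real James code.
// Pool 1: messages loaded from the DB that haven't been accept()ed yet.
// Pool 2: messages a thread has accept()ed but not yet retrieve()d.
// Pool 3: messages that have been retrieve()d and are being sent.
public class ThreeQueueSpool {
    private final LinkedList unaccepted = new LinkedList(); // of Entry
    private final Map accepted = new HashMap(); // key -> message
    private final Map sending = new HashMap();  // key -> message

    // No searching needed: just hand out the head of the first pool.
    public synchronized Object accept() throws InterruptedException {
        while (unaccepted.isEmpty()) {
            wait(); // woken by whatever keeps the spool filled from the DB
        }
        Entry e = (Entry) unaccepted.removeFirst();
        accepted.put(e.key, e.message);
        return e.key;
    }

    // Only the (small) second pool has to be consulted by key.
    public synchronized Object retrieve(Object key) {
        Object message = accepted.remove(key);
        sending.put(key, message);
        return message;
    }

    // Delivery succeeded: drop it from the third pool (and the DB row
    // would get deleted here too).
    public synchronized void remove(Object key) {
        sending.remove(key);
    }

    // Called by the piece responsible for keeping the spool filled.
    public synchronized void addFromDatabase(Object key, Object message) {
        unaccepted.add(new Entry(key, message));
        notifyAll();
    }

    private static class Entry {
        final Object key;
        final Object message;
        Entry(Object key, Object message) {
            this.key = key;
            this.message = message;
        }
    }
}

The point is just that accept() becomes a constant-time "take the head",
and only the two small pools ever get searched by key.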
>What's even worse is how it implements accept(long delay). If a message is
>put back in the queue for a later retry, a thread calls accept(long delay),
>which means the same thing as accept but wait up to "delay" milliseconds
>before retrying a failed message. Not only does it lock a message, but it
>then retrieves the message to check the time. This makes it geometrically
>slower as the number of messages needing a retry increases.
I'm still not exactly sure how this works. Does that mean the delay is
determined by each thread rather than by the message object itself? I
would expect the message object to contain a timestamp and a duration
that it has to wait after that timestamp before it can be sent again.
The accept() method
could either check the message before giving it to the caller or *another*
queue could contain all retries and a different thread could periodically
scan that queue. Of course, you could create high-priority threads for
each message that should be retried and have the message itself go into a
sleep() and then insert itself back into the queue when the sleep()
expires, but that would eat up a lot of threads. Or, if you wanted to be
more efficient and didn't care about *exact* retry durations, you could
keep several queues, where the first represents (for example) 60-second
retries, the second 120-second retries, the third 240-second retries, and
so on. Each time a message fails, it is put into the queue after the one
it was in before. Then you could have a thread for each queue that wakes
up every 60, 120, 240, etc. seconds and scans its queue, dumping all the
messages back into the main spool. Obviously, you could get by with one
scanner thread if the durations *were* all multiples of 60.
Make sense?
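Something like the following is what I had in mind for that last
variation (again, just a sketch with made-up names; a real version would
need shutdown handling and would update the DB rather than in-memory
lists):

import java.util.LinkedList;

// Sketch only -- a failed message goes into the bucket after the one it
// was in before (60s, 120s, 240s retries), and a single scanner thread
// wakes every 60 seconds and dumps any bucket whose period has elapsed
// back onto the main spool.
public class RetryBuckets implements Runnable {
    private final LinkedList[] buckets =
        { new LinkedList(), new LinkedList(), new LinkedList() };
    private final LinkedList mainSpool; // stand-in for the real spool queue

    public RetryBuckets(LinkedList mainSpool) {
        this.mainSpool = mainSpool;
    }

    // Called when a delivery attempt fails; 'attempt' starts at 0.
    public synchronized void scheduleRetry(Object key, int attempt) {
        int i = Math.min(attempt, buckets.length - 1);
        buckets[i].add(key);
    }

    public void run() {
        int tick = 0;
        while (true) {
            try {
                // One thread is enough because every delay is a multiple of 60s.
                Thread.sleep(60000L);
            } catch (InterruptedException e) {
                return;
            }
            tick++;
            synchronized (this) {
                for (int i = 0; i < buckets.length; i++) {
                    // Bucket i fires every 60 * 2^i seconds (60, 120, 240).
                    if (tick % (1 << i) == 0) {
                        synchronized (mainSpool) {
                            mainSpool.addAll(buckets[i]); // dump the whole bucket
                        }
                        buckets[i].clear();
                    }
                }
            }
        }
    }
}

With one scanner a message can wait a little longer than its nominal
delay, but as I said, that only matters if you care about *exact* retry
durations.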
>Anyway, pure JDBC access will help me build much better implementations of
>this (not that it's impossible without it). I have to look over the db pool
>code from excalibur, but with that, hopefully I can get it working this
>weekend.
Do the above ideas get around the need for pure JDBC access?
>Serge Knystautas
>Loki Technologies
>http://www.lokitech.com/
>----- Original Message -----
>From: "David Doucette" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Saturday, June 16, 2001 2:14 PM
>Subject: Re: RDBMS support
>
>
> > First off, I'm very new to James and since I haven't been able to get the
> > last stable release out of CVS yet, I'm forced to sit on the sidelines and
> > watch james-dev and james-user. However, I have learned some things and
> > have thought about how I would have to make changes if I wanted to
> > increase
> > performance.
> >
> > >1. James's database structure is very simple. The core of James has the
> > >potential to use 2 unrelated tables. One for a user repository and one
> > >for
> > >a message spool repository. The structure of these tables is very
> > >simple.
> > This is what struck me originally when I heard of all the abstraction
> > going
> > on. It doesn't seem like a very complex system is needed, since James
> > doesn't really have complex database requirements!
> >
> > >2. Users are already shielded from the database code by the
> > >UserRepository
> > >and SpoolRepository API. This ease of use would really only help the one
> > >or
> > >two developers who write the DatabaseUserRepository and
> > >DatabaseSpoolRepository class.
> > Very good point.
> >
> > >3. We need a lot of control over how data is returned for performance
> > >reasons. Two big limitations we are experiencing with Town (that I
> > >believe
> > >we would have with Turbine and other abstraction layers) is the inability
> > >to
> > >return parts of a ResultSet and to get streamed access to binary data. I
> > >believe both of these are critical to increasing scalability and
> > >performance.
> > I'm still not 100% clear on how the locking of records works (it almost
> > sounds like it uses Java's thread locking mechanisms rather than locking
> > by
> > setting a flag in the DB). However, if the engine has to periodically
> > re-read all the messages in the spool just to send some of them out, then
> > I'm all for anything that will turn that around. It just won't scale
> > without doing something about this. If someone could explain again how
> > this works, I'd appreciate it.
> >
> > >Then the only thing
> > >we need is a JDBC connection pool code, and I happen to have one I can
> > >add
> > >pretty easily.
> > Again, I'm just not clear on how database connections work now. It seems
> > to me that there can only be one connection retrieving spool messages,
> > because otherwise locking in memory wouldn't work and each thread that has
> > a connection would retrieve the same messages and send them again.
> >
> > >Any thoughts? I can throw the JDBC spool repository implementation
> > >together
> > >pretty quickly using the old table structure.
> > Go for it!
> >
> > I know that I would really like not having to redesign this part of James
> > so that it scales and I'm sure others feel the same way.
> >
> > David