From [EMAIL PROTECTED]
From: David Doucette <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
X-Sender: [EMAIL PROTECTED]
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
> > My first impression is that this performance bottleneck could be avoided
> > by having one thread (or even more than one thread to improve
> > performance against the database) reading messages from the database and
> > putting them in a queue in memory.
>
> It wouldn't be acceptable to have the queue in memory, I think; with a
> 2 MB attachment on each email and your clients sending 100 messages,
> you'd run out of swap space pretty fast, I believe.
You are right -- *if* you didn't place a limit on the queue size. :)
The queue only needs to hold as many messages as there are delivery
threads. This would use no more memory than having each delivery thread
load a message from the database into memory.
However, you bring up a good point. Even with the way it currently
works, if you configure it to have many delivery threads, you'd better
have enough memory to handle all the threads simultaneously loading a
large message. How do the delivery threads handle low-memory
situations? Do they wait and retry?
> >The delivery threads could access
> > this same shared queue and pull the data from there instead of reading
> > it from the database. The advantage of this would be not requiring a
> > connection to the database for every delivery thread, so you could have
> > more delivery threads running.
>
> I think if James provides a way of assigning priorities to the running
> threads, that would be great. The max. database connection number is not
> quite the issue, the apparent (ie: based on gut feeling benchmark) James
> performance is. So I think, if the threads that handle SMTP connections
> from _mail_clients_ (ie: from inside network) and the POP handlers could
> be given higher priorities than the other threads, James would
> (apparently) run faster; ie: when you need the services (SMTP or POP),
> they are there.
That's an entirely different issue, but could definitely be true,
depending on what the other threads are doing.
However, what I was getting at is that if you could have more delivery
threads running at the same time (which you can't currently do, because
of the direct relationship between the number of delivery threads and
the number of database connections), performance would scale with them,
at least until you raised the number of delivery threads to some very
high value. That's because you can't control slow responses from the
machines James is sending the mail to.
Another advantage to a queue is that it can be populated with data while
the delivery threads are waiting for responses. This saves time,
because the data is then available when the next delivery thread needs
it. Right now the delivery thread has to wait for the data to be
retrieved from the database before it can deliver it.
> > In general, however, connection pooling is a good thing! I've been
> > immersed in coding such things for a week straight now, so it is all
> > very fresh in my mind. :)
>
> Connection pooling wouldn't increase the maximum number of database
> connections that can be handled by a particular server, right? Sure,
> it would make the server run faster by not opening and closing a
> database connection for each connection request.
No, it wouldn't affect the max database connection limit, but it would
allow a way around it. And yes, you are right, it would also provide a
performance benefit if connections were left open.
There are two issues:
1) Connection Pooling
2) Data queuing
You don't necessarily need 1 in order to have 2, but you might as well
for the reason you mentioned. It is the data queuing that allows you to
have more delivery threads than you can have connections to the
database.
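To illustrate the pooling half of this, here is a hand-rolled sketch --
not James's actual code. FixedPool, acquire(), and release() are names I
made up, and plain Objects stand in for JDBC connections so it runs on
its own:

```java
import java.util.ArrayDeque;

// A minimal fixed-size pool: acquire() blocks when every resource is
// checked out, release() returns one and wakes a waiter. In James the
// pooled resources would be database connections; plain Objects are
// used here so the sketch is self-contained.
class FixedPool {
    private final ArrayDeque<Object> free = new ArrayDeque<>();

    FixedPool(int size) {
        for (int i = 0; i < size; i++) free.push(new Object());
    }

    synchronized Object acquire() throws InterruptedException {
        while (free.isEmpty()) wait(); // all connections in use
        return free.pop();
    }

    synchronized void release(Object conn) {
        free.push(conn);
        notifyAll(); // wake a thread blocked in acquire()
    }
}

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        FixedPool pool = new FixedPool(5);
        Object conn = pool.acquire(); // reuse instead of open/close per request
        // ... use the connection ...
        pool.release(conn);
        System.out.println("acquired and released one pooled connection");
    }
}
```

The point of the pool is exactly the speedup you mention: the open/close
cost is paid once per pooled connection, not once per request.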
For example, say you have 5 connections to the database, each running in
its own thread. Each of these threads has access to the shared queue.
The threads add to the queue until it reaches a size of 300 (at which
point the queue refuses further additions and the thread wait()s until a
record is removed). You also have 300 delivery threads. If they come up
after the queue is full (for simplicity in the example), each will grab
a message from the queue. Each time one grabs a message, that allows a
database connection thread to deposit its data in the queue and go back
to the database for more. Those database reads happen at the same time
the delivery threads are waiting for responses from the remote hosts.
The DB connection thread will put the data on the queue again (as long
as the queue doesn't already hold 300 elements) and go back to the
database for more data. Once a delivery thread is done, it goes back to
the queue and finds data ready for it to deliver again.
It is late, so I didn't do the best job of explaining it, but I think if
you read it through a few times you'll get the idea. It is all about
concurrent processing -- both in retrieval/delivery and in
delivery/delivery/delivery/delivery/delivery...etc.
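The scheme above boils down to a bounded producer/consumer queue. Here
is my own illustrative sketch of it -- not anything from James.
SpoolQueue, put(), and take() are invented names, the capacity is 3
instead of 300, and one reader thread plus the main thread stand in for
the 5 DB threads and 300 delivery threads:

```java
import java.util.LinkedList;

// Minimal bounded queue: producers (DB reader threads) block in put()
// when the queue is full, consumers (delivery threads) block in take()
// when it is empty. Illustrative names, not James code.
class SpoolQueue {
    private final LinkedList<String> items = new LinkedList<>();
    private final int capacity;

    SpoolQueue(int capacity) { this.capacity = capacity; }

    // Called by a DB reader thread: waits while the queue is full.
    synchronized void put(String msg) throws InterruptedException {
        while (items.size() >= capacity) wait();
        items.addLast(msg);
        notifyAll(); // wake a delivery thread waiting in take()
    }

    // Called by a delivery thread: waits while the queue is empty.
    synchronized String take() throws InterruptedException {
        while (items.isEmpty()) wait();
        String msg = items.removeFirst();
        notifyAll(); // wake a DB reader waiting in put()
        return msg;
    }
}

public class QueueDemo {
    public static void main(String[] args) throws Exception {
        SpoolQueue queue = new SpoolQueue(3);

        // One "DB reader" producing 5 messages; it blocks after the
        // third put until the consumer drains the queue.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) queue.put("msg-" + i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // The main thread plays the delivery thread.
        for (int i = 1; i <= 5; i++) {
            System.out.println("delivering " + queue.take());
        }
        reader.join();
    }
}
```

Because put() blocks once the cap is reached, memory stays bounded no
matter how far the DB readers get ahead of the delivery threads.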
> > Did you remove messages that had already been sent?
>
> No. Definitely not. (I believe that such an effort would be quite
> improbable. :-)
I wasn't sure whether James removed them after it was done or just
flagged them. Remember, I'm brand new to James. :)
> >If so, I'm curious
> > why James is looking for messages that have already been sent.
>
> The messages were not yet sent; they were in the error state, or already
> in the users' mailboxes.
Oh, so James might have been retrying them periodically, but since you
removed them, it had problems?
> > I'm initially interested in James for SMTP more than POP3, but since I'm
> > planning on using it for POP3 down the road, this kind of troubles me.
>
> If you need _only_ SMTP and POP, then use sendmail and pop3d; they are
> more robust (already in version 8.x) and tons faster (implemented in C).
> Once I tried to send an email with a 26MB attachment (jdk1.3.jar),
> sendmail could do it without choking, and retrieving was painless.
>
> The interesting thing on James is the mailet feature; you could _do_
> something with the messages before they reach the spool or repositories;
> you could do more than just applying address filter.
Yeah, I really want the mailet features. I may end up hacking some
things out of James when I really get into it in order to increase
performance, but I believe that's one of the points of the project -- to
give people a starting point for their own projects.
It is a lot faster for me to start with James as a framework and modify
it than to code the delivery piece, the mailet piece, the database
piece, etc.
If someone were able to get about 5 msgs/second out of one delivery
thread (they must have had fast responses from the remote host!), then
increasing the delivery threads to 100 would give you very good
performance (if it scales), even if you have to use disk instead of the
database. From what I've read before, however, the disk method leads to
more problems on power outages or other times when James goes down
hard.
> > Is it merely the size of some of the messages that is the problem, or
> > does the number of messages in a mailbox/in the database matter??
>
> Both.
> Once I tried to have a 7MB attachment (got split into 300KB-each
> messages by Outlook Express); retrieving was like forever. Then I had
> 1400 messages sitting in my mailbox; the POP handler timed out.
Doesn't sound like the POP piece is optimized yet. :)
> > To the rest of the group, is this a known problem??
>
> You didn't believe me, did you?
I just wanted to see if maybe it has been fixed in a newer release than
what you are using or if other people have found a way around it. I
believe you are having the problem. :)
> > Even if you can control whether you put attachments on your outgoing
> > emails, you can't control whether someone sends you a large one. That's
> > what makes the POP3 problem worse than not being able to send large
> > attachments (when using the database, not the file system).
>
> Since the JDBC driver rejects large attachments, it means that my mail
> server couldn't receive large email. It seems that for James, size does
> matter.
HA!
On a serious note, I'm sure that with some tweaking it could manage it.
My guess is that most of the work so far has gone into basic
functionality rather than into handling large volumes and sizes as fast
as possible. It would be too easy for us if it did everything
perfectly. ;)
David
>
> Oki
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]