*caution* Long, rambling post ahead, best taken with an ice-cold ginger
beer (and possibly some salt).
>
>   
>> The move to a truly threaded, scalable and HA architecture is a big
>> change. I don't think its going to be a standard upgrade for most
>> people. If people want true HA then the level of funkyness is going to
>> go up pretty drastically. Heck its almost a fork(spoon). Dbmail 2 for
>> nice small instillations, dbmail 3 for big things. Otherwise you might
>> be trying to cover too many bases. Something big and scalable probably
>> wont be easy peasy to install and configure for your average home
>> user.
>>     
>
> What funkyness are you thinking there might be?
>
> I challenge your statement that DBMail is currently for "nice small
> inst[a]llations" because there are people using it for very large
> systems already. I want to design for and target that level of use
> rather than just happen to be able to handle it by accident.
>
>   
I mean, DBMail is (on Debian/Ubuntu at least) really easy to install and
set up. But unless you start packaging up the stack of applications you
depend on, configured as you depend on them, you can run into the realm
of difficulty. Asterisk is a decent example of this, I think: many things
have to be "just so" for it to really work well. AsteriskNOW (a distro
for Asterisk), however, packages it all up nicely; drop the image onto
the server, boot it, and away you go. I am having problems getting its
Xen version running, but the VMware version worked well.
>> Personally I see some "top level daemon" managing the whole thing and
>> talking via IP to the various front ends and databases. It also being
>> responsible for managing redundancy of data amongst a pool of
>> databases and the like.
>>     
>
> Yes, no. Some kind of cluster manager may be an inevitable design
> decision. Built-in redundancy not so much - I'd rather rely on the
> database to do this for us. With respect to pooling databases (no
> relation whatsoever to pooling database connections, btw), I don't
> presume that I know enough to partition what data goes to which database
> server better than the experts in database design.
>
>   
I was referring to a list of things like "user A is on servers X and Y".
If for some reason you wish to remove server Y from the system, the
manager can move the data off that server and onto another. Relying on
the database for the clustering can lead to issues with scalability and
reliability. If your application is "cluster aware" in itself, then you
can do crazy things like have half your servers running MySQL and the
other half PostgreSQL.
That in itself would have HA weenies drooling, I'd think ;->
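To make the idea concrete, here's a minimal sketch of that "user A is on
servers X and Y" bookkeeping: the manager holds an in-memory map of which
database servers carry each user's data, and can drain a server by
copying each affected user's data to the least-loaded remaining server.
All names and the `copy_data` callback are hypothetical, not DBMail code.

```python
class ClusterManager:
    def __init__(self):
        self.replicas = {}  # user -> set of server names holding their data

    def add_user(self, user, servers):
        self.replicas[user] = set(servers)

    def drain(self, server, copy_data):
        """Remove `server` from the pool, migrating each affected user's
        data to the least-loaded remaining server first."""
        for user, servers in self.replicas.items():
            if server in servers:
                remaining = self._pool() - servers
                target = min(remaining, key=self._load)
                copy_data(user, src=server, dst=target)  # live copy
                servers.discard(server)
                servers.add(target)

    def _pool(self):
        # every server currently holding at least one user's data
        return set().union(*self.replicas.values())

    def _load(self, server):
        # crude load metric: number of users hosted on that server
        return sum(1 for s in self.replicas.values() if server in s)
```

The point is that this logic lives in the application, so the underlying
databases don't need to know they're part of a cluster at all.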

> If someone hits the hard limits of how much data can go into one of the
> database servers we use, then they're definitely doing something where
> they can afford to spend the time and money needed to beef up the
> software to work around the rather huge size limits in MySQL and
> PostgreSQL.
>
>   
You can scale those out a fair bit, but MySQL Cluster (at the moment) is
in-RAM tables only, which sucks donkey balls. (I run MySQL for my DB,
btw, so... yeah... sucks to be me in that regard.)
I feel (from the armchair, or in this case a plastic outdoor dining
chair) that it's best if the app knows what's happening. You can still
use MySQL and Postgres and cluster things up the wazoo if you feel like
doing that. But it's simpler (I think) from the end user's POV to treat
the databases as "ideal RAID hard drives": want more storage? Add
another box. Want more performance? Add another box. Want more X? Add
another box. Without having to muck about with setting up cluster stuff
in a database.

>> Basically meaning scaling is copy over the Xen image, boot it and tell
>> the controller that its allowed to use that.
>>     
>
> Neat idea! I think it runs exactly contrary to your assertion that it
> would make everything very complex. If the cluster manager directed
> other cluster members configurations, the tough part would be setting up
> cluster membership. Probably involving some public key encryption to
> make sure rogue nodes don't join the cluster. That's on the hard side,
> but would be fun to work on.
>
> Aaron
>   
The hard part is in making the complex stuff simple ;->
If the system can be made to install and set up easily, with lots of
management goodness (i.e. I don't have to do anything), then super ;->
Thinking about the management app, C might not be the best thing for it;
perhaps Python or some such, as each individual message won't need to
hit it, or will only need to do so in a trivial way. And the logic is
likely to become scary ;->.

I have been pondering ways of achieving true high availability, where a
server failure causes zero disruption to service, even if you are
halfway through receiving an email. Though that requires some additional
funkyness (all traffic into the cluster must be broadcast/multicast, and
a whole bunch of other dren).


In my "ideal" system, the setup is something like this.
Email is received by a front-end server (perhaps by broadcast traffic?
Each server can pick which "conversations" it's a part of and ignore the
others. It will scale well to a point, and much farther than any current
system, before you need stuff like DNS round robins and proxies (though
a proxy would be my second choice)). That server checks its in-memory
list of all users; if we are accepting the email, then it can be passed
on to spam checks and the like.
The email then hits our "stuff it into the DB" section, which looks at
where that user's information is stored and sticks the email into the
databases that need it (so your A-grade customers get 3 copies, your B
customers 2 copies, and your C customers just the 1). The exact same
entry, so all ID numbers right through the DB are the same.
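A minimal sketch of that tiered "stuff it into the DB" step, assuming a
hypothetical `store()` call on each database handle; the message ID is
generated once so every replica shares the exact same entry:

```python
import uuid

# Replica counts per customer grade, as described above.
COPIES = {"A": 3, "B": 2, "C": 1}

def stuff_into_db(message, user, grade, user_databases):
    """Write `message` to as many of the user's databases as their
    grade calls for, using one ID everywhere (hypothetical API)."""
    message_id = uuid.uuid4().hex  # generated once, shared by all copies
    targets = user_databases[:COPIES[grade]]
    for db in targets:
        db.store(message_id, user, message)
    return message_id
```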

The "stuff it into the DB" app will then notify (directly) any IMAP
servers that have that user registered to them that a new email has
arrived. At that point the "stuff it into the DB" app is finished with
the email.
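That direct notification might look something like this hypothetical
sketch, where the delivery app keeps a registry of which IMAP servers
hold each user and pings only those (the registry and `notify` call are
assumptions, not DBMail API):

```python
class NotifyHub:
    def __init__(self):
        self.registrations = {}  # user -> set of registered IMAP servers

    def register(self, user, imap_server):
        self.registrations.setdefault(user, set()).add(imap_server)

    def new_mail(self, user, message_id):
        # Notify only the servers that actually have this user
        # registered; the delivery app is then done with the message.
        for server in self.registrations.get(user, ()):
            server.notify(user, message_id)
```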

The IMAP servers are pretty similar to what's around now, the difference
being that when a connection comes in, the server checks with the
manager which database it should connect to for that user as a part of
their authentication (the manager picks servers based on load). If the
server tries a query that fails, it will try again on any other server
that has that user's data (keeping in mind that all IDs are unique). So
if a DB server dies or goes offline, the user doesn't even notice. What
would be nice is if the management node could direct incoming IMAP
connections to the least loaded server (again, I would like to achieve
this with all machines in the pool having the same IP and just ignoring
connections they don't need).
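The retry-on-another-replica idea can be sketched very simply: because
every replica stores the same IDs, a failed query is just re-run against
the next server holding that user's data. The `execute` method here is a
hypothetical stand-in for whatever DB driver call is in use.

```python
def query_with_failover(sql, params, replicas):
    """Try each of the user's replicas in turn; return the first
    successful result, so a dead DB server goes unnoticed."""
    last_error = None
    for db in replicas:
        try:
            return db.execute(sql, params)
        except ConnectionError as err:  # server died or went offline
            last_error = err
    raise RuntimeError("all replicas failed") from last_error
```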

The management node is responsible for load balancing the servers it has
in its pool. Attach a 386, and it'll get 4 users in its database. The
management node manages the users dynamically, so as users are added and
their usage patterns become established, the load can be moved around
the servers. E.g.:

New user Johnny.
The system is pretty busy, so he gets put on the least loaded server.
Johnny turns out to be a super power user with assloads of searches and
the like.
The management node moves some of the less intensive users off that
server and onto others.

All this moving happens live, as there are (always) 2 copies of the
user's data in the databases.
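The Johnny scenario above can be sketched as two operations on the
management node: place new users on the least loaded server, and move
the lighter users off a server that turns out to be hot. The `load`
metric and all function names here are hypothetical.

```python
def least_loaded(servers, placement, load):
    """Pick the server carrying the smallest total user load."""
    return min(servers, key=lambda s: sum(
        load[u] for u, srv in placement.items() if srv == s))

def place_new_user(user, servers, placement, load):
    load[user] = 0  # no usage history yet
    placement[user] = least_loaded(servers, placement, load)

def rebalance(hot_server, servers, placement, load):
    """Move the lightest users off an overloaded server onto others,
    leaving the heavy user (Johnny) where his data already lives."""
    users = sorted((u for u, s in placement.items() if s == hot_server),
                   key=lambda u: load[u])
    for user in users[: len(users) // 2]:  # move the lighter half
        others = [s for s in servers if s != hot_server]
        placement[user] = least_loaded(others, placement, load)
```

Because there are always at least two copies of each user's data, these
moves can happen live without taking anyone offline.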

I can see a system like that scaling as far as you could want it to,
without the need for funkyness in terms of admin, while at the same time
not *requiring* loads of hardware. On an embedded tiny system, though,
it is going to run slower than DBMail does now; there's a bunch of stuff
there your average Joe isn't going to use.
A "corporate" mail system, though, could be set up to be pretty HA and
high performance with just 2 boxes, the scaling being pretty linear and
all.

The hardest thing is for all that stuff to work out of the box. The best
way to get people to install it at work is to make it easy to install at
home. That's why I use Ubuntu:
apt-get install dbmail-hardcore

BTW, wrt threads and the like, I prefer a threads = processors * 2 type
of approach. Event-driven state machines seem to be the most efficient
way of doing things when you get really loaded: you don't have all that
switching between threads and the overhead of hanging on to them all. To
my mind it should make the coding simpler, because once you have a state
machine which will run the IMAP protocol, it should scale pretty much
linearly without needing to worry too much about IPC and the like.
(dbmail 7.9 perhaps?)
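A toy sketch of that event-driven idea, using a drastically simplified
command set (not real IMAP): each connection is just a state value plus
a transition table, so a small fixed worker pool (processors * 2) can
drive any number of them without a thread per connection.

```python
import os

# One tiny state machine per connection; advancing it is cheap, so no
# dedicated thread is needed per client. Toy commands, not real IMAP.
TRANSITIONS = {
    ("unauthenticated", "LOGIN"): "authenticated",
    ("authenticated", "SELECT"): "selected",
    ("selected", "LOGOUT"): "closed",
    ("authenticated", "LOGOUT"): "closed",
}

class Connection:
    def __init__(self):
        self.state = "unauthenticated"

    def handle(self, command):
        """Advance the machine on one input event."""
        nxt = TRANSITIONS.get((self.state, command))
        if nxt is None:
            return "BAD command in state " + self.state
        self.state = nxt
        return "OK " + command

# The sizing rule of thumb from the post: fix the pool to the hardware,
# don't grow it with connection count.
POOL_SIZE = (os.cpu_count() or 1) * 2
```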
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
