--On Tuesday, June 06, 2006 9:37 AM +0200 Emmanuel Lecharny
<[EMAIL PROTECTED]> wrote:
Hi guys !
Quanah Gibson-Mount wrote:
I think it is important to allow specification of which indices to use
for a given attribute, for a few reasons. One is that you can use it to
actually make some searches slow enough to hinder such efforts (we
have a fairly obnoxious spam troller routinely trying to get data from
our sources),
In my mind, it's pretty much a security issue. You can add
authentication to avoid such behavior, or, if your data are public, then
you have no reason to slow down the searches. Limiting the number of
results may be more efficient. Btw, this is a real problem for a server,
and something we should consider: how to avoid DoS attacks on an LDAP
server (by flooding, with malformed requests, or with huge data). We
still have to address those attacks. At this point, I have a question:
is it common for an LDAP server to be exposed outside a company?
Generally speaking, I have never seen that. User data are really supposed
to be private and not accessible to unidentified users. I may be totally
wrong, but if I saw an LDAP server exposed to the world (I have not seen
that in years), the first thing I would ask the admins is to close the
door of their system. Just my opinion.
Well, I see LDAP servers expose data to the world all the time. Pretty
much any university I send random queries to does so. At Stanford, we allow
users to affect the "visibility" of their data, with 3 settings:
"world" -- Available to anyone, including anonymous
"stanford" -- Available only to those people who have authenticated as
being from Stanford
"private" -- Not visible to anyone by normal means (specific applications
get by this)
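
To make the three settings concrete, here is a minimal sketch of the access rule they describe. It is purely illustrative and not how the directory actually enforces it; the names `Visibility` and `can_see`, and the two boolean flags, are hypothetical.

```python
from enum import Enum


class Visibility(Enum):
    """The three per-user visibility settings described above."""
    WORLD = "world"
    STANFORD = "stanford"
    PRIVATE = "private"


def can_see(visibility, authenticated_as_stanford=False, privileged_app=False):
    """Return True if a requester may read data with the given visibility.

    Both flags are assumptions made for this sketch:
    - authenticated_as_stanford: requester has authenticated as being from Stanford
    - privileged_app: requester is one of the specific applications that
      are allowed to get past the "private" setting
    """
    if privileged_app:
        return True
    if visibility is Visibility.WORLD:
        return True  # available to anyone, including anonymous
    if visibility is Visibility.STANFORD:
        return authenticated_as_stanford
    return False  # "private": not visible by normal means


if __name__ == "__main__":
    print(can_see(Visibility.WORLD))     # anonymous requester
    print(can_see(Visibility.STANFORD))  # anonymous requester
    print(can_see(Visibility.STANFORD, authenticated_as_stanford=True))
```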
Since there is a fair amount of data then available to anyone who wants to
run a query because of policy, I do try my best to do due diligence and cut
down on spam harvesting runs. We do have a result limit on the server, but
the people I've run across are savvy enough to use batched queries over
different ranges to get around that, at least in part.
People also like to be able to use their email clients to get information
from the directory servers, and very few of them (only one that I've found)
support SASL/GSSAPI binds, which is the only authentication method we allow
(no username/password).
another is that the more indices you have on an attribute, the larger
the total database is, and the longer it takes to load. This of
course depends in part on the OS/CPU used as well. For example, I
currently index 90 attributes in my database to varying degrees (most
are eq, which is a fairly minimal index). On my Solaris SPARC
systems, it takes 2.5ish hours to load the database. On my new AMD
systems that'll be replacing the Sun SPARC boxes, it takes all of 14.5
minutes. However, if all 90 of those attributes were getting indexed
pres,eq,sub, the amount of time to load would increase significantly.
Well, in production, loading a server is not something you do very
often. You may need to restore a crashed database, or reload a database
whose structure has changed, but this is definitely not a real concern.
Load once, use many.
I think that's a good thought in theory, and is what I thought too.
However, I run 4 environments (dev, test, uat, and production). We have a
custom schema that we modify a few times a year, and those modifications
are usually large enough to warrant a complete reload of the data that is
generated from our RDBMS for the LDAP servers. As a part of that process,
dev may be reloaded several times as bugs are fixed, etc, and the same goes
for test. So I actually reload my servers a bit. ;)
Currently, my indices take up 1.1GB of disk space in OpenLDAP (I'm not
sure exactly how that maps out in Apache DS). My database entry file
takes 2.7GB. So my indices are approximately 1/3 of my database size.
3 GB is really nothing. A 15K RPM SCSI disk is now 36 GB minimum and costs
around $200. Not a big deal. Better to spend money on memory sticks rather
than on high performance disks :)
I don't want to say that making it possible to select indices is *bad*,
but, IMHO, this may be a cool feature that is a bit of overkill
when you balance it against real-world usage. For a real RDBMS, having
twice the size on disk for indices is considered plain normal. I don't
think we should go that far, but once you choose to set indices on an
attribute, it may not be very important to offer a choice of which kinds
of indices you want.
Yeah, my concerns here may be more specific to OpenLDAP and the use of BDB.
When bulk loading, it is quickest to have a BDB cache as large as the
entire database (3.8GB in the case above). On Solaris SPARC, I found
that the only good way to get performance was to use a shared memory region
(Linux doesn't require that), which means that I have to have as much
memory available on the system as the BDB cache, and memory is sadly not
as cheap as disk.
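
For reference, the 3.8GB figure is just the two sizes quoted earlier (1.1GB of indices plus the 2.7GB entry file) added together. A quick sketch of the arithmetic, with variable names that are only for illustration:

```python
# Sizes quoted earlier in this thread, in GB.
index_gb = 1.1     # disk space used by the indices
entries_gb = 2.7   # size of the database entry file

# A BDB cache big enough for the whole database must cover both.
total_gb = index_gb + entries_gb

# Share of the total that is index data.
index_share = index_gb / total_gb

print(f"total: {total_gb:.1f} GB")        # total: 3.8 GB
print(f"index share: {index_share:.0%}")  # index share: 29%
```

So the indices come to about 29% of the total, which is roughly in line with the "approximately 1/3" estimate.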
--Quanah
--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html