RE: Email Archival 101: a General View

William Lefkovics Thu, 20 Nov 2008 10:03:05 -0800

Lots of good information in there.


I certainly don't agree with everything.  Event sinks?  In Exchange 2007,
you would write an archiving transport/routing agent.

 

Small companies often need archiving but do not have a legal department or
binding regulatory needs.  They need a manageable Exchange server so they
are not backing up content daily that isn't accessed very often.  That's the
primary reason I hear for archiving. 

 

From an Information Week article by Andrew Conry-Murray in June 2008:

 

What to look for in an E-mail archiving solution:

1)      Compression

2)      Full Content Index

3)      Keyword Search

4)      Litigation hold (prevent deletion)

5)      Metadata Index

6)      Retention Deletion Policy enforcement

7)      Single Instancing [WSLIII1]

Other preferred features:

1)      Additional Search

2)      API/Connector to other systems, especially legal apps

3)      Discovery

4)      SharePoint integration

5)      Support for extensive list of attachment types
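
To make "Single Instancing" from the first list concrete, here is a
minimal Python sketch, assuming identical content is detected with a
SHA-256 hash. The class and method names are made up for illustration
and do not come from the article or from any archiving product.

```python
import hashlib


class SingleInstanceStore:
    """Toy store: identical content is kept once, however many mailboxes hold it."""

    def __init__(self):
        self.blobs = {}  # digest -> content bytes, stored a single time
        self.refs = {}   # (mailbox, message_id) -> digest

    def put(self, mailbox, message_id, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault stores the content only if this digest is new
        self.blobs.setdefault(digest, content)
        self.refs[(mailbox, message_id)] = digest
        return digest

    def get(self, mailbox, message_id) -> bytes:
        return self.blobs[self.refs[(mailbox, message_id)]]
```

Two users archiving the same attachment end up with two references but
only one stored copy.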

 

Probably the most valuable thing you said, for me, is the last paragraph.
Test your potential solution. MAPI-based and Journaling (ew!) archivers
should be testable without affecting live production data.

 

 

From: Bingham, Kevin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 20, 2008 7:59 AM
To: MS-Exchange Admin Issues
Subject: Email Archival 101: a General View

 

I promised a while back to do a generic write-up on selecting an Email
Archival Solution; figured I better finish this set of scribbles before I
shuffle off from the company next week.  If anyone wants to throw some of
this up on a blog somewhere, feel free.  Since I'm finishing this up in a
rush, there are undoubtedly considerations I've forgotten to include here,
and I only strove to include considerations that would be relevant to the
majority of companies, but this should be a good start for any company
considering archival.

This is written from the perspective of an Exchange Administrator; Exchange
as your core email solution is assumed, but most of the generalities within
could apply to any email solution.  This information is not definitive nor
unbiased; it only represents the empirical findings of a couple of
administrators.

Email Archival has been a hot point in the industry for some time now, with
no real consensus on best-practices or best-in-breed products.  Different
parts of industry drive this division in views by having different
requirements.  In general, it's a system of pulling email out of its native
storage system and placing it somewhere else; the specifics end there,
though.  So, when considering Email Archival, the first thing you need to do
is define what it means to your company.  Why do you want to do archiving?
From there, you should be able to work into the second big question: What
features do you need this tool suite to have?

There are four primary reasons to want to do archiving: mailbox size
management; legal regulation; litigation response/rules of procedure;
content management.  Deciding upon your primary driver first is key to being
able to understand your path forward - and how closely entangled your
Archival implementation needs to be with your Legal department. (Hint: the
answer is almost always: VERY.)

Mailbox size management: are you nuts? You want to take all that email data
out of a system that is designed to manage email data and stick it somewhere
else, increasing the complexity of the whole system and the number of steps
your users need to actually do anything?  Generally speaking, the tools
within Exchange are sufficient for simple mailbox size management.  If you
need additional space, it is almost always cheaper to simply expand your
Exchange databases/storage groups/servers rather than implement a whole new
system on the side.  With the advent of Exchange 2007, you don't even need
to sustain the same level of disk I/O as previously, so larger, cheaper
disks are an option within the email system itself, rather than only
through a third-party archiving solution.

Legal regulation: an easy call, relatively speaking.  The requirements of
the system should be laid out and decided for you.  You still need to
discuss with your Legal department what additional aspects need to be
considered.

Litigation response: the most involved scenario for legal requirements
gathering.  Every industry, every business, will have a slightly different
focus.  Heavy involvement with the legal department will be required.  You
need to be prepared to tell them what they have forgotten to consider, or
assumed you knew, or you will find the requirements changing drastically
after implementation.

Content Management: It's litigation response, plus.  Plus everything.  This
is generally for large organizations trying to get a handle on what data
they have, where it is, and how they want it managed.  Like litigation
response, this generally starts with some very vague ideas about the
requirements and a lack of understanding of just how involved the decision
sets can/need to be.

Usually when initially approached about retention periods for email, Legal
Departments will state that you need to keep everything forever, or delete
everything after 30 days.  In some few cases, one of these responses might
be appropriate, but generally, they are both useless.  In the former, you
wind up having so much garbage in the archive that it is impossible to find
anything useful (do you really need to keep the note from your wife from 8
years ago, asking you to pick up a gallon of milk on the way home?) while in
the latter, there is nothing useful left, and the users are upset because
they can't reference the older items, either.  So, retention periods
probably need to be more selective.  You need to determine how you want that
selectivity to occur, though.  Only certain users (e.g., executives or
lawyers)?  Only certain folders in a mailbox?  Only certain
content?  Determining how that selectivity needs to occur will be a driving
factor for product selection.  Do you need to guarantee every item is
captured?  Or can you put some responsibility on the user to classify what
must be archived?
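
As a rough illustration of the selectivity questions above, here is a
minimal Python sketch of a rule-based retention decision.  Everything in
it (the Message fields, the rule names, the retention periods) is
hypothetical and not taken from any vendor's product; it only shows how
per-user, per-folder, and per-content rules might be combined.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    owner: str    # mailbox owner
    folder: str   # folder path within the mailbox
    subject: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]
    retain_days: int


# Hypothetical rules, checked in order; the first match wins.
RULES: List[Rule] = [
    Rule("executives", lambda m: m.owner in {"ceo", "cfo"}, 7 * 365),
    Rule("contracts-folder", lambda m: m.folder.startswith("Contracts"), 10 * 365),
    Rule("keyword", lambda m: "invoice" in m.subject.lower(), 3 * 365),
]
DEFAULT_RETAIN_DAYS = 90  # assumed fallback for everything else


def retention_days(msg: Message) -> int:
    """Return how long a message should be kept under the rules above."""
    for rule in RULES:
        if rule.matches(msg):
            return rule.retain_days
    return DEFAULT_RETAIN_DAYS
```

Whether rules like these can be expressed at all, and whether they run
automatically or rely on the user filing things correctly, is the sort
of thing to ask each vendor.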

 

There are three basic methods by which data might be moved into the archive:
MAPI, Event Sink, or journaling.  Most vendors offer a choice between two of
these.

MAPI will use a standard MAPI login to the mailbox being archived, typically
from a separate application server.  It might be a continuous logon or a
scheduled one; it will have all the overhead of a MAPI connection, plus
whatever code the vendor is using to filter out items for archiving, plus
overhead to remove items (if applicable), plus overhead to insert stubs (if
applicable).  Suffice it to say, this might be significant in some
circumstances.

Event Sink runs on the Exchange server, as an extra step during message
processing.  It is more efficient than MAPI and guaranteed to review every
message (MAPI isn't), but can increase the load significantly and possibly
cause delays in mailflow.

Journaling is a built-in Exchange method of copying all mail sent to
mailboxes in a storage group to a different mailbox.  This can be combined
with a MAPI or Event Sink application, which then runs only against the
journal mailbox instead of every mailbox.  Journaling alone may meet some
organizations' archival needs by itself, without a third-party vendor
addition.  It is disk and processor intensive.
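
The journaling-plus-archiver arrangement can be pictured with a toy
model.  This is only a sketch in Python with made-up names; it does not
reflect how Exchange implements journaling internally.  The point is
that the archiver touches one journal mailbox rather than logging on to
every user mailbox.

```python
from collections import defaultdict

mailboxes = defaultdict(list)  # user -> delivered messages
journal = []                   # stand-in for the journal mailbox
archive = []                   # stand-in for the archive repository


def deliver(message, recipients):
    """Deliver to each recipient and drop one copy into the journal."""
    for user in recipients:
        mailboxes[user].append(message)
    journal.append(message)


def drain_journal():
    """Archiver pass: reads only the journal, never the user mailboxes."""
    moved = 0
    while journal:
        archive.append(journal.pop(0))
        moved += 1
    return moved
```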

 

Retrieval methods also vary greatly from vendor to vendor.  Many vendors
offer multiple methods of data retrieval; what will work in your
environment?

Mailbox retrieval is generally accomplished by leaving a "stub" item in the
user's mailbox.  When the stub is opened, the message is retrieved and
presented to the user.  The method of retrieval, however, can also vary
greatly.  Perhaps the stub is a custom form that needs to be installed in
your Organizational Forms, which makes a call to a web server when opened,
which retrieves the data from the archive repository and presents it to the
user in the custom form.  Perhaps it posts a request into an application
mailbox, which a service is continually monitoring and processes, and posts
the retrieved item into the user's mailbox, which then has to be opened.
Perhaps opening the item executes an Outlook add-in which fetches the item
from an archive itself.  There are many ways to implement stub retrieval,
all of which have different implications for supportability, load balancing,
and fault tolerance.
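
Stripped of the vendor-specific plumbing, the stub idea looks roughly
like the Python sketch below.  The Item and Stub types and the in-memory
archive dictionary are assumptions for illustration; a real product
would put a form, add-in, or web call where open_entry is.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Item:
    subject: str
    body: str


@dataclass
class Stub:
    subject: str
    archive_id: str  # pointer into the archive repository


archive: Dict[str, Item] = {}  # stand-in for the archive repository


def archive_item(mailbox: List[Union[Item, Stub]], index: int, archive_id: str) -> None:
    """Move the full item to the archive and leave a stub in its place."""
    item = mailbox[index]
    archive[archive_id] = item
    mailbox[index] = Stub(item.subject, archive_id)


def open_entry(entry: Union[Item, Stub]) -> Item:
    """Opening a stub fetches the original; a normal item opens as-is."""
    if isinstance(entry, Stub):
        return archive[entry.archive_id]
    return entry
```

Every retrieval now depends on the archive being reachable, which is
where the supportability and fault tolerance questions come from.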

A fat client is simply an installed application on the desktop, which allows
users to access, search, and sometimes manage the archive, directly.

A web interface should be similar to a fat client, but would be hosted as a
web page somewhere, with the application doing the work there.

Security is a strong concern in some places, not so much in others.  How
does the solution prevent users from retrieving each other's data?  Is there
a way to allow a user to access someone else's data, intentionally?  Does
the archive maintain its own security model, or is it integrated with Active
Directory or other security provider?  If it is integrated, does that mean
it synchronizes a copy and maintains its own, or does it make security calls
against that directory directly?  How is integrity of the archive (i.e., are
users allowed to delete things from it or not?) guaranteed?

Integration with other data sources can be a concern for Content Management
implementations, but might be for other implementation reasons also - and it
never hurts to consider the future (will you ever have need for Content
Management?)  A Content Management initiative will often include - either
currently or when you turn your back a month after implementation - other
data sources as well, such as file servers, SharePoint, or some other
databases.  If so, does the solution have an integrated answer for all
platforms?  You may sacrifice some best-in-breed features by going with a
single vendor for all sources, but you will probably gain cost savings and a
single method of retrieval/search/whatever for all data, which is usually
sort of the point (or one of the points) of a Content Management initiative.


Topology considerations will be insignificant for small companies, but of
the utmost importance for geographically disparate ones.  Where is data
stored - single point or multiple locations?  Does the application run in
multiple places, or just one?  How does the storage function work over the
WAN?  How does the retrieval work over the WAN?  If there are multiple
repositories, how do they communicate with each other and how do referrals
to other repositories occur, if at all?

Every policy/feature consideration probably has a technical one to go with
it - which you can bet the archive vendors probably won't tell you.  For
instance, leaving stub items in the mailbox is a great usability feature,
but one of the tradeoffs is possible performance - it's not the size of your
database that primarily drives performance in Exchange, but rather the
number of items; leaving stubs does nothing to reduce number of items and
will, in fact, swiftly increase it over time.
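
A back-of-the-envelope calculation shows the stub tradeoff.  All the
figures in this Python sketch (50 items a day, 75 KB average item, 2 KB
stub, archive after 90 days) are made-up assumptions; substitute your
own numbers.

```python
def mailbox_after(days, per_day=50, avg_kb=75, stub_kb=2, archive_after=90):
    """Estimate item count and size for a mailbox where old items become stubs."""
    total = days * per_day                        # everything ever received
    live = min(days, archive_after) * per_day     # items still held in full
    stubs = total - live                          # older items replaced by stubs
    size_mb = (live * avg_kb + stubs * stub_kb) / 1024
    return {"items": live + stubs, "stubs": stubs, "size_mb": round(size_mb, 1)}
```

Under these assumptions the mailbox size after a year is far smaller
than it would be without archiving, but the item count is exactly what
it would have been with no archiving at all, and it keeps growing.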

Offline access is completely unimportant for some companies, but considered
essential at others.  Does the solution have any sort of offline cache for
traveling users?  If so, how does the cache operate - how is it populated,
synchronized, encrypted?  Is there a size cap?  Does its existence on a
laptop violate any of the drivers the Legal Department is pushing in order
to run the project in the first place?  For instance, if the vendor is just
using their own PST to provide an offline archive, you can run into a 2GB
space limitation on a file that is weakly encrypted at best, and if a
primary driver is to remove PSTs from your environment, this may not be a
viable offline solution for you.

Finally, Pilot The Solution.  Do NOT pick a vendor just from discussions,
data and presentations.  Get Your Hands Dirty.  My old company issued RFPs
to eight companies and brought three in for testing.  Some things came out
in testing that - though probably just fine for other companies' needs -
would have left us very unhappy if we'd just gone with the vendor who seemed
to fit the best from the RFPs.

 




  _____  

 [WSLIII1] Verbing weirds language.

