Hi List,

Following the recent discussion about OpenChange Backup Tools,
I've started to write a draft for the openchangemapidump tool.
The objective with this first attempt is to gather as much
information as we need in order to design the first reliable
implementation. I should be able to push preliminary test code
soon.

1. Introduction
================
1a. What is openchangemapidump?
It is a tool designed to dump a user's mailbox store at
object level from Exchange using MAPI. The tool needs to
be designed so that the data can easily be restored,
inspected, migrated, browsed or searched.
1b. Links to underlying ideas and concepts:
http://wiki.openchange.org/index.php/OpenChangeForMAPIStoreBackup
1c. New links about MAPI ID uniqueness and why we should use
PR_SOURCE_KEY rather than PR_ENTRYID:
http://www.tech-archive.net/Archive/Development/microsoft.public.win32.programmer.messaging/2006-10/msg00090.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;230749
http://support.microsoft.com/kb/231160
2. Design and Architecture
==========================
2a. Storage:
* MAPI object hierarchy (container, items)
Accessing a particular MAPI object is mostly
about opening containers until we reach the
desired element. If we intend to provide an easy
MAPI store inspector/walker/restore tool, we
should preserve this hierarchy. This is the main
reason why we won't be using MySQL for object
storage but an LDAP-like database (here ldb).
* LDB limitations and SQL backend for raw content
However, LDB is not designed to handle large
data blobs. This means we shouldn't use ldb to
store large email content (PR_BODY, PR_HTML,
PR_RTF_COMPRESSED, attachments) but a SQL
database (sqlite) instead.
* Preliminary conclusions:
We will divide the backend storage into 2
parts:
1. LDB database: stores the object hierarchy
with common properties, their associated
values, and links to SQL database entries
for large data blobs.
2. SQL database: stores the raw object data
This solution should let us use each layer for
what it does best:
- Walking the MAPI store tree and searching
entries (LDB)
- Offline searches on content and large
data blob storage (SQL)
I'll certainly push the database model (ldb +
sql) on the Wiki.
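
To make the split concrete, here is a minimal sketch of the two
layers in Python. It is not the openchangemapidump code: a plain
dict stands in for the ldb hierarchy, and the table layout, the
"blob:<id>" link format and the function names are all assumptions
made for illustration only.

```python
import sqlite3

# Properties treated as large blobs, taken from the list in 2a.
BLOB_PROPS = {"PR_BODY", "PR_HTML", "PR_RTF_COMPRESSED"}


def open_blob_store(path=":memory:"):
    """Open the SQL side and create the (hypothetical) blobs table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "id INTEGER PRIMARY KEY, prop TEXT NOT NULL, data BLOB NOT NULL)"
    )
    return db


def store_object(hierarchy, blobs, dn, props):
    """Store small properties in the hierarchy record and large ones
    in SQL, leaving a 'blob:<rowid>' link in the record."""
    record = {}
    for name, value in props.items():
        if name in BLOB_PROPS:
            cur = blobs.execute(
                "INSERT INTO blobs (prop, data) VALUES (?, ?)", (name, value)
            )
            record[name] = f"blob:{cur.lastrowid}"
        else:
            record[name] = value
    hierarchy[dn] = record  # the dict stands in for an ldb record


def load_blob(blobs, link):
    """Follow a 'blob:<rowid>' link back to the raw data."""
    row_id = int(link.split(":", 1)[1])
    row = blobs.execute(
        "SELECT data FROM blobs WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None
```

Walking and searching only touch the hierarchy records; the SQL
side is read only when a link is followed.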
2b. Namespace for LDB records
* MAPI EntryID uniqueness
This is the *only* parameter Microsoft
guarantees to be unique. As discussed on the
Wiki and referenced on the Microsoft website,
unique doesn't mean permanent. We can however
rely on PR_SOURCE_KEY (a 22-byte SBinary struct)
whose first 16 bytes are the MAPI store GUID and
whose last 6 bytes are the unique object UID. We
should then normally be able to build PR_ENTRYID
from the PR_SOURCE_KEY of the message and of its
containing folder.
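
The layout described above can be sketched as follows. The
16 + 6 byte split comes from the text; the function names and the
"<guid>-<uid>" record name format are hypothetical and only meant
to show how an LDB record name could be derived.

```python
from typing import NamedTuple


class SourceKey(NamedTuple):
    store_guid: bytes  # first 16 bytes: MAPI store GUID
    object_uid: bytes  # last 6 bytes: unique object UID


def split_source_key(raw: bytes) -> SourceKey:
    """Split a 22-byte PR_SOURCE_KEY value into its two parts."""
    if len(raw) != 22:
        raise ValueError(f"expected 22 bytes, got {len(raw)}")
    return SourceKey(store_guid=raw[:16], object_uid=raw[16:])


def record_name(raw: bytes) -> str:
    """Hypothetical record name: hex GUID and hex UID joined by a dash."""
    key = split_source_key(raw)
    return f"{key.store_guid.hex()}-{key.object_uid.hex()}"
```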
2c. Checksum algorithm rather than custom property
Since MAPI IDs (PR_ENTRYID or PR_SOURCE_KEY) are not
permanent throughout the object lifetime, we can't trust
them when doing updates. One of the solutions used in
some other Exchange backup tools seems to be to add a
custom property (a backup tool object UUID) and store it
on the Exchange server. We would rather use a checksum
algorithm on specific properties which guarantee message
uniqueness. This data will only be stored on the client
side. The algorithm and the properties involved in the
process still need to be defined, but the
openchangemapidump draft, once pushed to SVN, should
provide an environment to test possible implementations.
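
As a starting point for such tests, here is a minimal sketch of a
property checksum. Both the hash (SHA-256) and the property set are
placeholders chosen for illustration, since neither has been
decided yet; the encoding helper is hypothetical too.

```python
import hashlib

# Placeholder property set: the real one still needs to be defined.
CHECKSUM_PROPS = ("PR_SUBJECT", "PR_MESSAGE_SIZE", "PR_CLIENT_SUBMIT_TIME")


def _encode(value):
    """Turn a property value into bytes, tagging the kind of value."""
    if value is None:
        return b"\x00"
    if isinstance(value, bytes):
        return b"\x01" + value
    return b"\x02" + str(value).encode("utf-8")


def object_checksum(props, names=CHECKSUM_PROPS):
    """Hash the selected properties in a fixed order.

    Each value is length-prefixed so that adjacent values cannot
    run into each other and produce the same digest.
    """
    digest = hashlib.sha256()
    for name in names:
        data = _encode(props.get(name))
        digest.update(name.encode("ascii"))
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Because PR_ENTRYID and PR_SOURCE_KEY are not part of the input, the
checksum stays the same when those identifiers change.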
2d. Snapshot backup
openchangemapidump should first provide a snapshot
backup of the Exchange mailbox store at a given point in
time. The issue with large mailboxes is: how can we
prevent modifications from occurring, during the
process, in a folder that has already been backed up? A
possible solution which needs to be investigated would
be to use notifications: monitor changes at folder level
and update the database if a change occurred.
3. Limitations and Possible ways to fix them
============================================
3a. Backup process speed needs improvement
For the moment, the backup process is really slow,
mostly because other things had priority at this stage.
I'm nevertheless confident we can improve its speed. The
considerations below need to be tested, but these are
ways we could explore:
- Rather than using GetPropsAll, we could call
GetPropList, filter the list down to the smallest set of
properties we need to restore the object or compute its
checksum correctly, then call GetProps on that set.
Idea: 2 calls returning minimal content rather than a
single call returning content we don't need.
- We can delay fetching large data blobs (large content +
attachments) to the last part of the process:
1. Create the object-level hierarchy
2. Filter items from the ldb database and access
them using their PR_ENTRYID.
This method provides an easy way to track how much of
the process is completed (using PR_MESSAGE_SIZE for
example) and possibly to add a callback.
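
The second phase with progress tracking could look roughly like
the sketch below. The function and parameter names are
hypothetical, and `fetch` stands in for whatever call ends up
retrieving the content of an item from its PR_ENTRYID; the sizes
are assumed to be the PR_MESSAGE_SIZE values already stored during
the first phase.

```python
from typing import Callable, Dict, Optional


def fetch_blobs(
    sizes: Dict[str, int],
    fetch: Callable[[str], bytes],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Dict[str, bytes]:
    """Fetch the large content of each item after the hierarchy exists.

    sizes maps an entry id to its stored PR_MESSAGE_SIZE. The total
    is known up front, so progress is simply bytes done out of bytes
    expected.
    """
    total = sum(sizes.values())
    done = 0
    blobs = {}
    for entry_id, size in sizes.items():
        blobs[entry_id] = fetch(entry_id)
        done += size
        if on_progress is not None:
            on_progress(done, total)
    return blobs
```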
3b. Reliable update process:
This really needs to be discussed:
- We can split the update process into 3 stages:
1. Check for each container whether the
number of items (PR_CONTENT_COUNT) or
the total size has changed.
2. Check if objects are still accessible
through their PR_ENTRYID.
3. Update/Modify if necessary.
While some may say this is fuzzy, I think this approach
can be improved up to the point where we can consider
it reliable.
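
To support the discussion, here is a minimal sketch of the first
two stages. The dict keys, the function name and the `can_open`
callback (standing in for an attempt to open an object through its
PR_ENTRYID) are assumptions; stage 3 is left to the caller.

```python
def plan_folder_update(saved, current, known_ids, can_open):
    """Decide whether a folder needs updating and which saved
    objects can no longer be reached.

    saved / current: dicts with 'count' (PR_CONTENT_COUNT) and
    'size' (total size) for the folder at backup time and now.
    """
    # Stage 1: cheap check on item count and total size.
    if (saved["count"], saved["size"]) == (current["count"], current["size"]):
        return {"changed": False, "stale": []}

    # Stage 2: find saved objects no longer accessible by entry id.
    stale = [eid for eid in known_ids if not can_open(eid)]

    # Stage 3 (update/modify) is performed by the caller.
    return {"changed": True, "stale": stale}
```

Note that stage 1 alone cannot detect a change which leaves both
the item count and the total size untouched, which is one reason
the approach needs refining.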
4. What does the code currently do?
===================================
So far, I've written a very basic sample implementation:
- It recursively browses the mailbox, starting at the Top
Information Store and entering subfolders (some kind of
hacked openchangeclient -m) until it reaches items.
- The code creates and populates an LDB database, providing
an LDAP-like hierarchy, and dumps each container and item with
all their properties into the database (even content such as
PR_RTF_COMPRESSED etc., but no attachments).
- Finally, the current code provides a trivial database update
(only new items, no checksum calculation).
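
The recursive walk can be illustrated with the sketch below. This
is not the actual code: the mailbox is mocked as nested dicts, and
the "CN=..." naming of records is an assumption used only to show
how an LDAP-like hierarchy falls out of the recursion.

```python
def walk(folder, parent_dn="CN=Mailbox"):
    """Yield (dn, kind) for a folder, its items, then its subfolders."""
    dn = f"CN={folder['name']},{parent_dn}"
    yield dn, "container"
    for item in folder.get("items", []):
        yield f"CN={item},{dn}", "item"
    for sub in folder.get("folders", []):
        yield from walk(sub, dn)


# Mock mailbox tree standing in for the real MAPI store.
tree = {
    "name": "Top",
    "items": [],
    "folders": [{"name": "Inbox", "items": ["msg1"], "folders": []}],
}
records = list(walk(tree))
```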
Before I push the code to SVN, I need to:
- Clean up the code and fix numerous memory leaks.
- Add storage support for multi-valued properties and
generate a LDIF schema file to handle isSingleValued.
- Add some skeleton files for further use (checksum
algorithm etc.).
- Add a sample sqlite backend implementation for content
storage.
Cheers,
Julien.
--
Julien Kerihuel
[EMAIL PROTECTED]
OpenChange Project Manager
GPG Fingerprint: 0B55 783D A781 6329 108A B609 7EF6 FE11 A35F 1F79
