Re: Migrated records should be identified

Daniel Gruno Thu, 10 Sep 2020 05:47:09 -0700

On 10/09/2020 14.44, sebb wrote:

On Thu, 10 Sep 2020 at 13:23, Daniel Gruno <[email protected]> wrote:


On 10/09/2020 14.15, sebb wrote:

On Thu, 10 Sep 2020 at 12:32, Daniel Gruno <[email protected]> wrote:


On 10/09/2020 13.25, sebb wrote:

Migration to Foal will be a huge job for some installations.

Whilst hopefully all snags will have been ironed out of any conversion
tool before it is deployed in earnest, it's possible that some edge
cases will cause issues, and will need subsequent adjustment.


Short of ironing out a standard for DKIM_ID, the migration tests I've
done have gone relatively well. There were IIRC a few snags, most
related to the ES 7.8.1 lib, but once I got migration started, it worked
as intended and everything on the new ES server was compatible. If we
could somehow get a migration test running on travis or such, that would
be ideal - but that is quite tricky - we'd have to maybe dockerize two
containers - one with old pony, one with foal, and then test migrating
across and checking that each document is obtainable.


What tests are planned for checking migration?


To this end, I think it will be essential to know which records have
been migrated, and which version of the software was used to do so (as
well as the date).

It may be worth including version and timestamp info in the direct
archive and imports as well.


Do you mean adding a key/value to the migrated doc with a migration
note? That wouldn't be a bad idea, if nothing else, to keep score of
what was migrated and what's new.


Something like that.

I think the data needs to be flexible and allow for multiple notes.
It won't always be sufficient to record the last change to the data.


Yes, one wondrous thing about ES is that a text field can hold either a
single text or an array of texts, so you can have one note or multiple
notes, and it'll just work. I'm thinking of just having a "notes" field
where we can put entries.


Does that automatically append new entries, or does the user have to
amend the record to ensure previous entries are not lost?

What I do right now is fetch the doc, ensure 'notes' is a list, then
append new notes to it and save the entire doc.
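
A minimal sketch of the fetch/normalize/append step described above, written
as a pure function on the document body so that it does not assume any
particular Elasticsearch client call. The function name, its parameters and
the note string format are hypothetical, chosen only for illustration; only
the "notes" field name comes from the thread. Fetching and saving the
document are left to the caller.

```python
from datetime import datetime, timezone
from typing import Optional


def append_note(doc: dict, action: str, version: str,
                when: Optional[datetime] = None) -> dict:
    """Return a copy of doc with one more entry in its 'notes' list.

    The note layout ("<action> version=<v> at=<timestamp>") is an
    assumption made for this sketch, not an existing convention.
    """
    when = when or datetime.now(timezone.utc)

    notes = doc.get("notes")
    if notes is None:
        notes = []                # no notes recorded yet
    elif isinstance(notes, list):
        notes = list(notes)       # copy so the caller's list is untouched
    else:
        notes = [notes]           # a single text value becomes a one-item list

    notes.append(f"{action} version={version} at={when.isoformat()}")

    updated = dict(doc)
    updated["notes"] = notes
    return updated
```

The caller would then write the returned body back as the whole document.
Because this is a read-modify-write cycle, two writers updating the same
document at the same time could lose a note unless the save is guarded in
some way.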


It would probably still be useful to have some fixed attributes such as
-archived-at
-imported-at


That would be for archiver.py and import-mbox.py?


One possible application would be to back-fill attachments which were
originally ignored.


This could be run as a background re-indexer perhaps? One that grabs the
source document, re-parses attachments, and, if it contains more than
originally thought, adds them and updates the email document.
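
As a rough illustration of the re-parse step, the sketch below uses only
Python's standard email package to list attachment filenames found in a raw
message source that are not already recorded. The function name and the idea
of comparing by filename are assumptions made for this example; how
attachments are actually stored and matched in the archive is not specified
in this thread.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def find_missing_attachments(source: bytes, known_names: set) -> list:
    """Return attachment filenames present in source but not in known_names."""
    # policy.default yields EmailMessage objects, which provide iter_attachments()
    msg = message_from_bytes(source, policy=policy.default)
    found = [part.get_filename() for part in msg.iter_attachments()]
    return [name for name in found if name and name not in known_names]


# Small self-made sample message, standing in for a stored source document.
demo = EmailMessage()
demo["Subject"] = "attachment test"
demo.set_content("body text")
demo.add_attachment(b"\x00\x01", maintype="application",
                    subtype="octet-stream", filename="a.bin")
demo.add_attachment(b"\x02", maintype="application",
                    subtype="octet-stream", filename="b.bin")

# Pretend only a.bin was indexed originally.
missing = find_missing_attachments(demo.as_bytes(), {"a.bin"})
```

A real re-indexer would presumably compare on something more robust than the
filename, such as a content hash, since names can repeat or be absent.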


Yes, and marks the document somehow so it does not need to be scanned again.

This is where the change context comes in.
If we knew which documents were created with which version of
software, it would be possible to know which ones did not need
processing.

S.
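
One way the version check could look, as a sketch only: it assumes each
document carries a "notes" field (a single string or a list of strings) whose
entries contain a token of the form version=X.Y.Z, and that versions are
dotted integers. Both the token format and the function names are
hypothetical. A document is skipped when any note shows it was handled by a
version at or after the one that introduced the fix.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_processing(doc: dict, fixed_in: str) -> bool:
    """Return True unless a note records handling by version >= fixed_in."""
    notes = doc.get("notes") or []
    if isinstance(notes, str):
        notes = [notes]           # a single text value is treated as one note

    threshold = parse_version(fixed_in)
    for note in notes:
        for token in note.split():
            if not token.startswith("version="):
                continue
            try:
                seen = parse_version(token[len("version="):])
            except ValueError:
                continue          # unparseable version: ignore this token
            if seen >= threshold:
                return False      # already handled by a new enough version
    return True
```

Documents with no notes at all are treated as needing processing, which
matches the case of records created before any such marking existed.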
