Thanks, Geert and David, for your valuable suggestions. I will check out FlexRep and CPF.
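
For the offline route mentioned in the quoted thread below (dump the documents to local files, clean them, copy them over), the content-masking step might look roughly like the following. This is only a minimal sketch, not a MarkLogic API: the element names in PII_ELEMENTS, the function name mask_document, and the simple email pattern are all assumptions made for illustration, and the '???' placeholder follows the example in David's message. A real schema would need its own rules and an audit trail.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names that hold PII; adjust to the real schema.
PII_ELEMENTS = {"email", "phone", "ssn"}

# Deliberately simple pattern for email addresses embedded in free text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

MASK = "???"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def mask_document(xml_text):
    """Return xml_text with PII element values and inline emails masked."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) in PII_ELEMENTS:
            # Replace the whole value of a known PII element.
            if elem.text:
                elem.text = MASK
        elif elem.text:
            # Elsewhere, only scrub email addresses found in running text.
            elem.text = EMAIL_RE.sub(MASK, elem.text)
        if elem.tail:
            elem.tail = EMAIL_RE.sub(MASK, elem.tail)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<person><name>Ann</name><email>ann@example.com</email>"
        "<note>mail bob@example.org now</note></person>"
    )
    print(mask_document(sample))
```

Running the same function from a scheduled job over every exported file, in addition to the one-time refresh script, would cover the "accidentally left unmasked" case, since masking an already masked document leaves it unchanged.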
> On Mar 24, 2015, at 2:17 AM, David Lee <[email protected]> wrote:
>
> Unless your document's PI data is separated into different documents you are
> going to need to do a custom transformation on each document - the details of
> which are very case specific (fill in SS#'s with '???', remove last names?
> remove entire sections or replace with sample data?). Having worked in the
> medical and commerce worlds I know that getting this right, and clearly
> auditable, is crucial.
> Also consider if you need to maintain any document properties or metadata
> (properties objects including mod dates, collections, permissions, DLS data
> etc., and are these copied as-is or modified).
>
> That refines the question into parts
> 1) Selecting the document subset to copy
> 2) Transforming the document content itself (*prior* to leaving the 'trust
>    zone')
> 3) Select/copy/filter the document metadata
> 4) Extract from the source DB
> 5) -- possibly package for secure, reliable or easy travel to the down sites,
>    encrypt?
> 6) -- Copy the data
> ....
> Now reverse the process on the target site.
>
> You can do all this ad-hoc - once maybe.
> Getting this reliable, scriptable, auditable and never screwing up -- harder.
>
> Geert's suggestion of FlexRep seems ideal for this as it can accomplish all
> of these.
>
> MLCP by itself can do quite a bit - but it may be hard to put all the pieces
> together.
>
> Another way is making a temporary DB, and using CPF or your own code to do
> all the data transformation on-server (1-4), then use any number of ways
> to copy the data (mlcp, replication, database export/import).
>
> Or ... if you prefer offline tools (say you like xproc or xmlsh or other
> non-server products) you could dump the DB to local files, clean them in
> place, then copy them over and reverse it.
>
> FlexRep is looking really good though ...
>
>
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 812-482-5224
> Cell: +1 812-630-7622
> www.marklogic.com
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Geert Josten
> Sent: Tuesday, March 24, 2015 2:00 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Suggestions for data masking
>
> Hi Joel,
>
> I haven't dealt with this personally, but could ask around. I guess though
> there are numerous ways to go about with this, depending on the exact needs.
> The two that come to mind first:
>
> You could create a permanent solution using Flexible Replication, which
> builds on top of CPF:
> http://docs.marklogic.com/guide/flexrep/rep_intro#id_62963
>
> You could also use MLCP copying feature together with an MLCP transform.
>
> You already mentioned triggers and scheduled tasks, but MLCP will load faster
> I think. CPF uses triggers underneath..
>
> Kind regards,
> Geert
>
> On 3/24/15, 2:12 AM, "Joel Wilson Gunasekaran"
> <[email protected]> wrote:
>
>> Hi,
>>
>> Once in a while, we refresh dataset in lower environments with
>> production data for testing purposes.
>> We have a requirement to mask all pii(personally identifiable
>> information) data like email id, phone number, etc. in lower
>> environments like DEV, QA.
>>
>> We were thinking about having a one-time script that does the masking,
>> which can be run when we do the data refresh.
>> In addition to this, we also want a automated process that does this,
>> like either a scheduled task or a trigger, to avoid any sensitive data
>> left unmasked, accidentally.
>>
>> Can you please let me know if you have had to deal with similar cases
>> and any suggestions?
>>
>> Thanks
>> Joel
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
