[arc-discuss] Redacted case materials problem REVISITED

Alan Burlison Thu, 01 Oct 2009 19:26:32 +0100

As I'm sure everyone can understand, the need to protect Sun 
confidential information has not changed.  The objective has always been 
to protect Sun's IP by trying to ensure that we are not releasing 
confidential information.  Because the final checks are automated, there 
will always be a potential for false positives.


The code currently contains two sets of triggers.  The first is the 
following list of specific triggers (all of the regular expressions are 
case-insensitive):

     company\s+conf[aei]denti?al
     contains?\s+conf[aei]denti?al\s+and\s+priv[ei]l[ei]ged
     engineering\s+only
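
For illustration, here is a minimal Python sketch of how the three 
specific triggers above could be applied to a piece of text.  It is not 
the actual checking script: it assumes Python's `re` syntax behaves like 
the script's regex engine for these patterns, and the function name 
first_specific_trigger is hypothetical.

```python
import re

# The three specific triggers quoted above, compiled case-insensitively.
SPECIFIC_TRIGGERS = [
    re.compile(r"company\s+conf[aei]denti?al", re.IGNORECASE),
    re.compile(r"contains?\s+conf[aei]denti?al\s+and\s+priv[ei]l[ei]ged",
               re.IGNORECASE),
    re.compile(r"engineering\s+only", re.IGNORECASE),
]


def first_specific_trigger(text):
    """Return the first matched phrase, or None if no trigger fires.

    Hypothetical helper for illustration; not part of the real script.
    """
    for pattern in SPECIFIC_TRIGGERS:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None
```

Note that the character classes deliberately accept common misspellings, 
so "Company Confidental" fires just as "Company Confidential" does.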

The second set of triggers looks for:

     Sun\b(?:\s+Micro(?:systems)?\b)?,?(?:\s+Inc(?:orporated)?)?

followed within 20 characters by one of:

     conf[aei]denti?al
     priv[ie]leged
     propri[ae]t[ao]ry
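
The proximity rule can be sketched in Python as follows.  This is an 
illustration under stated assumptions, not the actual script: it assumes 
the company-name pattern is intended to read 
`Sun\b(?:\s+Micro(?:systems)?\b)?,?(?:\s+Inc(?:orporated)?)?`, and that 
"followed within 20 characters" means at most 20 arbitrary characters 
between the end of the name and the start of the keyword.  The names 
NAME, KEYWORD, PROXIMITY and proximity_trigger are hypothetical.

```python
import re

# Assumed form of the company-name pattern (see the assumptions above).
NAME = r"Sun\b(?:\s+Micro(?:systems)?\b)?,?(?:\s+Inc(?:orporated)?)?"

# The three keyword patterns, combined into one alternation.
KEYWORD = r"(?:conf[aei]denti?al|priv[ie]leged|propri[ae]t[ao]ry)"

# Assumption: "within 20 characters" is modelled as a gap of 0 to 20
# arbitrary characters (including newlines) between name and keyword.
PROXIMITY = re.compile(NAME + r".{0,20}?" + KEYWORD,
                       re.IGNORECASE | re.DOTALL)


def proximity_trigger(text):
    """Return the matched span if the proximity rule fires, else None."""
    match = PROXIMITY.search(text)
    return match.group(0) if match else None
```

Because the match is case-insensitive and purely textual, an ordinary 
sentence that mentions the sun shortly before the word "confidential" 
would also fire under this sketch, which is the kind of false positive 
an automated check cannot rule out.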

The older script looked for the (case-insensitive) strings 'Sun 
Proprietary' or 'Sun Confidential'.  Whilst the new patterns undoubtedly 
match more frequently than the old ones, they are largely equivalent. 
Also, it is clear that both "Sun Confidential" and "Sun  Confidential" 
mean the same thing, yet the old script would have flagged the first 
phrase (one space between "Sun" and "Confidential") and ignored the 
second (two spaces between "Sun" and "Confidential").
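
The one-space versus two-space difference can be shown with a short 
Python sketch.  Both functions are hypothetical stand-ins: old_check 
assumes the older script did a plain case-insensitive substring search, 
and whitespace_tolerant_check uses a deliberately simplified pattern 
(not one of the real triggers) to show only the effect of matching on 
`\s+` rather than a single literal space.

```python
import re

# Assumed behaviour of the older script: literal, case-insensitive strings.
OLD_STRINGS = ("sun proprietary", "sun confidential")

# Simplified stand-in for the newer patterns: any run of whitespace
# between the two words is accepted.
TOLERANT = re.compile(r"sun\s+(?:proprietary|confidential)", re.IGNORECASE)


def old_check(text):
    """Literal substring search, so exactly one space is required."""
    lowered = text.lower()
    return any(s in lowered for s in OLD_STRINGS)


def whitespace_tolerant_check(text):
    """Regex search that tolerates multiple spaces, tabs or newlines."""
    return TOLERANT.search(text) is not None


for sample in ("Sun Confidential", "Sun  Confidential"):
    print(repr(sample), old_check(sample), whitespace_tolerant_check(sample))
```

Under these assumptions the literal search misses the two-space variant 
while the whitespace-tolerant search flags both.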

As requested, we are releasing this information to make it easier for 
people to avoid inadvertently triggering the checks.  I'll also update 
the REDACTED.txt file so that it includes the phrase that triggered the 
redaction, which will make it easier to figure out why a particular case 
has been redacted.

-- 
Alan Burlison