Adding the number of occurrences is certainly a good idea. For HDFS and any future data sources that don't have a native schema, how do we handle these fields, which are defined in an external system? Are you proposing to add a schema abstraction as well?

thanks
Prasad

On Tue, Jan 12, 2016 at 11:49 PM, Edward Zhang wrote:
> Hi Daniel,
>
> That is a great idea to add more meaningful fields to the sensitivity
> metadata; you can go ahead and design/add that.
>
> My only concern is how we name this field in general, and what else might
> be needed in the future. numOfOccurrences could be a good name; for HDFS
> or Hive, the occurrence is defined differently.
>
> Thanks
> Edward
>
> On Mon, Jan 11, 2016 at 7:38 PM, Daniel Zhou wrote:
>
> > Hi all,
> >
> > Recently I have been working on a project to automatically fetch the
> > metadata of sensitive info stored in a DB and then create an Eagle
> > policy. I am wondering if we can add a field called "threshold" to the
> > current "fileSensitivity structure" in Eagle so that we can create a
> > policy with more details.
> >
> > Our company's product "DgSecure" can automatically discover all the
> > sensitive elements within every file in Hadoop, so we have many details
> > about this sensitive information. With this information, we can make the
> > policy more precise. For example, I want to create a policy based on two
> > parameters: one is "sensitivity type", the other is called "threshold".
> > Alerts are triggered only when the total number of elements of that
> > particular sensitive type reaches or exceeds "threshold".
> >
> > So the trigger condition could be something like this:
> > ........ if (sensitiveType == "MailAddress" && NumberOfSensData >= threshold) .....
> >
> > I think this condition makes more sense than just tagging a file with a
> > sensitive type.
> >
> > Please let me know if you have any opinions or suggestions. :)
> >
> > Thanks!
> > Daniel
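
For reference, the trigger condition proposed in the quoted thread could be sketched roughly as below. This is only an illustration under stated assumptions: the `SensitivityInfo` class, its field names, and the `should_alert` helper are hypothetical and are not part of Eagle's current fileSensitivity structure or policy engine. The count field uses the numOfOccurrences name suggested in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitivityInfo:
    """Hypothetical per-file sensitivity metadata (not Eagle's actual structure)."""
    sensitive_type: str       # e.g. "MailAddress"
    num_of_occurrences: int   # number of sensitive elements of that type in the file


def should_alert(info: SensitivityInfo, policy_type: str, threshold: int) -> bool:
    """Return True when the type matches and the count reaches or exceeds the threshold."""
    return info.sensitive_type == policy_type and info.num_of_occurrences >= threshold


# Example: a file with 120 detected mail addresses, checked against two thresholds.
info = SensitivityInfo(sensitive_type="MailAddress", num_of_occurrences=120)
print(should_alert(info, "MailAddress", 100))
print(should_alert(info, "MailAddress", 500))
```

How the count would be obtained for sources without a native schema, such as HDFS, is left open here, as it is in the discussion above.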
