problem with this approach is that if our data structure is nested inside a JSON field, it is hard to use for policy creation. Currently it looks like the Siddhi CEP engine only supports flat structures, unless we customize a comparison function for this JSON field.
Thanks
Edward

On 1/13/16, 14:13, "Daniel Zhou" <[email protected]> wrote:

>How about we define some common fields and then add a String field to
>hold a JSON document, which can be used to save any special fields defined
>in an external system? Eagle only needs to take care of the common fields,
>and the external system can save any data it wants in that JSON string field.
>
>Regards,
>Daniel
>
>-----Original Message-----
>From: Zhang, Edward (GDI Hadoop) [mailto:[email protected]]
>Sent: Wednesday, January 13, 2016 1:18 PM
>To: [email protected]
>Subject: Re: suggestion: add field "threshold" to current
>"fileSensitivity structure" in Eagle
>
>Yes, it looks like we need a schema abstraction which can represent any
>sensitivity information.
>sensitivityType and numOfOccurrences are just two common fields of the
>whole sensitivity information.
>
>For HDFS, the sensitivity information also includes filedir, while for
>Hive, the sensitivity information includes hiveResource, which could be a
>database, table, column, etc.
>
>Thanks
>Edward
>
>On 1/13/16, 0:35, "Prasad Mujumdar" <[email protected]> wrote:
>
>> The number of occurrences is certainly a good idea.
>> For HDFS and any future data sources which don't have a native
>> schema, how do we handle these fields, which are defined in an
>> external system? Are you proposing to add a schema abstraction as well?
>>
>> thanks
>> Prasad
>>
>>
>> On Tue, Jan 12, 2016 at 11:49 PM, Edward Zhang
>> <[email protected]>
>> wrote:
>>
>>> Hi Daniel,
>>>
>>> That is a great idea to add more meaningful fields into the sensitivity
>>> metadata; you can go ahead and design/add that.
>>>
>>> My only concern is: how do we name this field generically, and what
>>> else is possible in the future? numOfOccurrences could be a good name,
>>> but for HDFS and Hive, an occurrence is defined differently.
>>>
>>> Thanks
>>> Edward
>>>
>>> On Mon, Jan 11, 2016 at 7:38 PM, Daniel Zhou
>>> <[email protected]>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > Recently I have been working on a project to automatically fetch the
>>> > metadata of sensitive info stored in DBs and then create Eagle policies.
>>> > I am wondering if we can add a field called "threshold" to the current
>>> > "fileSensitivity structure" in Eagle so that we can create policies
>>> > with more detail.
>>> >
>>> > Our company's product "DgSecure" can automatically discover all the
>>> > sensitive elements within every file in Hadoop, so we have many
>>> > details about this sensitive information. With this information, we
>>> > can make policies more precise. For example, I want to create a policy
>>> > based on two parameters: one is "sensitivity type", the other is
>>> > called "threshold". Alerts are triggered only when the total number
>>> > of elements of that particular sensitive type reaches or exceeds
>>> > "threshold".
>>> >
>>> > So the trigger condition could be something like this:
>>> > ........ if (sensitiveType == "MailAddress" && NumberOfSensData >= threshold) .....
>>> >
>>> > I think this condition makes more sense than just tagging a file
>>> > with a sensitive type.
>>> >
>>> > Please let me know if you have any opinions or suggestions. :)
>>> >
>>> > Thanks!
>>> > Daniel
>>> >
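To make the proposals in this thread concrete, here is a minimal Python sketch, under stated assumptions, of a record with common fields plus a free-form JSON string (Daniel's suggestion), a flattening step that turns nested JSON into top-level attributes (the problem Edward raises, since the engine reportedly matches only flat structures), and the threshold trigger condition. The names `SensitivityRecord`, `extra_json`, `flatten`, `to_flat_event`, `should_alert`, the `ext.` key prefix, and the sample path and JSON keys are all hypothetical; they are not part of Eagle's API, and the rule is evaluated in plain Python rather than in Siddhi.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SensitivityRecord:
    """Hypothetical record: common fields Eagle understands, plus a JSON
    string holding fields defined by an external system (e.g. DgSecure)."""
    resource: str            # e.g. an HDFS path or a Hive database/table/column
    sensitivity_type: str    # e.g. "MailAddress"
    num_of_occurrences: int  # how many sensitive elements of that type were found
    extra_json: str = "{}"   # external-system fields, opaque to Eagle


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1},
    so every value becomes a top-level attribute a flat schema can match on."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_flat_event(rec: SensitivityRecord) -> Dict[str, Any]:
    """Merge the common fields with the flattened external JSON fields."""
    event: Dict[str, Any] = {
        "resource": rec.resource,
        "sensitivityType": rec.sensitivity_type,
        "numOfOccurrences": rec.num_of_occurrences,
    }
    extras = flatten(json.loads(rec.extra_json))
    # Prefix external keys (hypothetical "ext." convention) so they cannot
    # overwrite the common fields.
    event.update({f"ext.{k}": v for k, v in extras.items()})
    return event


def should_alert(event: Dict[str, Any], sensitive_type: str, threshold: int) -> bool:
    """Trigger condition from the thread: the type matches and the number of
    occurrences reaches or exceeds the threshold."""
    return (event["sensitivityType"] == sensitive_type
            and event["numOfOccurrences"] >= threshold)


# Usage with made-up values.
rec = SensitivityRecord("/data/customers.csv", "MailAddress", 120,
                        '{"dgsecure": {"scanId": "s-1", "confidence": 0.9}}')
event = to_flat_event(rec)
print(event["ext.dgsecure.confidence"])         # 0.9
print(should_alert(event, "MailAddress", 100))  # True
```

Flattening keeps Eagle's common fields fixed while letting each external system add arbitrary attributes; the cost is that policies must refer to the generated dotted names, which is one way to sidestep the custom comparison function mentioned at the top of the thread.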
