Hi Arun

This looks really good and fills some obvious gaps in the security landscape.

Happy to contribute any way you want.

All the best!!!

Bosco





On 10/20/15, 8:02 AM, "Alex Karasulu" <akaras...@gmail.com on behalf of 
akaras...@apache.org> wrote:

>Hi Arun,
>
>Eagle sounds very promising. I just had a discussion with someone about
>this exact need. I do, however, agree with Greg on the name. As far as I
>can see, besides the name, your weakest point is the all-eBay team.
>It's not a blocker and can be fixed during incubation. Good luck to you.
>
>Alex
>
>
>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun <armanoha...@ebay.com>
>wrote:
>
>> Hi Greg,
>>
>> Thank you for reviewing the proposal.
>>
>> Originally we thought Eagle might already be trademarked by someone, but
>> I went through the eBay legal team to get clearance to use the name. We
>> will look into it again to see whether there are potential problems.
>>
>> Thanks,
>> Arun
>>
>> On 10/20/15, 1:52 AM, "Greg Stein" <gst...@gmail.com> wrote:
>>
>> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
>> >looks like a great proposal. I would suggest being a bit wary of the name,
>> >as "Eagle" is a *very* popular PCB design program.
>> >
>> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun <armanoha...@ebay.com>
>> >wrote:
>> >
>> >> Hello Everyone,
>> >>
>> >> My name is Arun Manoharan. I am currently a product manager on the
>> >> Analytics platform team at eBay Inc.
>> >>
>> >> I would like to start a discussion on Eagle joining the ASF as an
>> >> incubation project.
>> >>
>> >> Eagle is a monitoring solution for Hadoop that instantly identifies
>> >> access to sensitive data, recognizes attacks and malicious activities,
>> >> and takes action in real time. Eagle supports a wide variety of
>> >> policies on HDFS data and Hive. Eagle also provides machine learning
>> >> models for detecting anomalous user behavior in Hadoop.
>> >>
>> >> The proposal is available on the wiki here:
>> >> https://wiki.apache.org/incubator/EagleProposal
>> >>
>> >> The text of the proposal is also available at the end of this email.
>> >>
>> >> Thanks for your time and help.
>> >>
>> >> Thanks,
>> >> Arun
>> >>
>> >> <COPY of the proposal in text format>
>> >>
>> >> Eagle
>> >>
>> >> Abstract
>> >> Eagle is an open source monitoring solution for Hadoop that instantly
>> >> identifies access to sensitive data, recognizes attacks and malicious
>> >> activities in Hadoop, and takes action.
>> >>
>> >> Proposal
>> >> Eagle audits access to HDFS files and to Hive and HBase tables in real
>> >> time, enforces policies defined on sensitive data access, and alerts on
>> >> or blocks a user's access to that sensitive data in real time. Eagle
>> >> also creates user profiles based on typical access behaviour for HDFS
>> >> and Hive and sends alerts when anomalous behaviour is detected. Eagle
>> >> can also import sensitive data information classified by external
>> >> classification engines to help define its policies.
>> >>
>> >> Overview of Eagle
>> >> Eagle has 3 main parts.
>> >> 1. Data collection and storage - Eagle collects data from various
>> >> Hadoop logs in real time using Kafka and the YARN API, and uses HDFS
>> >> and HBase for storage.
>> >> 2. Data processing and policy engine - Eagle allows users to create
>> >> policies based on various metadata properties of HDFS, Hive and HBase
>> >> data.
>> >> 3. Eagle services - Eagle services include the policy manager, the
>> >> query service and the visualization component. Eagle provides an
>> >> intuitive user interface to administer Eagle and an alert dashboard to
>> >> respond to real-time alerts.
>> >>
>> >> Data Collection and Storage:
>> >> Eagle provides a programming API for integrating any data source into
>> >> the Eagle policy evaluation framework. For example, Eagle HDFS audit
>> >> monitoring collects data from Kafka, which is populated by a NameNode
>> >> log4j appender or by a Logstash agent. Eagle Hive monitoring collects
>> >> Hive query logs from running jobs through the YARN API, which is
>> >> designed to be scalable and fault-tolerant. Eagle uses HBase to store
>> >> metadata and metrics data, and also supports a relational database
>> >> through a configuration change.
>> >>
>> >> Data Processing and Policy Engine:
>> >> Processing Engine: Eagle provides a stream processing API which is an
>> >> abstraction over Apache Storm and can also be extended to other
>> >> streaming engines. This abstraction allows developers to assemble data
>> >> transformations, filtering, external data joins etc. without being
>> >> physically bound to a specific streaming platform. The Eagle streaming
>> >> API allows developers to easily integrate business logic with the Eagle
>> >> policy engine; internally, the Eagle framework compiles the business
>> >> logic execution DAG into program primitives of the underlying stream
>> >> infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring
>> >> transforms each audit log entry from the NameNode into an object and
>> >> joins it with sensitivity metadata and security zone metadata, which
>> >> are generated by external programs or configured by the user. Eagle
>> >> Hive monitoring filters running jobs to get the Hive query string,
>> >> parses the query string into an object and then joins it with
>> >> sensitivity metadata.
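
To make the transform-and-join step above concrete, here is a minimal sketch in plain Python. It is not Eagle's streaming API: the simplified tab-separated key=value audit line, the AuditEvent type and the prefix-based sensitivity lookup are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class AuditEvent:
    """Hypothetical object form of one HDFS audit log entry."""
    user: str
    cmd: str
    src: str
    allowed: bool
    sensitivity: Optional[str] = None


def parse_audit_line(line: str) -> AuditEvent:
    """Transform step: tab-separated key=value pairs -> AuditEvent."""
    fields = dict(part.split("=", 1) for part in line.split("\t") if "=" in part)
    return AuditEvent(
        user=fields["ugi"].split(" ")[0],  # drop the "(auth:...)" suffix
        cmd=fields["cmd"],
        src=fields["src"],
        allowed=fields["allowed"] == "true",
    )


def join_sensitivity(
    events: Iterable[AuditEvent], sensitivity_by_prefix: Dict[str, str]
) -> Iterator[AuditEvent]:
    """Join step: tag each event with the label of the first matching path prefix."""
    for event in events:
        for prefix, label in sensitivity_by_prefix.items():
            if event.src.startswith(prefix):
                event.sensitivity = label
                break
        yield event


def pipeline(lines: Iterable[str], sensitivity_by_prefix: Dict[str, str]) -> Iterator[AuditEvent]:
    """Compose the two steps lazily, independent of any streaming platform."""
    return join_sensitivity((parse_audit_line(l) for l in lines), sensitivity_by_prefix)


# Illustrative input only; real audit lines carry more fields.
LINES = [
    "allowed=true\tugi=alice (auth:SIMPLE)\tcmd=open\tsrc=/data/pii/users.csv",
    "allowed=true\tugi=bob (auth:SIMPLE)\tcmd=open\tsrc=/tmp/scratch.txt",
]
RESULT = list(pipeline(LINES, {"/data/pii": "PII"}))
```

Because the steps are ordinary generators, the same logic could in principle be mapped onto the primitives of whichever streaming engine sits underneath.
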
>> >> Alerting Framework: The Eagle Alert Framework includes a stream
>> >> metadata API, a scalable policy engine framework and an extensible
>> >> policy engine framework. The stream metadata API allows developers to
>> >> declare an event schema, including which attributes constitute an
>> >> event, the type of each attribute, and how to dynamically resolve
>> >> attribute values at runtime when a user configures a policy. The
>> >> scalable policy engine framework allows policies to be executed on
>> >> different physical nodes in parallel. It also lets you define your own
>> >> policy partitioner class. The policy engine framework, together with
>> >> the stream partitioning capability provided by all streaming
>> >> platforms, makes sure policies and events can be evaluated in a fully
>> >> distributed way. The extensible policy engine framework allows
>> >> developers to plug in a new policy engine with a few lines of code.
>> >> The WSO2 Siddhi CEP engine is the policy engine that Eagle supports as
>> >> a first-class citizen.
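
As a rough illustration of what a policy partitioner does, the sketch below spreads policies over a fixed number of evaluation nodes with a stable hash. The function names and the CRC32-modulo scheme are assumptions for this example, not Eagle's partitioner interface.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(policy_id: str, num_partitions: int) -> int:
    """Map a policy ID to a partition using a hash that is stable across runs.

    zlib.crc32 is used instead of the built-in hash(), which is randomized
    per process for strings.
    """
    return zlib.crc32(policy_id.encode("utf-8")) % num_partitions


def assign_policies(policy_ids: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group policy IDs by the partition (evaluation node) that owns them."""
    assignment: Dict[int, List[str]] = defaultdict(list)
    for policy_id in policy_ids:
        assignment[partition_for(policy_id, num_partitions)].append(policy_id)
    return dict(assignment)


# Hypothetical policy names, for illustration only.
POLICIES = ["hdfs-pii-read", "hive-finance-select", "hdfs-delete-root", "hive-drop-table"]
ASSIGNMENT = assign_policies(POLICIES, num_partitions=3)
```

Each node then evaluates only the policies in its own partition, so adding nodes spreads the policy load without any node needing the full policy set.
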
>> >> Machine Learning module: Eagle provides capabilities to define user
>> >> activity patterns or user profiles for Hadoop users based on user
>> >> behaviour in the platform. These user profiles are modeled using
>> >> machine learning algorithms and used to detect anomalous user
>> >> activities. Eagle uses eigenvalue decomposition and density estimation
>> >> algorithms to generate user profile models. The module reads data from
>> >> HDFS audit logs, preprocesses and aggregates the data, and generates
>> >> models using Spark programming APIs. Once models are generated, Eagle
>> >> uses the stream processing engine for near real-time anomaly detection
>> >> to determine whether any user's activities are suspicious.
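
For readers unfamiliar with density-based profiling, here is a deliberately simple sketch: it fits an independent Gaussian per feature to a user's historical activity counts and flags a new observation whose log-density falls below a threshold. This is a stand-in technique chosen for brevity, not Eagle's actual model; the feature layout (reads, writes, deletes per window) and the threshold margin are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Profile = List[Tuple[float, float]]  # (mean, std) per feature


def fit_profile(history: Sequence[Sequence[float]]) -> Profile:
    """Estimate a per-feature Gaussian from historical feature vectors."""
    columns = list(zip(*history))
    # Floor the std so constant features do not cause division by zero.
    return [(mean(col), max(pstdev(col), 1e-6)) for col in columns]


def log_density(profile: Profile, x: Sequence[float]) -> float:
    """Log of the product of the independent Gaussian densities at x."""
    total = 0.0
    for (mu, sigma), value in zip(profile, x):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (value - mu) ** 2 / (2 * sigma ** 2)
    return total


def is_anomalous(profile: Profile, x: Sequence[float], threshold: float) -> bool:
    """Flag observations that are unlikely under the user's profile."""
    return log_density(profile, x) < threshold


# Hypothetical per-window counts of (reads, writes, deletes) for one user.
HISTORY = [[10, 2, 0], [12, 3, 0], [11, 2, 1], [9, 2, 0], [10, 3, 1]]
PROFILE = fit_profile(HISTORY)
# Assumed margin: slightly below the least likely historical window.
THRESHOLD = min(log_density(PROFILE, row) for row in HISTORY) - 5.0
```

In this setup the profile would be fitted offline from aggregated audit logs, while `is_anomalous` is the cheap check that runs per event window in the stream.
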
>> >>
>> >> Eagle Services:
>> >> Query Service: Eagle provides a SQL-like service API to support
>> >> comprehensive computation over huge data sets on the fly, e.g.
>> >> comprehensive filtering, aggregation, histograms, sorting, top,
>> >> arithmetic expressions, pagination etc. HBase is the data store that
>> >> Eagle supports as a first-class citizen; relational databases are
>> >> supported as well. For HBase storage, the Eagle query framework
>> >> compiles the user-provided SQL-like query into native HBase filter
>> >> objects and executes it through an HBase coprocessor on the fly.
>> >> Policy Manager: The Eagle policy manager provides a UI and a RESTful
>> >> API for users to define policies with just a few clicks. It includes a
>> >> site management UI, a policy editor, sensitivity metadata import, HDFS
>> >> or Hive sensitive resource browsing, alert dashboards etc.
>> >>
>> >> Background
>> >> Data is one of the most important assets for today's businesses, which
>> >> makes data security one of the top priorities of today's enterprises.
>> >> Hadoop is widely used across different verticals as a big data
>> >> repository to store this data in most modern enterprises.
>> >> At eBay we use the Hadoop platform extensively for our data processing
>> >> needs. Our data in Hadoop keeps growing as our user base sees
>> >> exponential growth. Today there is a variety of data sets available in
>> >> our Hadoop clusters for our users to consume. eBay has around 120 PB
>> >> of data stored in HDFS across 6 different clusters and 1800+ active
>> >> Hadoop users consuming data through Hive, HBase and MapReduce jobs
>> >> every day to build applications using this data. With this
>> >> astronomical growth of data come challenges in securing sensitive data
>> >> and monitoring access to it. Today in large organizations HDFS is the
>> >> de facto standard for storing big data. Data sets including, but not
>> >> limited to, consumer sentiment, social media data, customer
>> >> segmentation, web clicks, sensor data, geo-location and transaction
>> >> data get stored in Hadoop for day-to-day business needs.
>> >> We at eBay want to make sure that sensitive data and data platforms
>> >> are completely protected from security breaches. So we partnered very
>> >> closely with our Information Security team to understand the
>> >> requirements for Eagle to monitor sensitive data access on Hadoop:
>> >> 1. Ability to identify and stop security threats in real time
>> >> 2. Scale for big data (support PB scale and billions of events)
>> >> 3. Ability to create data access policies
>> >> 4. Support for multiple data sources like HDFS, HBase, Hive
>> >> 5. Visualize alerts in real time
>> >> 6. Ability to block malicious access in real time
>> >> We did not find any data access monitoring solution available today
>> >> that provides the features and functionality we need to monitor data
>> >> access in the Hadoop ecosystem at our scale. Hence, with an excellent
>> >> team of world class developers and several users, we have been able to
>> >> bring Eagle into production as well as open source it.
>> >>
>> >> Rationale
>> >> In today's world, data is an important asset for any company.
>> >> Businesses are using data extensively to create amazing experiences
>> >> for users. Data has to be protected, and access to data should be
>> >> secured against security breaches. Today Hadoop is used not only to
>> >> store logs but also financial data, sensitive data sets, geographical
>> >> data, user click stream data sets etc., which makes it all the more
>> >> important to protect it from security breaches. Securing a data
>> >> platform requires several things. One is a strong access control
>> >> mechanism, which today is provided by Apache Ranger and Apache Sentry.
>> >> These tools provide fine-grained access control for data sets on
>> >> Hadoop. But there is a big gap in monitoring all the data access
>> >> events and activities in order to secure the Hadoop data platform.
>> >> With strong access control, perimeter security and data access
>> >> monitoring all in place, data in Hadoop clusters can be secured
>> >> against breaches. We looked around and found the following:
>> >> Existing data activity monitoring products are designed for
>> >> traditional databases and data warehouses. Existing monitoring
>> >> platforms cannot scale out to support fast-growing data at petabyte
>> >> scale. The few products in the industry that support HDFS, Hive and
>> >> HBase data access monitoring are still at a very early stage.
>> >> As mentioned in the background, the business requirement and the
>> >> urgency of securing data against users with malicious intent drove
>> >> eBay to invest in building a real-time data access monitoring solution
>> >> from scratch, offering real-time alerts and remediation features for
>> >> malicious data access.
>> >> With the power of open source distributed systems like Hadoop, Kafka
>> >> and many more, we were able to develop a data activity monitoring
>> >> system that can scale and can identify and stop malicious access in
>> >> real time.
>> >> Eagle allows admins to create standard access policies and rules for
>> >> monitoring HDFS, Hive and HBase data. Eagle also provides
>> >> out-of-the-box machine learning models that model user profiles based
>> >> on user access behaviour and use those models to alert on anomalies.
>> >>
>> >> Current Status
>> >>
>> >> Meritocracy
>> >> Eagle has been deployed in production at eBay, monitoring billions of
>> >> events per day from HDFS and Hive operations. From the start, the
>> >> product has been built with a focus on high scalability and
>> >> application extensibility, and Eagle has demonstrated great
>> >> performance in responding to suspicious events instantly and great
>> >> flexibility in defining policies.
>> >>
>> >> Community
>> >> Eagle seeks to develop the developer and user communities during
>> >> incubation.
>> >>
>> >> Core Developers
>> >> Eagle is currently being designed and developed by engineers from eBay
>> >> Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
>> >> Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri and Arun Manoharan. All
>> >> of these core developers have deep expertise in developing monitoring
>> >> products for the Hadoop ecosystem.
>> >>
>> >> Alignment
>> >> The ASF is a natural host for Eagle given that it is already the home of
>> >> Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
>> >> projects. Eagle leverages a lot of Apache open source products. Eagle
>> >> was designed to offer real-time insight into sensitive data access by
>> >> actively monitoring data access on various data sets in Hadoop, with
>> >> an extensible alerting framework and a powerful policy engine. Eagle
>> >> complements the existing Hadoop platform by providing a comprehensive
>> >> monitoring and alerting solution for detecting sensitive data access
>> >> threats based on preset policies and machine learning models for user
>> >> behaviour analysis.
>> >>
>> >> Known Risks
>> >>
>> >> Orphaned Products
>> >> The core developers of the Eagle team work full time on this project.
>> >> There is no risk of Eagle getting orphaned, since eBay is using it
>> >> extensively in its production Hadoop clusters and has plans to go
>> >> beyond Hadoop. For example, there are currently 7 Hadoop clusters, and
>> >> 2 of them are being monitored using Hadoop Eagle in production. We
>> >> have plans to extend it to all Hadoop clusters and eventually to other
>> >> data platforms. There are tens of policies onboarded and actively
>> >> monitored, with plans to onboard more use cases. We are very confident
>> >> that every Hadoop cluster in the world will be monitored using Eagle,
>> >> securing the Hadoop ecosystem by actively monitoring access to
>> >> sensitive data. We plan to extend and diversify this community further
>> >> through Apache. We presented Eagle at the Hadoop Summit in China and
>> >> garnered interest from different companies that use Hadoop
>> >> extensively.
>> >>
>> >> Inexperience with Open Source
>> >> The core developers are all active users and followers of open source.
>> >> They are already committers and contributors to the Eagle GitHub
>> >> project. All have been involved with source code that has been
>> >> released under an open source license, and several of them also have
>> >> experience developing code in an open source environment. Though the
>> >> core set of developers do not have Apache open source experience,
>> >> there are plans to onboard individuals with Apache open source
>> >> experience onto the project. Apache Kylin PMC members are also in the
>> >> same eBay organization. We work very closely with Apache Ranger
>> >> committers and are looking forward to finding meaningful integrations
>> >> to improve the security of the Hadoop platform.
>> >>
>> >> Homogeneous Developers
>> >> The core developers are from eBay. Monitoring data activities to find
>> >> and stop threats is a universal problem faced by all businesses today.
>> >> The Apache incubation process encourages an open and diverse
>> >> meritocratic community. Eagle intends to make every possible effort to
>> >> build a diverse, vibrant and involved community and has already
>> >> received substantial interest from various organizations.
>> >>
>> >> Reliance on Salaried Developers
>> >> eBay invested in Eagle as the monitoring solution for its Hadoop
>> >> clusters, and some of its key engineers are working full time on the
>> >> project. In addition, since there is a growing need for securing
>> >> sensitive data access and for a data activity monitoring solution for
>> >> Hadoop, we look forward to other Apache developers and researchers
>> >> contributing to the project. Additional contributors, including Apache
>> >> committers, have plans to join this effort shortly. Key to addressing
>> >> the risk of relying on salaried developers from a single entity is to
>> >> increase the diversity of contributors and to actively lobby for
>> >> domain experts in the security space to contribute. Eagle intends to
>> >> do this.
>> >>
>> >> Relationships with Other Apache Products
>> >> Eagle has a strong relationship with, and dependency on, Apache Hadoop,
>> >> HBase, Spark, Kafka and Storm. Being part of Apache's incubation
>> >> community could help with closer collaboration among these projects as
>> >> well as others.
>> >>
>> >> An Excessive Fascination with the Apache Brand
>> >> Eagle is proposing to enter incubation at Apache in order to help
>> >> efforts to diversify the committer base, not so much to capitalize on
>> >> the Apache brand. The Eagle project is already in production use
>> >> inside eBay, but is not expected to be an eBay product for external
>> >> customers. As such, the Eagle project is not seeking to use the Apache
>> >> brand as a marketing tool.
>> >>
>> >> Documentation
>> >> Information about Eagle can be found at https://github.com/eBay/Eagle.
>> >> The following link provides more information about Eagle:
>> >> http://goeagle.io.
>> >>
>> >> Initial Source
>> >> Eagle has been under development since 2014 by a team of engineers at
>> >> eBay Inc. It is currently hosted on GitHub under the Apache License 2.0
>> >> at https://github.com/eBay/Eagle. Once in incubation we will move the
>> >> code base to the Apache Git repository.
>> >>
>> >> External Dependencies
>> >> Eagle has the following external dependencies.
>> >> Basic
>> >> - JDK 1.7+
>> >> - Scala 2.10.4
>> >> - Apache Maven
>> >> - JUnit
>> >> - Log4j
>> >> - Slf4j
>> >> - Apache Commons
>> >> - Apache Commons Math3
>> >> - Jackson
>> >> - Siddhi CEP engine
>> >>
>> >> Hadoop
>> >> - Apache Hadoop
>> >> - Apache HBase
>> >> - Apache Hive
>> >> - Apache Zookeeper
>> >> - Apache Curator
>> >>
>> >> Apache Spark
>> >> - Spark Core Library
>> >>
>> >> REST Service
>> >> - Jersey
>> >>
>> >> Query
>> >> - Antlr
>> >>
>> >> Stream processing
>> >> - Apache Storm
>> >> - Apache Kafka
>> >>
>> >> Web
>> >> - AngularJS
>> >> - jQuery
>> >> - Bootstrap V3
>> >> - Moment JS
>> >> - Admin LTE
>> >> - html5shiv
>> >> - respond
>> >> - Fastclick
>> >> - Date Range Picker
>> >> - Flot JS
>> >>
>> >> Cryptography
>> >> Eagle will eventually support encryption on the wire. This is not one of
>> >> the initial goals, and we do not expect Eagle to be a controlled export
>> >> item due to the use of encryption. Eagle supports, but does not
>> >> require, the Kerberos authentication mechanism to access secured
>> >> Hadoop services.
>> >>
>> >> Required Resources
>> >>
>> >> Mailing List
>> >> - eagle-private for private PMC discussions
>> >> - eagle-dev for developers
>> >> - eagle-commits for all commits
>> >> - eagle-users for all Eagle users
>> >>
>> >> Subversion Directory
>> >> - Git is the preferred source control system.
>> >>
>> >> Issue Tracking
>> >> - JIRA Eagle (Eagle)
>> >>
>> >> Other Resources
>> >> The existing code already has unit tests, so we will make use of the
>> >> existing Apache continuous testing infrastructure. The resulting load
>> >> should not be very large.
>> >>
>> >> Initial Committers
>> >> - Seshu Adunuthula <sadunuthula at ebay dot com>
>> >> - Arun Manoharan <armanoharan at ebay dot com>
>> >> - Edward Zhang <yonzhang at ebay dot com>
>> >> - Hao Chen <hchen9 at ebay dot com>
>> >> - Chaitali Gupta <cgupta at ebay dot com>
>> >> - Libin Sun <libsun at ebay dot com>
>> >> - Jilin Jiang <jiljiang at ebay dot com>
>> >> - Qingwen Zhao <qingwzhao at ebay dot com>
>> >> - Hemanth Dendukuri <hdendukuri at ebay dot com>
>> >> - Senthil Kumar <senthilkumar at ebay dot com>
>> >> - Tan Chen <tanchen at ebay dot com>
>> >>
>> >> Affiliations
>> >> The initial committers are employees of eBay Inc.
>> >>
>> >> Sponsors
>> >>
>> >> Champion
>> >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> >>
>> >> Nominated Mentors
>> >> - Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
>> >>   Hortonworks
>> >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> >> - Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
>> >>   Hortonworks
>> >>
>> >> Sponsoring Entity
>> >> We are requesting the Incubator to sponsor this project.
>> >>
>> >>
>> >>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>
>
>
>-- 
>Best Regards,
>-- Alex


