incubator-griffin git commit: update docs

guoyp Tue, 16 May 2017 17:47:04 -0700

Repository: incubator-griffin
Updated Branches:
  refs/heads/master 2f1fa71df -> 1c44c3230



update docs

Author: William Guo <[email protected]>

Closes #25 from guoyuepeng/20170517.


Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/1c44c323
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/1c44c323
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/1c44c323

Branch: refs/heads/master
Commit: 1c44c32300030e0772ff95fd8d22c53be181d9d1
Parents: 2f1fa71
Author: William Guo <[email protected]>
Authored: Wed May 17 08:46:06 2017 +0800
Committer: William Guo <[email protected]>
Committed: Wed May 17 08:46:06 2017 +0800

----------------------------------------------------------------------
 CONTRIBUTING.md               |   4 +--
 README.md                     |  57 +++++++++++++++++++++++++++----------
 griffin-doc/img/arch.png      | Bin 0 -> 307285 bytes
 griffin-doc/img/techstack.png | Bin 0 -> 127993 bytes
 griffin-doc/intro.md          |  15 +++++++---
 5 files changed, 55 insertions(+), 21 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/1c44c323/CONTRIBUTING.md
----------------------------------------------------------------------
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b0a061f..9701c50 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -8,10 +8,10 @@ The following guidelines apply to all contributors.
 
 ### Making Changes
 
-* Fork the `eBay/griffin` repository
+* Fork the `apache/incubator-griffin` repository
 * Make your changes and push them to a topic branch in your fork
   * See our commit message guidelines further down in this document
-* Submit a pull request to the `eBay/griffin` repository
+* Submit a pull request to the `apache/incubator-griffin` repository
 
 ### General Guidelines
 

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/1c44c323/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 9ec7548..0dd65bf 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 ## Apache Griffin
 
-Apache Griffin is a model driven Data Quality solution for distributed data systems at any scale in both streaming and batch data context. It provides a framework process for defining data quality model, executing data quality measurement, automating data profiling and validation, as well as an unified data quality visualization across multiple data systems. You can access our home page [here](https://ebay.github.io/griffin/).
+Apache Griffin is a model driven Data Quality solution for distributed data systems at any scale in both streaming and batch data context. It provides a framework process for defining data quality model, executing data quality measurement, automating data profiling and validation, as well as an unified data quality visualization across multiple data systems. You can access our home page [here](http://griffin.incubator.apache.org/).
 
 
 ### Contact us
@@ -8,12 +8,12 @@ Apache Griffin is a model driven Data Quality solution for distributed data syst
 
 
 ### CI
-https://travis-ci.org/eBay/griffin
+
 
 ### Repository
-Snapshot: https://oss.sonatype.org/content/repositories/snapshots
+Snapshot: 
 
-Release: https://oss.sonatype.org/service/local/staging/deploy/maven2
+Release: 
 
 ### How to build
 1. git clone the repository of https://github.com/apache/incubator-griffin
@@ -162,26 +162,53 @@ Release: https://oss.sonatype.org/service/local/staging/deploy/maven2
 13. You can also review the RESTful APIs through http://localhost:8080/api/v1/application.wadl
 
 ### How to develop
-In dev environment, you can run backend REST service and frontend UI seperately. The majority of the backend code logics are in the [griffin-core](https://github.com/apache/incubator-griffin/tree/master/griffin-core) project. So, to start backend, please import maven project Griffin into eclipse, right click ***griffin-core->Run As->Run On Server***
+In dev environment, you can run backend REST service and frontend UI seperately. The majority of the backend code logics are in the [service](https://github.com/apache/incubator-griffin/tree/master/service) project. So, to start backend, please import maven project Griffin into eclipse, ***GriffinWebApplication as Spring Boot App***
 
 To start frontend, please follow up the below steps.
 
-1. Open **griffin-ui/js/services/services.js** file
+1. Open **ui/js/services/services.js** file
 
 2. Specify **BACKEND_SERVER** to your real backend server address, below is an example
 
     ```
     var BACKEND_SERVER = 'http://localhost:8080'; //dev env
-    //var BACKEND_SERVER = 'http://localhost:8080/ROOT'; //dev env
     ```
-
-3. Open a command line, run the below commands in root directory of **griffin-ui**
-
-   - npm install
-   - bower install
-   - npm start
-
-4. Then the UI will be opened in browser automatically, please follow the [User Guide](https://github.com/eBay/griffin/tree/master/griffin-doc/userguide.md), enjoy your journey!
+3. Specify some variables like mysql, hive and kafka connectors in your properies file under service/src/main/resources/application.properties
+
+    ```
+    spring.datasource.url= jdbc:mysql://localhost:3306/metastore
+    spring.datasource.username =griffin
+    spring.datasource.password =123456
+    
+    spring.datasource.driver-class-name=com.mysql.jdbc.Driver
+    
+    ## Hibernate ddl auto (validate,create, create-drop, update)
+    
+    spring.jpa.hibernate.ddl-auto = create-drop
+    spring.jpa.show-sql=true
+    spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
+    #
+    #
+    ## Naming strategy
+    spring.jpa.hibernate.naming-strategy = org.hibernate.cfg.ImprovedNamingStrategy
+    
+    # hive metastore 
+    hive.metastore.uris = thrift://localhost:9083
+    hive.metastore.dbname = default
+    
+    # kafka schema registry
+    kafka.schema.registry.url = http://localhost:8081
+    ```
+
+4. Open a command line, run the below commands in root directory
+
+   - mvn clean install
+
+5. Find the GriffinWebApplication,
+
+   - run as spring boot application
+   
+6. In your browser, open http://localhost:8080 ,enjoy your journey!
 
 **Note**: The front-end UI is still under development, you can only access some basic features currently.
 

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/1c44c323/griffin-doc/img/arch.png
----------------------------------------------------------------------
diff --git a/griffin-doc/img/arch.png b/griffin-doc/img/arch.png
new file mode 100644
index 0000000..93bc755
Binary files /dev/null and b/griffin-doc/img/arch.png differ

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/1c44c323/griffin-doc/img/techstack.png
----------------------------------------------------------------------
diff --git a/griffin-doc/img/techstack.png b/griffin-doc/img/techstack.png
new file mode 100644
index 0000000..ebc5540
Binary files /dev/null and b/griffin-doc/img/techstack.png differ

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/1c44c323/griffin-doc/intro.md
----------------------------------------------------------------------
diff --git a/griffin-doc/intro.md b/griffin-doc/intro.md
index d4fd845..fefa27b 100644
--- a/griffin-doc/intro.md
+++ b/griffin-doc/intro.md
@@ -4,7 +4,7 @@ Apache Griffin is a Data Quality Service platform built on Apache Hadoop and Apa
 
 
 ## Overview of Apache Griffin  
-At eBay, when people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. This would enable us to build trusted data assets.
+When people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. This would enable us to build trusted data assets.
 
 Currently it is very difficult and costly to do data quality validation when we have large volumes of related data flowing across multi-platforms (streaming and batch). Take eBay's Real-time Personalization Platform as a sample; Everyday we have to validate the data quality for ~600M records. Data quality often becomes one big challenge in this complex environment and massive scale.
 
@@ -43,15 +43,22 @@ For near real time analysis, we consume data from messaging system, then our dat
 
 **Apache Griffin Service**:
 
-We have RESTful web services to accomplish all the functionalities of Apache Griffin, such as register data-set, create data quality model, publish metrics, retrieve metrics, add subscription, etc. So, the developers can develop their own user interface based on these web serivces.
+We have RESTful web services to accomplish all the functionalities of Apache Griffin, such as exploring data-sets, create data quality measures, publish metrics, retrieve metrics, add subscription, etc. So, the developers can develop their own user interface based on these web serivces.
 
 ## Main business process
-Here's the business process diagram
 
 ![Business_Process_image](img/Business_Process.png)
 
+## Main architecture diagram
+
+![Business_Process_image](img/arch.png)
+
+## Main tech stack diagram
+
+![Business_Process_image](img/techstack.png)
+
 ## Rationale
-The challenge we face at eBay is that our data volume is becoming bigger and bigger, systems process become more complex, while we do not have a unified data quality solution to ensure the trusted data sets which provide confidences on data quality to our data consumers.  The key challenges on data quality includes:
+The challenge we face at big data ecosystem is that our data volume is becoming bigger and bigger, systems process become more complex, while we do not have a unified data quality solution to ensure the trusted data sets which provide confidences on data quality to our data consumers.  The key challenges on data quality includes:
 
 1. Existing commercial data quality solution cannot address data quality lineage among systems, cannot scale out to support fast growing data at eBay
 2. Existing eBay's domain specific tools take a long time to identify and fix poor data quality when data flowed through multiple systems
