Repository: incubator-griffin-site Updated Branches: refs/heads/asf-site b440ad97f -> 75036365e
Site updated: 2017-05-17 09:11:02 Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/75036365 Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/75036365 Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/75036365 Branch: refs/heads/asf-site Commit: 75036365eaf1039792473408574d18cefa477c65 Parents: b440ad9 Author: guoyp <[email protected]> Authored: Wed May 17 09:11:02 2017 +0800 Committer: guoyp <[email protected]> Committed: Wed May 17 09:11:02 2017 +0800 ---------------------------------------------------------------------- 2017/03/03/plan/index.html | 2 +- 2017/03/04/community/index.html | 2 +- 2017/03/30/home/index.html | 13 ++++++++----- images/arch.png | Bin 0 -> 307285 bytes images/techstack.png | Bin 0 -> 127993 bytes index.html | 13 +++++++------ 6 files changed, 17 insertions(+), 13 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/2017/03/03/plan/index.html ---------------------------------------------------------------------- diff --git a/2017/03/03/plan/index.html b/2017/03/03/plan/index.html index b8d7cc1..82b1d8c 100644 --- a/2017/03/03/plan/index.html +++ b/2017/03/03/plan/index.html @@ -237,7 +237,7 @@ profiling target data asset, providing statistics by differen"> </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/03/plan/" data-id="cj1x9wwuy0002y0pot1in4xz2" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/03/plan/" data-id="cj2sajr130002i2po3fai9oe8" class="article-share-link">Partager</a> </footer> http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/2017/03/04/community/index.html ---------------------------------------------------------------------- diff --git a/2017/03/04/community/index.html 
b/2017/03/04/community/index.html index c01f5c8..cbb5a05 100644 --- a/2017/03/04/community/index.html +++ b/2017/03/04/community/index.html @@ -123,7 +123,7 @@ Wikihttps://cwiki.apache.org/confluence/display/GRIFFIN/G"> </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/04/community/" data-id="cj1x9wwus0000y0pois748bng" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/04/community/" data-id="cj2sajr0y0000i2powdn8gg1f" class="article-share-link">Partager</a> </footer> http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/2017/03/30/home/index.html ---------------------------------------------------------------------- diff --git a/2017/03/30/home/index.html b/2017/03/30/home/index.html index 998dd32..4326eb2 100644 --- a/2017/03/30/home/index.html +++ b/2017/03/30/home/index.html @@ -12,7 +12,9 @@ <meta property="og:site_name" content="Apache Griffin"> <meta property="og:description" content="AbstractApache Griffin is a Data Quality Service platform built on Apache Hadoop and Apache Spark. It provides a framework process for defining data quality model, executing data quality measurement,"> <meta property="og:image" content="http://yoursite.com/images/Business_Process.png"> -<meta property="og:updated_time" content="2017-04-21T03:08:14.000Z"> +<meta property="og:image" content="http://yoursite.com/images/arch.png"> +<meta property="og:image" content="http://yoursite.com/images/techstack.png"> +<meta property="og:updated_time" content="2017-05-17T01:03:37.000Z"> <meta name="twitter:card" content="summary"> <meta name="twitter:title" content="Apache Griffin"> <meta name="twitter:description" content="AbstractApache Griffin is a Data Quality Service platform built on Apache Hadoop and Apache Spark. 
It provides a framework process for defining data quality model, executing data quality measurement,"> @@ -92,7 +94,7 @@ <div class="article-entry" itemprop="articleBody"> <h2 id="Abstract"><a href="#Abstract" class="headerlink" title="Abstract"></a>Abstract</h2><p>Apache Griffin is a Data Quality Service platform built on Apache Hadoop and Apache Spark. It provides a framework process for defining data quality model, executing data quality measurement, automating data profiling and validation, as well as a unified data quality visualization across multiple data systems. It tries to address the data quality challenges in big data and streaming context.</p> -<h2 id="Overview-of-Apache-Griffin"><a href="#Overview-of-Apache-Griffin" class="headerlink" title="Overview of Apache Griffin"></a>Overview of Apache Griffin</h2><p>At eBay, when people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. This would enable us to build trusted data assets.</p> +<h2 id="Overview-of-Apache-Griffin"><a href="#Overview-of-Apache-Griffin" class="headerlink" title="Overview of Apache Griffin"></a>Overview of Apache Griffin</h2><p>When people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. 
This would enable us to build trusted data assets.</p> <p>Currently it is very difficult and costly to do data quality validation when we have large volumes of related data flowing across multi-platforms (streaming and batch). Take eBay's Real-time Personalization Platform as a sample; Everyday we have to validate the data quality for ~600M records. Data quality often becomes one big challenge in this complex environment and massive scale.</p> <p>We detect the following at eBay:</p> <ol> @@ -120,8 +122,9 @@ <p>For near real time analysis, we consume data from messaging system, then our data quality model will compute our real time data quality metrics in our spark cluster. for data storage, we use time series database in our back end to fulfill front end request.</p> <p><strong>Apache Griffin Service</strong>:</p> <p>We have RESTful web services to accomplish all the functionalities of Apache Griffin, such as register data-set, create data quality model, publish metrics, retrieve metrics, add subscription, etc. 
So, the developers can develop their own user interface based on these web serivces.</p> -<h2 id="Main-business-process"><a href="#Main-business-process" class="headerlink" title="Main business process"></a>Main business process</h2><p>Here's the business process diagram</p> -<p><img src="/images/Business_Process.png" alt=""></p> +<h2 id="Main-business-process"><a href="#Main-business-process" class="headerlink" title="Main business process"></a>Main business process</h2><p><img src="/images/Business_Process.png" alt=""></p> +<h2 id="Architecture-diagram"><a href="#Architecture-diagram" class="headerlink" title="Architecture diagram"></a>Architecture diagram</h2><p><img src="/images/arch.png" alt=""></p> +<h2 id="Tech-stack"><a href="#Tech-stack" class="headerlink" title="Tech stack"></a>Tech stack</h2><p><img src="/images/techstack.png" alt=""></p> <h2 id="Rationale"><a href="#Rationale" class="headerlink" title="Rationale"></a>Rationale</h2><p>The challenge we face at eBay is that our data volume is becoming bigger and bigger, systems process become more complex, while we do not have a unified data quality solution to ensure the trusted data sets which provide confidences on data quality to our data consumers. 
The key challenges on data quality includes:</p> <ol> <li>Existing commercial data quality solution cannot address data quality lineage among systems, cannot scale out to support fast growing data at eBay</li> @@ -143,7 +146,7 @@ </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/30/home/" data-id="cj1x9wwuw0001y0pop82l43c9" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/30/home/" data-id="cj2sajr110001i2poo3d8pmqd" class="article-share-link">Partager</a> </footer> http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/images/arch.png ---------------------------------------------------------------------- diff --git a/images/arch.png b/images/arch.png new file mode 100644 index 0000000..93bc755 Binary files /dev/null and b/images/arch.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/images/techstack.png ---------------------------------------------------------------------- diff --git a/images/techstack.png b/images/techstack.png new file mode 100644 index 0000000..ebc5540 Binary files /dev/null and b/images/techstack.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/75036365/index.html ---------------------------------------------------------------------- diff --git a/index.html b/index.html index 99e37e2..958796d 100644 --- a/index.html +++ b/index.html @@ -88,7 +88,7 @@ <div class="article-entry" itemprop="articleBody"> <h2 id="Abstract"><a href="#Abstract" class="headerlink" title="Abstract"></a>Abstract</h2><p>Apache Griffin is a Data Quality Service platform built on Apache Hadoop and Apache Spark. It provides a framework process for defining data quality model, executing data quality measurement, automating data profiling and validation, as well as a unified data quality visualization across multiple data systems. 
It tries to address the data quality challenges in big data and streaming context.</p> -<h2 id="Overview-of-Apache-Griffin"><a href="#Overview-of-Apache-Griffin" class="headerlink" title="Overview of Apache Griffin"></a>Overview of Apache Griffin</h2><p>At eBay, when people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. This would enable us to build trusted data assets.</p> +<h2 id="Overview-of-Apache-Griffin"><a href="#Overview-of-Apache-Griffin" class="headerlink" title="Overview of Apache Griffin"></a>Overview of Apache Griffin</h2><p>When people use big data (Hadoop or other streaming systems), measurement of data quality is a big challenge. Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we think of taking a platform approach to commonly occurring patterns. As such, we are building a platform to provide shared Infrastructure and generic features to solve common data quality pain points. This would enable us to build trusted data assets.</p> <p>Currently it is very difficult and costly to do data quality validation when we have large volumes of related data flowing across multi-platforms (streaming and batch). Take eBay's Real-time Personalization Platform as a sample; Everyday we have to validate the data quality for ~600M records. 
Data quality often becomes one big challenge in this complex environment and massive scale.</p> <p>We detect the following at eBay:</p> <ol> @@ -116,8 +116,9 @@ <p>For near real time analysis, we consume data from messaging system, then our data quality model will compute our real time data quality metrics in our spark cluster. for data storage, we use time series database in our back end to fulfill front end request.</p> <p><strong>Apache Griffin Service</strong>:</p> <p>We have RESTful web services to accomplish all the functionalities of Apache Griffin, such as register data-set, create data quality model, publish metrics, retrieve metrics, add subscription, etc. So, the developers can develop their own user interface based on these web serivces.</p> -<h2 id="Main-business-process"><a href="#Main-business-process" class="headerlink" title="Main business process"></a>Main business process</h2><p>Here's the business process diagram</p> -<p><img src="/images/Business_Process.png" alt=""></p> +<h2 id="Main-business-process"><a href="#Main-business-process" class="headerlink" title="Main business process"></a>Main business process</h2><p><img src="/images/Business_Process.png" alt=""></p> +<h2 id="Architecture-diagram"><a href="#Architecture-diagram" class="headerlink" title="Architecture diagram"></a>Architecture diagram</h2><p><img src="/images/arch.png" alt=""></p> +<h2 id="Tech-stack"><a href="#Tech-stack" class="headerlink" title="Tech stack"></a>Tech stack</h2><p><img src="/images/techstack.png" alt=""></p> <h2 id="Rationale"><a href="#Rationale" class="headerlink" title="Rationale"></a>Rationale</h2><p>The challenge we face at eBay is that our data volume is becoming bigger and bigger, systems process become more complex, while we do not have a unified data quality solution to ensure the trusted data sets which provide confidences on data quality to our data consumers. 
The key challenges on data quality includes:</p> <ol> <li>Existing commercial data quality solution cannot address data quality lineage among systems, cannot scale out to support fast growing data at eBay</li> @@ -139,7 +140,7 @@ </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/30/home/" data-id="cj1x9wwuw0001y0pop82l43c9" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/30/home/" data-id="cj2sajr110001i2poo3d8pmqd" class="article-share-link">Partager</a> </footer> @@ -193,7 +194,7 @@ </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/04/community/" data-id="cj1x9wwus0000y0pois748bng" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/04/community/" data-id="cj2sajr0y0000i2powdn8gg1f" class="article-share-link">Partager</a> </footer> @@ -322,7 +323,7 @@ </div> <footer class="article-footer"> - <a data-url="http://yoursite.com/2017/03/03/plan/" data-id="cj1x9wwuy0002y0pot1in4xz2" class="article-share-link">Partager</a> + <a data-url="http://yoursite.com/2017/03/03/plan/" data-id="cj2sajr130002i2po3fai9oe8" class="article-share-link">Partager</a> </footer>
