http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/Testcase.md
----------------------------------------------------------------------
diff --git a/griffin-doc/Testcase.md b/griffin-doc/Testcase.md
new file mode 100644
index 0000000..4e9c943
--- /dev/null
+++ b/griffin-doc/Testcase.md
@@ -0,0 +1,60 @@
+# Functional Test Cases
+
+
+|TestCase ID|Test Page|Test Case Description|Test Case Steps|Test Data|Expected Result|Actual Result|Test Result|Jira Bug ID|
+|---|---|---|---|---|---|---|---|---|
+|0101|login page|invalid corp id - check that a user cannot log in to the system with an invalid corp id.|1. input invalid corp id.<br>2. input password.<br>3. click 'log in' button.||1. login failed.||Passed||
+|0102|login page|invalid password - check that a user cannot log in to the system with an invalid password.|1. input valid corp id.<br>2. input invalid password.<br>3. click 'log in' button.||1. login failed.||Passed||
+|0103|login page|valid corp id and password - check that a user can log in to the system with a valid corp id and password.|1. input the corp id and password.<br>2. click 'log in' button.||1. login successful.||Passed||
+|0104|login page|remember password|1. input the corp id and password.<br>2. select 'remember password'.<br>3. click 'log in' button.<br>4. close the browser.<br>5. open the browser again.<br>6. visit the griffin page.||1. the id and password are valid.<br>2. 'remember password' is checked.<br>3. logged in to the griffin homepage.<br>4. the browser is closed.<br>5. the browser is reopened.<br>6. the griffin homepage is opened, instead of the login page.||Passed||
+|0105|login page|not remember password|1. input the corp id and password.<br>2. unselect 'remember password'.<br>3. click 'log in' button.<br>4. close the browser.<br>5. open the browser again.<br>6. visit the griffin page.||1. the id and password are valid.<br>2. 'remember password' is unchecked.<br>3. logged in to the griffin homepage.<br>4. the browser is closed.<br>5. the browser is reopened.<br>6. the login page is opened.||Passed||
+|0201|main page|menu bar - check all links in the menu work.|1. click 'health'.<br>2. click 'models'.<br>3. click 'Data profiling'.<br>4. click your username -> 'API Docs'.||1. show 'health' page.<br>2. show 'models' page.<br>3. show 'data profiling' page.<br>4. open new page for API page.||Passed||
+|0202|main page|menu bar - search|1. input a word in the search box.<br>2. do search.||1. show search result.|unimplemented|||
+|0203|main page|menu bar - user profile|1. click username -> 'user profile'||1. show user profile page|unimplemented|||
+|0204|main page|menu bar - setting|1. click username -> 'setting'||1. show setting page.|unimplemented|||
+|0205|main page|right side - DataAssets|1. click '*** DataAssets' link||1. show the data assets page.||Passed||
+|0206|main page|right side - DQ Metrics|1. click '*** DQ Metrics' link.||1. show DQ Metrics page||Passed||
+|0207|main page|right side - health percentage|1. check the pie for the health percentage.||1. show the health percentage.||Passed||
+|0208|main page|right side - issue tracking|1. click 'issue tracking'||1. show 'issue tracking' page|unimplemented|||
+|0209|main page|right side - statistics for the DQ data.|1. check the DQ data with the name, last updated time, and the data quality.<br>2. show more for one item, check the dq trend chart.<br>3. click the chart.<br>4. close the zoomed-in chart.||1. show all the dq data.<br>2. show the latest dq trend chart for the item.<br>3. the dq chart is zoomed in.<br>4. the zoomed-in chart is closed.||Passed||
+|0210|main page|right side - report issue.|1. click 'Report issue'||1. open the jira page.||Passed||
+|0301|health page|heatmap|1. open 'heatmap' tab.<br>2. check the data quality metrics heatmap.<br>3. click inside the heatmap.||1. show the heatmap.<br>2. all the data are shown successfully.<br>3. show the metrics page.||Passed||
+|0302|health page|Topology|1. open 'Topology' tab.<br>2. check the data.||1. show topology.|unimplemented|||
+|0303|health page|check the UI layout when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. display the page correctly.||Passed||
+|0401|metrics page|check metrics data|1. check the dq charts for the metrics.<br>2. click one chart.||1. all the data in the dq charts are correct.<br>2. the chart is zoomed in.||Passed||
+|0402|metrics page|Download Sample|1. click 'download sample'.||1. the sample is downloaded to the local path.|unimplemented|||
+|0403|metrics page|Profiling|1. click 'profiling'||1. show 'profiling'|unimplemented|||
+|0404|metrics page|check the UI layout when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. display the page correctly.||Passed||
+|0501|models page|check the models data|1. check whether all the columns are correct.<br>2. click one model name.||1. all the data are correct.<br>2. show more information of the model.||Passed||
+|0502|models page|edit model|1. click 'edit' icon.||1. open the edit page.|unimplemented|||
+|0503|models page|delete model|1. click 'delete' icon for one model.<br>2. confirm to delete the model.||1. open delete confirmation page.<br>2. the model is removed from the models table.||Passed||
+|0504|models page|subscribe|1. click 'subscribe' icon for one model.||1. open subscribe page|unimplemented|||
+|0505|models page|table paging|1. click other pages in the models table.||1. all the data in other pages are shown correctly.||Passed||
+|0506|models page|create DQ model|1. click 'create DQ model' button||1. open 'create DQ model' page.||Passed||
+|0507|models page|check the UI layout when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. display the page correctly.||Passed||
+|0601|create dq model - accuracy|create accuracy|1. click 'models' -> 'create DQ model' -> 'Accuracy'.<br>2. choose the source. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one or more attributes, e.g. uid, site_id.<br>4. click 'next'.<br>5. choose the target. Select a schema, e.g. 'apollo -> Bullseye -> adchoice_user_pref'.<br>6. select one or more attributes, e.g. user_id, scope.<br>7. click 'next'.<br>8. select a primary key, e.g. Bullseye.adchoice_user_pref.user_id.<br>9. select 'Map To' exactly.<br>10. select a source field for each target.<br>11. click 'next'.<br>12. input the required information, e.g. model name 'atest', notification email 'a...@ebay.com'.<br>13. click 'submit'.<br>14. confirm to save.|source schema: 'apollo -> Sojorner -> sog_search_event'.<br>source attributes: uid, site_id.<br>target schema: 'apollo -> Bullseye -> adchoice_user_pref'.<br>target attributes: user_id, scope.<br>primary key: Bullseye.adchoice_user_pref.user_id.<br>model name: 'atest',<br>notification email: 'a...@ebay.com'.|1. open 'create accuracy' page.<br>2. the source schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the source attributes are selected.<br>4. go to 'choose target' step.<br>5. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>6. the target attributes are selected.<br>7. go to 'mapping source and target' step.<br>8. the PK is selected.<br>9. exactly map to the source.<br>10. the source field is selected for each target.<br>11. go to 'configuration' step.<br>12. the required info is entered correctly.<br>13. open a confirmation page.<br>14. the new model 'atest' is created. It is shown in the models table.||Passed||
+|0602|create dq model - accuracy|show error message if no source attribute is selected.|1. click 'models' -> 'create DQ model' -> 'Accuracy'.<br>2. click 'next'.||1. open 'create accuracy' page.<br>2. show error message to select at least one attribute.||Passed||
+|0603|create dq model - accuracy|show error message if no target attribute is selected.|1. click 'models' -> 'create DQ model' -> 'Accuracy'.<br>2. choose the source. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one or more attributes, e.g. uid, site_id.<br>4. click 'next'.<br>5. in the 'target' step, click 'next'.|source schema: 'apollo -> Sojorner -> sog_search_event'.<br>source attributes: uid, site_id.|1. open 'create accuracy' page.<br>2. the source schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the source attributes are selected.<br>4. go to 'choose target' step.<br>5. show error message to select at least one attribute.||Passed||
+|0604|create dq model - accuracy|show error message if 'map fields' is not set.|1. click 'models' -> 'create DQ model' -> 'Accuracy'.<br>2. choose the source. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one or more attributes, e.g. uid, site_id.<br>4. click 'next'.<br>5. choose the target. Select a schema, e.g. 'apollo -> Bullseye -> adchoice_user_pref'.<br>6. select one or more attributes, e.g. user_id, scope.<br>7. click 'next'.<br>8. no selection. click 'next'.<br>9. select a primary key. click 'next'.|source schema: 'apollo -> Sojorner -> sog_search_event'.<br>source attributes: uid, site_id.<br>target schema: 'apollo -> Bullseye -> adchoice_user_pref'.<br>target attributes: user_id, scope.<br>primary key: Bullseye.adchoice_user_pref.user_id.|1. open 'create accuracy' page.<br>2. the source schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the source attributes are selected.<br>4. go to 'choose target' step.<br>5. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>6. the target attributes are selected.<br>7. go to 'mapping source and target' step.<br>8. no PK is selected.<br>9. show error message.||Passed||
+|0605|create dq model - accuracy|show error if the configuration is invalid|1. click 'models' -> 'create DQ model' -> 'Accuracy'.<br>2. choose the source. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one or more attributes, e.g. uid, site_id.<br>4. click 'next'.<br>5. choose the target. Select a schema, e.g. 'apollo -> Bullseye -> adchoice_user_pref'.<br>6. select one or more attributes, e.g. user_id, scope.<br>7. click 'next'.<br>8. select a primary key, e.g. Bullseye.adchoice_user_pref.user_id.<br>9. select 'Map To' exactly.<br>10. select a source field for each target.<br>11. click 'next'.<br>12. input invalid value for each field, e.g. model name 'a test', notification email 'aa'.|source schema: 'apollo -> Sojorner -> sog_search_event'.<br>source attributes: uid, site_id.<br>target schema: 'apollo -> Bullseye -> adchoice_user_pref'.<br>target attributes: user_id, scope.<br>primary key: Bullseye.adchoice_user_pref.user_id.<br>model name: 'a test',<br>notification email: 'aa'.|1. open 'create accuracy' page.<br>2. the source schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the source attributes are selected.<br>4. go to 'choose target' step.<br>5. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>6. the target attributes are selected.<br>7. go to 'mapping source and target' step.<br>8. the PK is selected.<br>9. exactly map to the source.<br>10. the source field is selected for each target.<br>11. go to 'configuration' step.<br>12. show error for invalid value.||Passed||
+|0606|create dq model - accuracy|check the link to add new data asset.|1. click the link for adding new data asset.||1. go to the 'register data asset' page.||Passed||
+|0607|create dq model - accuracy|check the UI layout for all the steps when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. all the steps in the page can be shown correctly.||Passed||
+|0701|create dq model - validity|check dq model with validity type can be created.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. choose the target. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one attribute, e.g. uid.<br>4. click 'next'.<br>5. choose one validity model, e.g. unique count.<br>6. click 'next'.<br>7. input valid value for the configuration fields, e.g. model name 'avalidity', email 'a...@b.com'.<br>8. click 'submit'.<br>9. click 'save'.|schema: 'apollo -> Sojorner -> sog_search_event'.<br>attribute: uid.<br>validity model: unique count.<br>model name: 'avalidity',<br>email: 'a...@b.com'.|1. open 'create validity' page.<br>2. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the attribute is selected.<br>4. go to 'select model' page.<br>5. the validity model is selected. The description of the model is shown as well.<br>6. go to 'configuration' step.<br>7. all the values are valid.<br>8. open a confirmation page.<br>9. the new model 'avalidity' is created successfully. It is shown in the models page.||Passed||
+|0702|create dq model - validity|show error if no target is selected.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. do not choose the target.<br>3. click 'next'.||1. open 'create validity' page.<br>2. no target schema is selected.<br>3. show error.||Passed||
+|0703|create dq model - validity|show error if any field is invalid.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. choose the target. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. select one attribute, e.g. uid.<br>4. click 'next'.<br>5. choose one validity model, e.g. unique count.<br>6. click 'next'.<br>7. input invalid value for the configuration fields.|schema: 'apollo -> Sojorner -> sog_search_event'.<br>validity model: unique count.<br>attribute: uid.<br>model name: 'a validity',<br>email: 'aa'.|1. open 'create validity' page.<br>2. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>3. the attribute is selected.<br>4. go to 'select model' page.<br>5. the validity model is selected. The description of the model is shown as well.<br>6. go to 'configuration' step.<br>7. show error for the invalid value.||Passed||
+|0704|create dq model - validity|check the UI layout for all the steps when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. all the steps in the page can be shown correctly.||Passed||
+|0801|create dq model - anomaly detection|check the dq model with anomaly detection can be created.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. choose the target. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. click 'next'.<br>4. choose one statistical technique, e.g. bollinger bands detection.<br>5. click 'next'.<br>6. input valid value for the configuration fields, e.g. model name 'anomaly', email 'a...@b.com'.<br>7. click 'submit'.<br>8. click 'save'.|schema: 'apollo -> Sojorner -> sog_search_event'.<br>statistical technique: bollinger bands detection.<br>model name: 'anomaly',<br>email: 'a...@b.com'.|1. open 'create validity' page.<br>2. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>3. go to 'select model' page.<br>4. the validity model is selected. The description of the model is shown as well.<br>5. go to 'configuration' step.<br>6. all the values are valid.<br>7. open a confirmation page.<br>8. two new models, 'anomaly' with 'anomaly detection' type, and 'Count_anomaly_1' with 'validity' type, are created successfully. They are shown in the models page.||Passed||
+|0802|create dq model - anomaly detection|show error if no target is selected.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. do not choose the target.<br>3. click 'next'.||1. open 'create validity' page.<br>2. no target schema is selected.<br>3. show error.||Passed||
+|0803|create dq model - anomaly detection|show error if any field is invalid.|1. click 'models' -> 'create DQ model' -> Validity.<br>2. choose the target. Select a schema, e.g. 'apollo -> Sojorner -> sog_search_event'.<br>3. click 'next'.<br>4. choose one statistical technique, e.g. bollinger bands detection.<br>5. click 'next'.<br>6. input invalid value for the configuration fields.|schema: 'apollo.Sojorner.sog_search_event'<br>model name: 'a nomaly',<br>email: 'aa'.|1. open 'create validity' page.<br>2. the target schema is selected. The corresponding attributes are shown in the attributes table.<br>3. go to 'select model' page.<br>4. the validity model is selected. The description of the model is shown as well.<br>5. go to 'configuration' step.<br>6. show error for the invalid value.||Passed||
+|0804|create dq model - anomaly detection|check the UI layout for all the steps when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. all the steps in the page can be shown correctly.||Passed||
+|0901|create dq model - publish DQ data directly|check the dq model with publish type can be created.|1. click 'models' -> 'create DQ model' -> publish DQ data directly.<br>2. input valid value for the configuration fields.<br>3. click 'submit'.<br>4. click 'save'.|model name: 'apu',<br>organization: 'hadoop',<br>email: 'a...@b.com'.|1. open 'create validity' page.<br>2. all the values are valid.<br>3. open a confirmation page.<br>4. the new model 'apu' is created successfully. It is shown in the models page.||Passed||
+|0902|create dq model - publish DQ data directly|show error if any field is invalid.|1. click 'models' -> 'create DQ model' -> publish DQ data directly.<br>2. input invalid value for the configuration fields.|model name: 'a pu', email: 'aa'.|1. open 'create validity' page.<br>2. show error for the invalid value.||Passed||
+|0903|create dq model - publish DQ data directly|check the UI layout for all the steps when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. all the steps in the page can be shown correctly.||Passed||
+|1001|data assets|check the data assets information|1. check whether all the columns are correct.<br>2. show more for an asset.||1. all the data are correct.<br>2. show the schemas of the asset.||Passed||
+|1002|data assets|edit asset|1. click 'edit' icon for an asset, e.g. 'abc'.<br>2. edit the schema description and sample.<br>3. click 'submit'.<br>4. confirm to save.<br>5. in the asset table, show more for the asset 'abc'.||1. open the edit page.<br>2. the schema description and sample are valid.<br>3. open a confirmation page.<br>4. the asset info is saved.<br>5. the schema info is updated.||Passed||
+|1003|data assets|delete asset|1. click 'delete' icon for an asset, e.g. 'abc'.<br>2. confirm to delete the asset.||1. open delete confirmation page.<br>2. the asset is removed from the table.||Passed||
+|1004|data assets|table paging|1. click other pages in the table.||1. all the data in other pages are shown correctly.||Passed||
+|1005|data assets|check the UI layout when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. display the page correctly.||Passed||
+|1101|register data asset|check data asset can be registered.|1. click 'register data asset' in the 'data assets' page.<br>2. input valid value.<br>3. click 'submit'.<br>4. confirm to save.|asset name: 'atest',<br>type: 'hdfsfile',<br>HDFS path: '/var',<br>data folder pattern: '16-06-01',<br>platform: 'Apollo',<br>organization: 'GPS',<br>schema: name 'dmg', type 'string'|1. open 'register data asset' page.<br>2. all the values are valid.<br>3. open a confirmation page.<br>4. the new asset is registered successfully. It is shown in the assets table.||Passed||
+|1102|register data asset|show error if any field is invalid.|1. click 'register data asset' in the 'data assets' page.<br>2. input some invalid value.<br>3. click 'submit'.|asset name: 'a test',<br>type: 'hdfsfile',<br>HDFS path: '/var',<br>data folder pattern: '16-06-01',<br>platform: 'Apollo',<br>organization: null,<br>schema: name 'dmg', type 'string'|1. open 'register data asset' page.<br>2. some values are invalid.<br>3. show error for the invalid value.||Passed||
+|1103|register data asset|check the UI layout when the page is zoomed in and out.|1. zoom in the page.<br>2. zoom out the page.||1. display the page correctly.||Passed||
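The test-case table in Testcase.md marks finished cases with `Passed` in the "Test Result" column and features that are not built yet with `unimplemented` in the "Actual Result" column. A minimal sketch of tallying those statuses from the Markdown rows follows. The helper names `parse_row` and `summarize` are hypothetical and not part of Griffin, and the sketch assumes no cell contains a literal `|`.

```python
from collections import Counter

# Column names taken from the header row of the table in Testcase.md.
COLUMNS = [
    "TestCase ID", "Test Page", "Test Case Description", "Test Case Steps",
    "Test Data", "Expected Result", "Actual Result", "Test Result", "Jira Bug ID",
]


def parse_row(line):
    """Split one Markdown table row into a dict keyed by column name."""
    # A row starts and ends with '|', so drop the empty first and last pieces.
    cells = [cell.strip() for cell in line.strip().split("|")[1:-1]]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def summarize(rows):
    """Count rows by status: 'Test Result' if set, else 'Actual Result'."""
    counts = Counter()
    for row in rows:
        status = row["Test Result"] or row["Actual Result"] or "unknown"
        counts[status] += 1
    return counts


# Three rows copied from the table for illustration.
SAMPLE_ROWS = [
    "|0203|main page|menu bar - user profile|1. click username -> 'user profile'||1. show user profile page|unimplemented|||",
    "|0206|main page|right side - DQ Metrics|1. click '*** DQ Metrics' link.||1. show DQ Metrics page||Passed||",
    "|0210|main page|right side - report issue.|1. click 'Report issue'||1. open the jira page.||Passed||",
]

rows = [parse_row(line) for line in SAMPLE_ROWS]
print(summarize(rows))
```

Run against the full table, a tally like this gives a quick view of how many cases passed and how many are waiting on unimplemented features.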
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/SEQUENCES.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/SEQUENCES.bson b/griffin-doc/db/unitdb0/SEQUENCES.bson new file mode 100644 index 0000000..5a5cc64 Binary files /dev/null and b/griffin-doc/db/unitdb0/SEQUENCES.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/SEQUENCES.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/SEQUENCES.metadata.json b/griffin-doc/db/unitdb0/SEQUENCES.metadata.json new file mode 100644 index 0000000..0395aac --- /dev/null +++ b/griffin-doc/db/unitdb0/SEQUENCES.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.SEQUENCES"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/data_assets.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/data_assets.bson b/griffin-doc/db/unitdb0/data_assets.bson new file mode 100644 index 0000000..5e2c65f Binary files /dev/null and b/griffin-doc/db/unitdb0/data_assets.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/data_assets.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/data_assets.metadata.json b/griffin-doc/db/unitdb0/data_assets.metadata.json new file mode 100644 index 0000000..9e102a1 --- /dev/null +++ b/griffin-doc/db/unitdb0/data_assets.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.data_assets"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_job.bson 
---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_job.bson b/griffin-doc/db/unitdb0/dq_job.bson new file mode 100644 index 0000000..026dead Binary files /dev/null and b/griffin-doc/db/unitdb0/dq_job.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_job.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_job.metadata.json b/griffin-doc/db/unitdb0/dq_job.metadata.json new file mode 100644 index 0000000..fa3a369 --- /dev/null +++ b/griffin-doc/db/unitdb0/dq_job.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.dq_job"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_metrics_values.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_metrics_values.bson b/griffin-doc/db/unitdb0/dq_metrics_values.bson new file mode 100644 index 0000000..857df0c Binary files /dev/null and b/griffin-doc/db/unitdb0/dq_metrics_values.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_metrics_values.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_metrics_values.metadata.json b/griffin-doc/db/unitdb0/dq_metrics_values.metadata.json new file mode 100644 index 0000000..c259dec --- /dev/null +++ b/griffin-doc/db/unitdb0/dq_metrics_values.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.dq_metrics_values"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.bson ---------------------------------------------------------------------- diff 
--git a/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.bson b/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.bson new file mode 100644 index 0000000..8294c87 Binary files /dev/null and b/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.metadata.json b/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.metadata.json new file mode 100644 index 0000000..05a5987 --- /dev/null +++ b/griffin-doc/db/unitdb0/dq_missed_file_path_lkp.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.dq_missed_file_path_lkp"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_model.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_model.bson b/griffin-doc/db/unitdb0/dq_model.bson new file mode 100644 index 0000000..4311ef4 Binary files /dev/null and b/griffin-doc/db/unitdb0/dq_model.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_model.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_model.metadata.json b/griffin-doc/db/unitdb0/dq_model.metadata.json new file mode 100644 index 0000000..2da029a --- /dev/null +++ b/griffin-doc/db/unitdb0/dq_model.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.dq_model"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_schedule.bson ---------------------------------------------------------------------- diff --git 
a/griffin-doc/db/unitdb0/dq_schedule.bson b/griffin-doc/db/unitdb0/dq_schedule.bson new file mode 100644 index 0000000..7670c27 Binary files /dev/null and b/griffin-doc/db/unitdb0/dq_schedule.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/dq_schedule.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/dq_schedule.metadata.json b/griffin-doc/db/unitdb0/dq_schedule.metadata.json new file mode 100644 index 0000000..036d141 --- /dev/null +++ b/griffin-doc/db/unitdb0/dq_schedule.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.dq_schedule"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/system.indexes.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/system.indexes.bson b/griffin-doc/db/unitdb0/system.indexes.bson new file mode 100644 index 0000000..973ade6 Binary files /dev/null and b/griffin-doc/db/unitdb0/system.indexes.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/user_subscribe.bson ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/user_subscribe.bson b/griffin-doc/db/unitdb0/user_subscribe.bson new file mode 100644 index 0000000..c00ae2b Binary files /dev/null and b/griffin-doc/db/unitdb0/user_subscribe.bson differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/db/unitdb0/user_subscribe.metadata.json ---------------------------------------------------------------------- diff --git a/griffin-doc/db/unitdb0/user_subscribe.metadata.json b/griffin-doc/db/unitdb0/user_subscribe.metadata.json new file mode 100644 index 0000000..9cf671a --- /dev/null +++ 
b/griffin-doc/db/unitdb0/user_subscribe.metadata.json @@ -0,0 +1 @@ +{"options":{},"indexes":[{"v":1,"key":{"_id":1},"name":"_id_","ns":"unitdb0.user_subscribe"}]} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/dockerUIguide.md ---------------------------------------------------------------------- diff --git a/griffin-doc/dockerUIguide.md b/griffin-doc/dockerUIguide.md new file mode 100644 index 0000000..24d76e5 --- /dev/null +++ b/griffin-doc/dockerUIguide.md @@ -0,0 +1,63 @@ +## Docker webUI Guide + +### Preparatory work + +Follow the steps [here](https://github.com/eBay/DQSolution/blob/master/README.md#how-to-run-in-docker), prepare your docker container of griffin, and get your webUI ready. + +### webUI test case guide + +1. Click "Data Assets" at the top right corner, to watch all the exist data assets. + +2. Click "Register Data Asset" button at the top left corner, fill out the "Required Information" table as the following data, then submit and save to finish the creation of a new data asset. + ``` + Asset Name: users_info_src + Asset Type: hivetable + HDFS Path: /user/hive/warehouse/users_info_src + Organization: <any> + Schema: user_id bigint + first_name string + last_name string + address string + email string + phone string + post_code string + ``` + The data asset "users_info_src" has been prepared in our docker image already, and the information is shown above. + "Asset Name" item needs to be the same with the Hive table name, "HDFS Path" is exactly the path in HDFS, they should be filled as real. + "Asset Type" item has only one selection "hivetable" at current, and you need to choose it, while the "Organization" item could be set as any one you like. + "Schema" item lists the schema of the data asset, the names and types are better to be exactly the same with the hive table while it's not required now, but the number and order of schema items need to be. 
+ + Repeat the above step, create another new data asset by filling out as following. + ``` + Asset Name: users_info_target + Asset Type: hivetable + HDFS Path: /user/hive/warehouse/users_info_target + Organization: <any> + Schema: user_id bigint + first_name string + last_name string + address string + email string + phone string + post_code string + ``` + "users_info_target" is also prepared in our docker image with the information above. + If you want to test your own data assets, it's necessary to put them into Hive in the docker container first. + +3. Click "Models" at the top left corner to watch all the models here, now there has been two new models named "TotalCount_users_info_src" and "TotalCount_users_info_target" created automatically by the new data asset creation. + You can create a new accuracy model for the two new data assets registered just now. + Click "Create DQ Model" button at the top left corner, choose the top left block "Accuracy", follow the steps below. + 1) Choose Source: find "users_info_src" in the left tree, select some or all attributes in the right block, click "Next". + 2) Choose Target: find "users_info_target" in the left tree, select the matching attributes with previous ones in the right block, click "Next". + 3) Mapping Source and Target: choose the first row "user_id" as "PK" which means "Primary Key", and select "Source Fields" of each row, to match the same item in source table, e.g. user_id maps to user_id, first_name maps to first_name. + Finish all the mapping, click "Next". + 4) Fill out the required table freely, "Schedule Type" is the calculation period. + Submit and save, you can see your new DQ model created in the models list. + +4. Now you've created two data assets and three DQ models, the models are calculated automatically at background in the docker container. + Wait for about 20 minutes, results would be published to web UI. Then you can see the dashboards of your new models in "My Dashboard" page. 
+    When you view the accuracy model, a "Deploy" button appears once the result comes out. Click "Deploy" to enable its periodic calculation; your dashboard then grows with each period you set.
+
+### User data case guide
+
+You can follow the steps [here](https://github.com/eBay/griffin/blob/master/griffin-doc/userDataCaseGuide.md) to use your own data for testing.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/json/accu_config.json
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/json/accu_config.json b/griffin-doc/hive/json/accu_config.json
new file mode 100644
index 0000000..f18989b
--- /dev/null
+++ b/griffin-doc/hive/json/accu_config.json
@@ -0,0 +1,62 @@
+{
+  "source": "users_info_src",
+  "target": "users_info_target",
+  "accuracyMapping": [
+    {
+      "sourceColId": 0,
+      "sourceColName": "user_id",
+      "targetColId": 0,
+      "targetColName": "user_id",
+      "matchFunction": "true",
+      "isPK": true
+    },
+    {
+      "sourceColId": 1,
+      "sourceColName": "first_name",
+      "targetColId": 1,
+      "targetColName": "first_name",
+      "matchFunction": "false",
+      "isPK": false
+    },
+    {
+      "sourceColId": 2,
+      "sourceColName": "last_name",
+      "targetColId": 2,
+      "targetColName": "last_name",
+      "matchFunction": "false",
+      "isPK": false
+    },
+    {
+      "sourceColId": 3,
+      "sourceColName": "address",
+      "targetColId": 3,
+      "targetColName": "address",
+      "matchFunction": "false",
+      "isPK": false
+    },
+    {
+      "sourceColId": 4,
+      "sourceColName": "email",
+      "targetColId": 4,
+      "targetColName": "email",
+      "matchFunction": "false",
+      "isPK": false
+    },
+    {
+      "sourceColId": 5,
+      "sourceColName": "phone",
+      "targetColId": 5,
+      "targetColName": "phone",
+      "matchFunction": "false",
+      "isPK": false
+    },
+    {
+      "sourceColId": 6,
+      "sourceColName": "post_code",
+      "targetColId": 6,
+      "targetColName": "post_code",
+      "matchFunction": "false",
+      "isPK": false
+    }
+  ]
+}
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/json/vali_config.json
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/json/vali_config.json b/griffin-doc/hive/json/vali_config.json
new file mode 100644
index 0000000..a07c4cd
--- /dev/null
+++ b/griffin-doc/hive/json/vali_config.json
@@ -0,0 +1,131 @@
+{
+  "dataSet": "users_info_src",
+  "validityReq": [
+    {
+      "colId": 0,
+      "colName": "user_id",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 1,
+      "colName": "first_name",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 2,
+      "colName": "last_name",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 3,
+      "colName": "address",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 4,
+      "colName": "email",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 5,
+      "colName": "phone",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    },
+    {
+      "colId": 6,
+      "colName": "post_code",
+      "metrics": [
+        {
+          "name": 1
+        },
+        {
+          "name": 2
+        },
+        {
+          "name": 3
+        },
+        {
+          "name": 4
+        }
+      ]
+    }
+  ]
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/script/bark_jobs.sh
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/script/bark_jobs.sh b/griffin-doc/hive/script/bark_jobs.sh
new file mode 100644
index 0000000..9ecade7
--- /dev/null
+++ b/griffin-doc/hive/script/bark_jobs.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+
+ROOT_DIR=$(cd $(dirname $0); pwd)
+if [ -f $ROOT_DIR/env.sh ]; then
+  . $ROOT_DIR/env.sh
+fi
+
+HDFS_WORKDIR=${HDFS_WORKDIR:-/user/bark/running}
+TEMP_DIR=${TEMP_DIR:-$ROOT_DIR/temp}
+LOG_DIR=${LOG_DIR:-$ROOT_DIR/log}
+
+mkdir -p $TEMP_DIR
+mkdir -p $LOG_DIR
+
+lv1tempfile=$TEMP_DIR/temp.txt
+lv2tempfile=$TEMP_DIR/temp2.txt
+logfile=$LOG_DIR/log.txt
+
+set +e
+
+hadoop fs -ls $HDFS_WORKDIR > $lv1tempfile
+
+rm -rf $logfile
+touch $logfile
+
+while read line
+do
+  lv1dir=${line##* }
+  echo $lv1dir
+  hadoop fs -test -f $lv1dir/_START
+  if [ $? -ne 0 ] && [ "${lv1dir:0:1}" == "/" ]
+  then
+    hadoop fs -cat $lv1dir/_watchfile > $lv2tempfile
+
+    watchfiledone=1
+    while read watchline
+    do
+      echo $watchline >> $logfile
+      hadoop fs -test -f $watchline/_SUCCESS
+      if [ $? -ne 0 ]
+      then
+        watchfiledone=0
+      fi
+    done < $lv2tempfile
+
+    if [ $watchfiledone -eq 1 ]
+    then
+      hadoop fs -touchz $lv1dir/_START
+      hadoop fs -test -f $lv1dir/_type_0.done
+      rc1=$?
+      hadoop fs -test -f $lv1dir/_type_1.done
+      rc2=$?
+      if [ $rc1 -eq 0 ]
+      then
+        echo "spark-submit --class com.ebay.bark.Accu33 --master yarn-client --queue default --executor-memory 512m --num-executors 10 bark-models-0.0.1-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/ "
+        spark-submit --class com.ebay.bark.Accu33 --master yarn-client --queue default --executor-memory 512m --num-executors 10 bark-models-0.0.1-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/
+      elif [ $rc2 -eq 0 ]
+      then
+        echo "spark-submit --class com.ebay.bark.Vali3 --master yarn-client --queue default --executor-memory 512m --num-executors 10 bark-models-0.0.1-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/ "
+        spark-submit --class com.ebay.bark.Vali3 --master yarn-client --queue default --executor-memory 512m --num-executors 10 bark-models-0.0.1-SNAPSHOT.jar $lv1dir/cmd.txt $lv1dir/
+      fi
+
+      echo "watch file ready" >> $logfile
+      exit
+    fi
+  fi
+
+done < $lv1tempfile
+
+set -e
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/script/bark_regular_run.sh
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/script/bark_regular_run.sh b/griffin-doc/hive/script/bark_regular_run.sh
new file mode 100644
index 0000000..7a20b9c
--- /dev/null
+++ b/griffin-doc/hive/script/bark_regular_run.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+ROOT_DIR=$(cd $(dirname $0); pwd)
+
+set +e
+while true
+do
+  echo "start"
+  $ROOT_DIR/bark_jobs.sh 2>&1
+  rcode=$?
+  echo "end $rcode"
+  rm -rf $ROOT_DIR/nohup.out
+  sleep 60
+done
+set -e
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/script/env.sh
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/script/env.sh b/griffin-doc/hive/script/env.sh
new file mode 100644
index 0000000..d694dec
--- /dev/null
+++ b/griffin-doc/hive/script/env.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+HDFS_WORKDIR=/user/bark/running
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/users_info_src.dat
----------------------------------------------------------------------
diff --git a/griffin-doc/hive/users_info_src.dat b/griffin-doc/hive/users_info_src.dat
new file mode 100644
index 0000000..ce49443
--- /dev/null
+++ b/griffin-doc/hive/users_info_src.dat
@@ -0,0 +1,50 @@
+10001|Tom001|Jerrya|201 DisneyCity|tomajerrya...@dc.org|10000001|94022
+10002|Tom002|Jerrya|202 DisneyCity|tomajerrya...@dc.org|10000002|94022
+10003|Tom003|Jerrya|203 DisneyCity|tomajerrya...@dc.org|10000003|94022
+10004|Tom004|Jerrya|204 DisneyCity|tomajerrya...@dc.org|10000004|94022
+10005|Tom005|Jerrya|205 DisneyCity|tomajerrya...@dc.org|10000005|94022
+10006|Tom006|Jerrya|206 DisneyCity|tomajerrya...@dc.org|10000006|94022
+10007|Tom007|Jerrya|207 DisneyCity|tomajerrya...@dc.org|10000007|94022
+10008|Tom008|Jerrya|208 DisneyCity|tomajerrya...@dc.org|10000008|94022
+10009|Tom009|Jerrya|209 DisneyCity|tomajerrya...@dc.org|10000009|94022
+10010|Tom010|Jerrya|210 DisneyCity|tomajerrya...@dc.org|10000010|94022 +10011|Tom011|Jerrya|211 DisneyCity|tomajerrya...@dc.org|10000011|94022 +10012|Tom012|Jerrya|212 DisneyCity|tomajerrya...@dc.org|10000012|94022 +10013|Tom013|Jerrya|213 DisneyCity|tomajerrya...@dc.org|10000013|94022 +10014|Tom014|Jerrya|214 DisneyCity|tomajerrya...@dc.org|10000014|94022 +10015|Tom015|Jerrya|215 DisneyCity|tomajerrya...@dc.org|10000015|94022 +10016|Tom016|Jerrya|216 DisneyCity|tomajerrya...@dc.org|10000016|94022 +10017|Tom017|Jerrya|217 DisneyCity|tomajerrya...@dc.org|10000017|94022 +10018|Tom018|Jerrya|218 DisneyCity|tomajerrya...@dc.org|10000018|94022 +10019|Tom019|Jerrya|219 DisneyCity|tomajerrya...@dc.org|10000019|94022 +10020|Tom020|Jerrya|220 DisneyCity|tomajerrya...@dc.org|10000020|94022 +10021|Tom021|Jerrya|221 DisneyCity|tomajerrya...@dc.org|10000021|94022 +10022|Tom022|Jerrya|222 DisneyCity|tomajerrya...@dc.org|10000022|94022 +10023|Tom023|Jerrya|223 DisneyCity|tomajerrya...@dc.org|10000023|94022 +10024|Tom024|Jerrya|224 DisneyCity|tomajerrya...@dc.org|10000024|94022 +10025|Tom025|Jerrya|225 DisneyCity|tomajerrya...@dc.org|10000025|94022 +10026|Tom026|Jerrya|226 DisneyCity|tomajerrya...@dc.org|10000026|94022 +10027|Tom027|Jerrya|227 DisneyCity|tomajerrya...@dc.org|10000027|94022 +10028|Tom028|Jerrya|228 DisneyCity|tomajerrya...@dc.org|10000028|94022 +10029|Tom029|Jerrya|229 DisneyCity|tomajerrya...@dc.org|10000029|94022 +10030|Tom030|Jerrya|230 DisneyCity|tomajerrya...@dc.org|10000030|94022 +10031|Tom031|Jerrya|231 DisneyCity|tomajerrya...@dc.org|10000031|94022 +10032|Tom032|Jerrya|232 DisneyCity|tomajerrya...@dc.org|10000032|94022 +10033|Tom033|Jerrya|233 DisneyCity|tomajerrya...@dc.org|10000033|94022 +10034|Tom034|Jerrya|234 DisneyCity|tomajerrya...@dc.org|10000034|94022 +10035|Tom035|Jerrya|235 DisneyCity|tomajerrya...@dc.org|10000035|94022 +10036|Tom036|Jerrya|236 DisneyCity|tomajerrya...@dc.org|10000036|94022 +10037|Tom037|Jerrya|237 
DisneyCity|tomajerrya...@dc.org|10000037|94022 +10038|Tom038|Jerrya|238 DisneyCity|tomajerrya...@dc.org|10000038|94022 +10039|Tom039|Jerrya|239 DisneyCity|tomajerrya...@dc.org|10000039|94022 +10040|Tom040|Jerrya|240 DisneyCity|tomajerrya...@dc.org|10000040|94022 +10041|Tom041|Jerrya|241 DisneyCity|tomajerrya...@dc.org|10000041|94022 +10042|Tom042|Jerrya|242 DisneyCity|tomajerrya...@dc.org|10000042|94022 +10043|Tom043|Jerrya|243 DisneyCity|tomajerrya...@dc.org|10000043|94022 +10044|Tom044|Jerrya|244 DisneyCity|tomajerrya...@dc.org|10000044|94022 +10045|Tom045|Jerrya|245 DisneyCity|tomajerrya...@dc.org|10000045|94022 +10046|Tom046|Jerrya|246 DisneyCity|tomajerrya...@dc.org|10000046|94022 +10047|Tom047|Jerrya|247 DisneyCity|tomajerrya...@dc.org|10000047|94022 +10048|Tom048|Jerrya|248 DisneyCity|tomajerrya...@dc.org|10000048|94022 +10049|Tom049|Jerrya|249 DisneyCity|tomajerrya...@dc.org|10000049|94022 +10050|Tom050|Jerrya|250 DisneyCity|tomajerrya...@dc.org|10000050|94022 \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/hive/users_info_target.dat ---------------------------------------------------------------------- diff --git a/griffin-doc/hive/users_info_target.dat b/griffin-doc/hive/users_info_target.dat new file mode 100644 index 0000000..07a6b40 --- /dev/null +++ b/griffin-doc/hive/users_info_target.dat @@ -0,0 +1,50 @@ +10001|Tom001|Jerrya|201 DisneyCity|tomajerrya...@dc.org|10000101|94022 +10002|Tom002|Jerrya|202 DisneyCity|tomajerrya...@dc.org|10000102|94022 +10003|Tom003|Jerrya|203 DisneyCity|tomajerrya...@dc.org|10000003|94022 +10004|Tom004|Jerrya|204 DisneyCity|tomajerrya...@dc.org|10000004|94022 +10005|Tom005|Jerrya|205 DisneyCity|tomajerrya...@dc.org|10000005|94022 +10006|Tom006|Jerrya|206 DisneyCity|tomajerrya...@dc.org|10000006|94022 +10007|Tom007|Jerrya|207 DisneyCity|tomajerrya...@dc.org|10000007|94022 +10008|Tom008|Jerrya|208 DisneyCity|tomajerrya...@dc.org|10000008|94022 
+10009|Tom009|Jerrya|209 DisneyCity|tomajerrya...@dc.org|10000009|94022 +10010|Tom010|Jerrya|210 DisneyCity|tomajerrya...@dc.org|10000010|94022 +10011|Tom011|Jerrya|211 DisneyCity|tomajerrya...@dc.org|10000011|94022 +10012|Tom012|Jerrya|212 DisneyCity|tomajerrya...@dc.org|10000012|94022 +10013|Tom013|Jerrya|213 DisneyCity|tomajerrya...@dc.org|10000013|94022 +10014|Tom014|Jerrya|214 DisneyCity|tomajerrya...@dc.org|10000014|94022 +10015|Tom015|Jerrya|215 DisneyCity|tomajerrya...@dc.org|10000015|94022 +10016|Tom016|Jerrya|216 DisneyCity|tomajerrya...@dc.org|10000016|94022 +10017|Tom017|Jerrya|217 DisneyCity|tomajerrya...@dc.org|10000017|94022 +10018|Tom018|Jerrya|218 DisneyCity|tomajerrya...@dc.org|10000018|94022 +10019|Tom019|Jerrya|219 DisneyCity|tomajerrya...@dc.org|10000019|94022 +10020|Tom020|Jerrya|220 DisneyCity|tomajerrya...@dc.org|10000020|94022 +10021|Tom021|Jerrya|221 DisneyCity|tomajerrya...@dc.org|10000021|94022 +10022|Tom022|Jerrya|222 DisneyCity|tomajerrya...@dc.org|10000022|94022 +10023|Tom023|Jerrya|223 DisneyCity|tomajerrya...@dc.org|10000023|94022 +10024|Tom024|Jerrya|224 DisneyCity|tomajerrya...@dc.org|10000024|94022 +10025|Tom025|Jerrya|225 DisneyCity|tomajerrya...@dc.org|10000025|94022 +10026|Tom026|Jerrya|226 DisneyCity|tomajerrya...@dc.org|10000026|94022 +10027|Tom027|Jerrya|227 DisneyCity|tomajerrya...@dc.org|10000027|94022 +10028|Tom028|Jerrya|228 DisneyCity|tomajerrya...@dc.org|10000028|94022 +10029|Tom029|Jerrya|229 DisneyCity|tomajerrya...@dc.org|10000029|94022 +10030|Tom030|Jerrya|230 DisneyCity|tomajerrya...@dc.org|10000030|94022 +10031|Tom031|Jerrya|231 DisneyCity|tomajerrya...@dc.org|10000031|94022 +10032|Tom032|Jerrya|232 DisneyCity|tomajerrya...@dc.org|10000032|94022 +10033|Tom033|Jerrya|233 DisneyCity|tomajerrya...@dc.org|10000033|94022 +10034|Tom034|Jerrya|234 DisneyCity|tomajerrya...@dc.org|10000034|94022 +10035|Tom035|Jerrya|235 DisneyCity|tomajerrya...@dc.org|10000035|94022 +10036|Tom036|Jerrya|236 
DisneyCity|tomajerrya...@dc.org|10000036|94022 +10037|Tom037|Jerrya|237 DisneyCity|tomajerrya...@dc.org|10000037|94022 +10038|Tom038|Jerrya|238 DisneyCity|tomajerrya...@dc.org|10000038|94022 +10039|Tom039|Jerrya|239 DisneyCity|tomajerrya...@dc.org|10000039|94022 +10040|Tom040|Jerrya|240 DisneyCity|tomajerrya...@dc.org|10000040|94022 +10041|Tom041|Jerrya|241 DisneyCity|tomajerrya...@dc.org|10000041|94022 +10042|Tom042|Jerrya|242 DisneyCity|tomajerrya...@dc.org|10000042|94022 +10043|Tom043|Jerrya|243 DisneyCity|tomajerrya...@dc.org|10000043|94022 +10044|Tom044|Jerrya|244 DisneyCity|tomajerrya...@dc.org|10000044|94022 +10045|Tom045|Jerrya|245 DisneyCity|tomajerrya...@dc.org|10000045|94022 +10046|Tom046|Jerrya|246 DisneyCity|tomajerrya...@dc.org|10000046|94022 +10047|Tom047|Jerrya|247 DisneyCity|tomajerrya...@dc.org|10000047|94022 +10048|Tom048|Jerrya|248 DisneyCity|tomajerrya...@dc.org|10000048|94022 +10049|Tom049|Jerrya|249 DisneyCity|tomajerrya...@dc.org|10000049|94022 +10050|Tom050|Jerrya|250 DisneyCity|tomajerrya...@dc.org|10000050|94022 \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/Business_Process.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/Business_Process.png b/griffin-doc/img/Business_Process.png new file mode 100644 index 0000000..ff0f25f Binary files /dev/null and b/griffin-doc/img/Business_Process.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/bark-sm.gif ---------------------------------------------------------------------- diff --git a/griffin-doc/img/bark-sm.gif b/griffin-doc/img/bark-sm.gif new file mode 100644 index 0000000..04c9d09 Binary files /dev/null and b/griffin-doc/img/bark-sm.gif differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/data quality Flow.png ---------------------------------------------------------------------- 
diff --git a/griffin-doc/img/data quality Flow.png b/griffin-doc/img/data quality Flow.png new file mode 100644 index 0000000..79d5c8a Binary files /dev/null and b/griffin-doc/img/data quality Flow.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 13-19-2.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 13-19-2.png b/griffin-doc/img/fsd/image2016-6-30 13-19-2.png new file mode 100644 index 0000000..01c55dc Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 13-19-2.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-16-21.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-16-21.png b/griffin-doc/img/fsd/image2016-6-30 16-16-21.png new file mode 100644 index 0000000..cb142e3 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-16-21.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-17-5.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-17-5.png b/griffin-doc/img/fsd/image2016-6-30 16-17-5.png new file mode 100644 index 0000000..c76db3a Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-17-5.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-17-52.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-17-52.png b/griffin-doc/img/fsd/image2016-6-30 16-17-52.png new file mode 100644 index 0000000..2f2de84 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-17-52.png differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-18-20.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-18-20.png b/griffin-doc/img/fsd/image2016-6-30 16-18-20.png new file mode 100644 index 0000000..47cef7e Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-18-20.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-18-52.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-18-52.png b/griffin-doc/img/fsd/image2016-6-30 16-18-52.png new file mode 100644 index 0000000..62e29fc Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-18-52.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-20-34.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-20-34.png b/griffin-doc/img/fsd/image2016-6-30 16-20-34.png new file mode 100644 index 0000000..a644d74 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-20-34.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-20-53.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-20-53.png b/griffin-doc/img/fsd/image2016-6-30 16-20-53.png new file mode 100644 index 0000000..b4b4aaf Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-20-53.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-21-16.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-21-16.png 
b/griffin-doc/img/fsd/image2016-6-30 16-21-16.png new file mode 100644 index 0000000..c5d214e Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-21-16.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-21-49.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-21-49.png b/griffin-doc/img/fsd/image2016-6-30 16-21-49.png new file mode 100644 index 0000000..e644a60 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-21-49.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-22-53.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-22-53.png b/griffin-doc/img/fsd/image2016-6-30 16-22-53.png new file mode 100644 index 0000000..23246e9 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-22-53.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-23-11.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-23-11.png b/griffin-doc/img/fsd/image2016-6-30 16-23-11.png new file mode 100644 index 0000000..5b52f7b Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-23-11.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-23-32.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-23-32.png b/griffin-doc/img/fsd/image2016-6-30 16-23-32.png new file mode 100644 index 0000000..a06be23 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-23-32.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 
16-24-7.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-24-7.png b/griffin-doc/img/fsd/image2016-6-30 16-24-7.png new file mode 100644 index 0000000..b3985bf Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-24-7.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-25-12.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-25-12.png b/griffin-doc/img/fsd/image2016-6-30 16-25-12.png new file mode 100644 index 0000000..4762c02 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-25-12.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-25-42.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-25-42.png b/griffin-doc/img/fsd/image2016-6-30 16-25-42.png new file mode 100644 index 0000000..f7d2662 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-25-42.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-31-26.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-31-26.png b/griffin-doc/img/fsd/image2016-6-30 16-31-26.png new file mode 100644 index 0000000..a69d99f Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-31-26.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-33-44.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-33-44.png b/griffin-doc/img/fsd/image2016-6-30 16-33-44.png new file mode 100644 index 0000000..b77ff6a Binary files /dev/null and 
b/griffin-doc/img/fsd/image2016-6-30 16-33-44.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-34-58.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-34-58.png b/griffin-doc/img/fsd/image2016-6-30 16-34-58.png new file mode 100644 index 0000000..340c6ad Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-34-58.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-35-18.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-35-18.png b/griffin-doc/img/fsd/image2016-6-30 16-35-18.png new file mode 100644 index 0000000..b992258 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-35-18.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-35-57.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-35-57.png b/griffin-doc/img/fsd/image2016-6-30 16-35-57.png new file mode 100644 index 0000000..8dbb96e Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-35-57.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-36-48.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-36-48.png b/griffin-doc/img/fsd/image2016-6-30 16-36-48.png new file mode 100644 index 0000000..6156fcb Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-36-48.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-37-48.png ---------------------------------------------------------------------- diff --git 
a/griffin-doc/img/fsd/image2016-6-30 16-37-48.png b/griffin-doc/img/fsd/image2016-6-30 16-37-48.png new file mode 100644 index 0000000..e2645b2 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-37-48.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-39-17.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-39-17.png b/griffin-doc/img/fsd/image2016-6-30 16-39-17.png new file mode 100644 index 0000000..7c60136 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-39-17.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-39-37.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-39-37.png b/griffin-doc/img/fsd/image2016-6-30 16-39-37.png new file mode 100644 index 0000000..2382b23 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-39-37.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-40-14.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-40-14.png b/griffin-doc/img/fsd/image2016-6-30 16-40-14.png new file mode 100644 index 0000000..fec613e Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-40-14.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-41-5.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-41-5.png b/griffin-doc/img/fsd/image2016-6-30 16-41-5.png new file mode 100644 index 0000000..d3381b1 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-41-5.png differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-41-57.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-41-57.png b/griffin-doc/img/fsd/image2016-6-30 16-41-57.png new file mode 100644 index 0000000..737e6fc Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-41-57.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-42-16.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-42-16.png b/griffin-doc/img/fsd/image2016-6-30 16-42-16.png new file mode 100644 index 0000000..048b5db Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-42-16.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/fsd/image2016-6-30 16-44-15.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/fsd/image2016-6-30 16-44-15.png b/griffin-doc/img/fsd/image2016-6-30 16-44-15.png new file mode 100644 index 0000000..f70a229 Binary files /dev/null and b/griffin-doc/img/fsd/image2016-6-30 16-44-15.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/tdd/arch_design.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/tdd/arch_design.png b/griffin-doc/img/tdd/arch_design.png new file mode 100644 index 0000000..6170830 Binary files /dev/null and b/griffin-doc/img/tdd/arch_design.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/tdd/class_diagram.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/tdd/class_diagram.png b/griffin-doc/img/tdd/class_diagram.png new file mode 100644 index 0000000..6a6352e Binary files 
/dev/null and b/griffin-doc/img/tdd/class_diagram.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/tdd/model_design.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/tdd/model_design.png b/griffin-doc/img/tdd/model_design.png new file mode 100644 index 0000000..2d18486 Binary files /dev/null and b/griffin-doc/img/tdd/model_design.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/tdd/mvc.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/tdd/mvc.png b/griffin-doc/img/tdd/mvc.png new file mode 100644 index 0000000..381da82 Binary files /dev/null and b/griffin-doc/img/tdd/mvc.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/1.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/1.PNG b/griffin-doc/img/userguide/1.PNG new file mode 100644 index 0000000..776a16d Binary files /dev/null and b/griffin-doc/img/userguide/1.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/13.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/13.PNG b/griffin-doc/img/userguide/13.PNG new file mode 100644 index 0000000..24dbcc1 Binary files /dev/null and b/griffin-doc/img/userguide/13.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/23.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/23.PNG b/griffin-doc/img/userguide/23.PNG new file mode 100644 index 0000000..5488c83 Binary files /dev/null and b/griffin-doc/img/userguide/23.PNG differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/33.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/33.PNG b/griffin-doc/img/userguide/33.PNG new file mode 100644 index 0000000..1a5867b Binary files /dev/null and b/griffin-doc/img/userguide/33.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/333.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/333.PNG b/griffin-doc/img/userguide/333.PNG new file mode 100644 index 0000000..b5bf664 Binary files /dev/null and b/griffin-doc/img/userguide/333.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/Capture.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/Capture.PNG b/griffin-doc/img/userguide/Capture.PNG new file mode 100644 index 0000000..ace346d Binary files /dev/null and b/griffin-doc/img/userguide/Capture.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/Capwture.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/Capwture.PNG b/griffin-doc/img/userguide/Capwture.PNG new file mode 100644 index 0000000..2382b23 Binary files /dev/null and b/griffin-doc/img/userguide/Capwture.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/DQ metirics.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/DQ metirics.PNG b/griffin-doc/img/userguide/DQ metirics.PNG new file mode 100644 index 0000000..048b5db Binary files /dev/null and b/griffin-doc/img/userguide/DQ metirics.PNG differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/accurancy.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/accurancy.PNG b/griffin-doc/img/userguide/accurancy.PNG new file mode 100644 index 0000000..fef61ec Binary files /dev/null and b/griffin-doc/img/userguide/accurancy.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/anomaly .PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/anomaly .PNG b/griffin-doc/img/userguide/anomaly .PNG new file mode 100644 index 0000000..67505b7 Binary files /dev/null and b/griffin-doc/img/userguide/anomaly .PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/asset.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/asset.PNG b/griffin-doc/img/userguide/asset.PNG new file mode 100644 index 0000000..0ba58cc Binary files /dev/null and b/griffin-doc/img/userguide/asset.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/bullseye.png ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/bullseye.png b/griffin-doc/img/userguide/bullseye.png new file mode 100644 index 0000000..737e6fc Binary files /dev/null and b/griffin-doc/img/userguide/bullseye.png differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/confirm.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/confirm.PNG b/griffin-doc/img/userguide/confirm.PNG new file mode 100644 index 0000000..5f24869 Binary files /dev/null and b/griffin-doc/img/userguide/confirm.PNG differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/create model.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/create model.PNG b/griffin-doc/img/userguide/create model.PNG new file mode 100644 index 0000000..d5b9cb5 Binary files /dev/null and b/griffin-doc/img/userguide/create model.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/data asset.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/data asset.PNG b/griffin-doc/img/userguide/data asset.PNG new file mode 100644 index 0000000..fec613e Binary files /dev/null and b/griffin-doc/img/userguide/data asset.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/download.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/download.PNG b/griffin-doc/img/userguide/download.PNG new file mode 100644 index 0000000..758c8e6 Binary files /dev/null and b/griffin-doc/img/userguide/download.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/log in.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/log in.PNG b/griffin-doc/img/userguide/log in.PNG new file mode 100644 index 0000000..680fc1f Binary files /dev/null and b/griffin-doc/img/userguide/log in.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/my dashboard.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/my dashboard.PNG b/griffin-doc/img/userguide/my dashboard.PNG new file mode 100644 index 0000000..d88e342 Binary files /dev/null and b/griffin-doc/img/userguide/my dashboard.PNG differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/p.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/p.PNG b/griffin-doc/img/userguide/p.PNG new file mode 100644 index 0000000..45e7faf Binary files /dev/null and b/griffin-doc/img/userguide/p.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/sample.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/sample.PNG b/griffin-doc/img/userguide/sample.PNG new file mode 100644 index 0000000..88ab3b3 Binary files /dev/null and b/griffin-doc/img/userguide/sample.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/side.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/side.PNG b/griffin-doc/img/userguide/side.PNG new file mode 100644 index 0000000..776a16d Binary files /dev/null and b/griffin-doc/img/userguide/side.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/source.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/source.PNG b/griffin-doc/img/userguide/source.PNG new file mode 100644 index 0000000..805afbe Binary files /dev/null and b/griffin-doc/img/userguide/source.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/source2.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/source2.PNG b/griffin-doc/img/userguide/source2.PNG new file mode 100644 index 0000000..215393e Binary files /dev/null and b/griffin-doc/img/userguide/source2.PNG differ 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/subscribe.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/subscribe.PNG b/griffin-doc/img/userguide/subscribe.PNG new file mode 100644 index 0000000..0684c47 Binary files /dev/null and b/griffin-doc/img/userguide/subscribe.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/subscriberesult.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/subscriberesult.PNG b/griffin-doc/img/userguide/subscriberesult.PNG new file mode 100644 index 0000000..b5a5aca Binary files /dev/null and b/griffin-doc/img/userguide/subscriberesult.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/target.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/target.PNG b/griffin-doc/img/userguide/target.PNG new file mode 100644 index 0000000..a5b9cff Binary files /dev/null and b/griffin-doc/img/userguide/target.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/img/userguide/validity.PNG ---------------------------------------------------------------------- diff --git a/griffin-doc/img/userguide/validity.PNG b/griffin-doc/img/userguide/validity.PNG new file mode 100644 index 0000000..a1fc0e9 Binary files /dev/null and b/griffin-doc/img/userguide/validity.PNG differ http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/intro.md ---------------------------------------------------------------------- diff --git a/griffin-doc/intro.md b/griffin-doc/intro.md new file mode 100644 index 0000000..f7ad4ea --- /dev/null +++ b/griffin-doc/intro.md @@ -0,0 +1,68 @@ + +## Abstract +Griffin is a Data Quality Service platform built on Apache Hadoop 
and Apache Spark. It provides a framework for defining data quality models, executing data quality measurements, and automating data profiling and validation, as well as unified data quality visualization across multiple data systems. It aims to address the data quality challenges of big data and streaming contexts. + + +## Overview of Griffin +At eBay, measuring data quality is a big challenge when people use big data (Hadoop or other streaming systems). Different teams have built customized tools to detect and analyze data quality issues within their own domains. As a platform organization, we want to take a platform approach to these commonly occurring patterns. As such, we are building a platform that provides shared infrastructure and generic features to solve common data quality pain points. This will enable us to build trusted data assets. + +Currently it is very difficult and costly to validate data quality when large volumes of related data flow across multiple platforms (streaming and batch). Take eBay's Real-time Personalization Platform as an example: every day we have to validate the data quality of ~600M records. Data quality often becomes a big challenge in such a complex environment at such massive scale. + +We observe the following at eBay: + +1. Lack of an end-to-end, unified view of data quality from multiple data sources to target applications that takes into account the lineage of the data. As a result, it takes a long time to identify and fix data quality issues. +2. Lack of a system to measure data quality in streaming mode through self-service. What is needed is a system where datasets can be registered, data quality models can be defined, data quality can be visualized and monitored with a simple tool, and teams are alerted when an issue is detected. +3. Lack of a shared platform and API service. Teams should not each have to provision and manage their own hardware and software infrastructure to solve this common problem. 
+ +With these in mind, we decided to build Griffin, a data quality service that aims to address the above shortcomings. + +Griffin includes: + +**Data Quality Model Engine**: Griffin is a model-driven solution. Users can choose among various data quality dimensions to run data quality validation based on a selected target dataset or source dataset (the latter serving as the golden reference data). A corresponding back-end library supports the following measurements: + + - Accuracy - Does the data reflect the real-world objects or a verifiable source? + - Completeness - Is all necessary data present? + - Validity - Are all data values within the data domains specified by the business? + - Timeliness - Is the data available at the time needed? + - Anomaly detection - Pre-built algorithm functions for identifying items, events or observations that do not conform to an expected pattern or to other items in a dataset. + - Data Profiling - Statistical analysis and assessment of data values within a dataset for consistency, uniqueness and logic. + +**Data Collection Layer**: + +We support two kinds of data sources: batch data and real-time data. + +For batch mode, we collect data from our Hadoop platform through various data connectors. + +For real-time mode, we connect to messaging systems such as Kafka for near-real-time analysis. + +**Data Process and Storage Layer**: + +For batch analysis, our data quality model computes data quality metrics in our Spark cluster based on data sources in Hadoop. + +For near-real-time analysis, we consume data from the messaging system, and our data quality model then computes real-time data quality metrics in our Spark cluster. For data storage, we use a time series database in our back end to serve front-end requests. 
+ +**Griffin Service**: + +We provide RESTful web services that expose all of Griffin's functionality, such as registering datasets, creating data quality models, publishing metrics, retrieving metrics and adding subscriptions. Developers can therefore build their own user interfaces on top of these web services. + +## Main business process +Here's the business process diagram: + +![Business_Process_image](img/Business_Process.png) + +## Rationale +The challenge we face at eBay is that our data volume keeps growing and our system processes are becoming more complex, while we have no unified data quality solution to ensure trusted datasets that give our data consumers confidence in data quality. The key data quality challenges include: + +1. Existing commercial data quality solutions cannot address data quality lineage among systems and cannot scale out to support fast-growing data at eBay. +2. eBay's existing domain-specific tools take a long time to identify and fix poor data quality when data flows through multiple systems. +3. Business logic is becoming complex, which requires a much more flexible data quality system. +4. Some data quality issues have a business impact on user experience, revenue, efficiency and compliance. +5. Communicating data quality metrics incurs overhead, typically in a big organization where different teams are involved. + +The idea of Griffin is to provide data quality validation as a service, giving data engineers and data consumers: + + - Near real-time understanding of the data quality health of their data pipelines with end-to-end monitoring, all in one place. + - Profiling, detection and correlation of issues, with recommendations that drive rapid and focused troubleshooting. + - A centralized data quality model management system including rules, metadata, scheduler, etc. + - Native code generation to run everywhere, including Hadoop, Kafka, Spark, etc. + - One set of tools to build data quality pipelines across all eBay data platforms. 
http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/a8ba6ba9/griffin-doc/models.md ---------------------------------------------------------------------- diff --git a/griffin-doc/models.md b/griffin-doc/models.md new file mode 100644 index 0000000..a59f32e --- /dev/null +++ b/griffin-doc/models.md @@ -0,0 +1,155 @@ +# Models +Models to calculate data quality metrics. + +### Accuracy model +The accuracy model compares source and target content, given the corresponding mapping relationship. + +#### Introduction +How do we measure the accuracy dimension of a target dataset T, given a source of truth as the golden dataset S? +To measure the accuracy of target dataset T, +the basic approach is to calculate the discrepancy between the target and source datasets by going through their contents +and examining whether all fields match exactly, as below: +``` + Count(source.field1 == target.field1 && source.field2 == target.field2 && ...source.fieldN == target.fieldN) +Accuracy = --------------------------------------------------------------------------------------------------------------- + Count(source) + +``` + +Since the two datasets are too big to fit on one machine, our approach is to leverage the MapReduce programming model through distributed computing. + +The real challenge is to make this comparison algorithm generic enough to relieve data analysts and data scientists of the coding burden while keeping it flexible enough to cover most accuracy requirements. + +The traditional way is to calculate this with a SQL-based join, for example with scripts in Hive. + +But this SQL-based solution can be improved, since it does not consider the particular nature of the source and target datasets in this context. + +Our approach is to provide a generic accuracy model that takes the particular nature of the source and target datasets into consideration. + +Our implementation is in Scala, leveraging Scala's declarative capability to cater for various requirements, and runs in a Spark cluster. 
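
As a minimal single-machine sketch of the accuracy formula above, the ratio can be computed in plain Python. This is only an illustration and not the Scala/Spark implementation the text describes: the function name `accuracy`, its `fields` parameter and the sample records are all assumptions made for the example.

```python
def accuracy(source, target, fields):
    """Fraction of source records whose `fields` all match some target record."""
    if not source:
        raise ValueError("source dataset is empty; accuracy is undefined")
    # Project every target record onto the compared fields for fast lookup.
    target_rows = {tuple(rec[f] for f in fields) for rec in target}
    # Count source records for which every compared field has an equal counterpart.
    matched = sum(1 for rec in source
                  if tuple(rec[f] for f in fields) in target_rows)
    return matched / len(source)


# Hypothetical sample data using a few field names from the schemas in this document.
source = [
    {"uid": "u1", "eventtimestamp": "t1", "curprice": "9.99"},
    {"uid": "u2", "eventtimestamp": "t2", "curprice": "5.00"},
]
target = [
    {"uid": "u1", "eventtimestamp": "t1", "curprice": "9.99"},
    {"uid": "u2", "eventtimestamp": "t2", "curprice": "5.50"},  # price differs
]
print(accuracy(source, target, ["uid", "eventtimestamp", "curprice"]))  # prints 0.5
```

Only one of the two source records has an exact counterpart in the target, so the ratio is 0.5. A set lookup replaces the pairwise comparison here; at the scale discussed in this document the same counting would have to be distributed.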
+ +To make it concrete, the schema for the source is as below: + +``` +|-- uid: string (nullable = true) +|-- site_id: string (nullable = true) +|-- page_id: string (nullable = true) +|-- curprice: string (nullable = true) +|-- itm: string (nullable = true) +|-- itmcond: string (nullable = true) +|-- itmtitle: string (nullable = true) +|-- l1: string (nullable = true) +|-- l2: string (nullable = true) +|-- leaf: string (nullable = true) +|-- meta: string (nullable = true) +|-- st: string (nullable = true) +|-- dc: string (nullable = true) +|-- tr: string (nullable = true) +|-- eventtimestamp: string (nullable = true) +|-- cln: string (nullable = true) +|-- siid: string (nullable = true) +|-- ciid: string (nullable = true) +|-- sellerid: string (nullable = true) +|-- pri: string (nullable = true) +|-- pt: string (nullable = true) +|-- dt: string (nullable = true) +|-- hour: string (nullable = true) +``` + +and the schema for the target is as below: + +``` +|-- uid: string (nullable = true) +|-- page_id: string (nullable = true) +|-- site_id: string (nullable = true) +|-- js_ev_mak: string (nullable = true) +|-- js_ev_orgn: string (nullable = true) +|-- curprice: string (nullable = true) +|-- itm: string (nullable = true) +|-- itmcond: string (nullable = true) +|-- itmtitle: string (nullable = true) +|-- l1: string (nullable = true) +|-- l2: string (nullable = true) +|-- leaf: string (nullable = true) +|-- meta: string (nullable = true) +|-- st: string (nullable = true) +|-- dc: string (nullable = true) +|-- tr: string (nullable = true) +|-- eventtimestamp: string (nullable = true) +|-- cln: string (nullable = true) +|-- siid: string (nullable = true) +|-- ciid: string (nullable = true) +|-- sellerid: string (nullable = true) +|-- product_ref_id: string (nullable = true) +|-- product_type: string (nullable = true) +|-- is_bu: string (nullable = true) +|-- is_udid: string (nullable = true) +|-- is_userid: string (nullable = true) +|-- is_cguid: string (nullable = true) +|-- dt: string 
(nullable = true) +|-- hour: string (nullable = true) +``` + + +#### Accuracy Model in Depth + +##### Pre-Process phase (transform raw data) +For efficiency, we convert each raw record into a key-value pair; after that, we only need to compare values that share the same key. +Since the two datasets might use different names for the same field, and fields might come in a different order, we keep the original information in an associative map for later processing. + +The records will look like: +``` +((uid,eventtimestamp)->(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...)) +``` +To track where the data comes from, we add a label tag here. +For the source dataset we add the label tag "\_\_source\_\_", and for the target dataset we add the label tag "\_\_target\_\_". +``` +((uid,eventtimestamp)->("__source__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...))) +((uid,eventtimestamp)->("__target__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...))) +``` +Ideally, applying those composite keys to a dataset should yield a unique record for every composite key. +But in reality, for various unknown reasons, a dataset might contain duplicate records for one composite key. +To handle this, and to track all records from the source node, we append all duplicate records to a list during this step. +After pre-processing, the record will look like: +``` +((uid,eventtimestamp)->List(("__source__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...)),...,("__source__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...)))) +``` +To save all records from the target node, we insert all records into a set during this step. 
+After pre-processing, the record will look like: +``` +((uid,eventtimestamp)->Set(("__target__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...)),...,("__target__",(curprice->value(curprice),itm->value(itm),itmcond->value(itmcond),itmtitle->value(itmtitle),...)))) +``` +##### Aggregate and Comparing phase +We union source and target together and execute one aggregate over all of them; within it we can apply user-defined rules to check whether records in source and target match. + +``` +aggregate { (List(sources),Set(targets)) => + if(foreach element from List(sources) in Set(targets)) emit true + else emit false +} +``` +We can also execute one aggregate to count the mismatched records in the source: +``` +aggregate (missedCount = 0) { (List(sources), Set(targets)) => + foreach (element in List(sources)) { + if (element in Set(targets)) continue + else missedCount += 1 + } +} +``` +#### Benefits + + It is two times faster than the traditional SQL JOIN based solution, since it uses an algorithm customized for this specific accuracy problem. + + It is easy to iterate on new accuracy metrics because the model is packaged as a common library and offered as a basic service. Previously it took us one week to develop and deploy one new metric from scratch; with this approach it takes only several hours. + + + + +#### Further discussion + + How to select keys? + How many keys should we use? If we use too many keys, calculation performance drops; if we use too few, there may be too many duplicate records, which makes our comparison logic complex. + + How to define content equality? + For some data it is straightforward, but other data might need to be transformed by UDFs first. How can we make our system extensible enough to support different raw data? + + How to fix the data latency issue? + To compare, the data has to be available, but how do we handle data latency, which happens often in real enterprise environments? 
+ + How to restore lost data? + Detecting lost data is good, but the next question is how we can restore it.
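
The pre-process and aggregate phases described earlier can be sketched on a single machine as follows. This is a hedged illustration in plain Python, not the Scala/Spark implementation the document refers to: the function names `preprocess` and `count_missed`, their parameters and the sample records are assumptions. It keeps the document's idea of tagging records with `__source__` or `__target__`, collecting source records in a list (so duplicates are kept) and target records in a set.

```python
from collections import defaultdict

SOURCE, TARGET = "__source__", "__target__"


def preprocess(records, label, key_fields, value_fields):
    """Turn raw records into (composite key, (label, field map)) pairs."""
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # A sorted tuple of (field, value) pairs is hashable and ignores field order.
        values = tuple(sorted((f, rec[f]) for f in value_fields))
        yield key, (label, values)


def count_missed(source, target, key_fields, value_fields):
    """Count source records with no identical target record under the same key."""
    sources = defaultdict(list)  # duplicates are kept, so every source record counts
    targets = defaultdict(set)   # only membership matters on the target side
    tagged = list(preprocess(source, SOURCE, key_fields, value_fields))
    tagged += preprocess(target, TARGET, key_fields, value_fields)  # the "union" step
    for key, (label, values) in tagged:
        if label == SOURCE:
            sources[key].append(values)
        else:
            targets[key].add(values)
    # Aggregate: a source record is missed when its values are absent from the target set.
    return sum(1 for key, rows in sources.items()
               for values in rows if values not in targets[key])


# Hypothetical sample data keyed on (uid, eventtimestamp).
source = [
    {"uid": "u1", "eventtimestamp": "t1", "curprice": "9.99"},
    {"uid": "u1", "eventtimestamp": "t1", "curprice": "9.99"},  # duplicate key
    {"uid": "u2", "eventtimestamp": "t2", "curprice": "5.00"},
]
target = [
    {"uid": "u1", "eventtimestamp": "t1", "curprice": "9.99"},
    {"uid": "u2", "eventtimestamp": "t2", "curprice": "5.50"},  # price differs
]
missed = count_missed(source, target, ["uid", "eventtimestamp"], ["curprice"])
print(missed)  # prints 1
```

Both duplicate `u1` records find a match in the target set, while the `u2` record does not, so one record is counted as missed. Plain equality stands in for the user-defined matching rules mentioned in the text.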