This is an automated email from the ASF dual-hosted git repository.
guoyp pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/griffin.git.
from 076c8d0 [GRIFFIN-345] Support cross-version compilation for Scala and Spark dependencies
new 6745b98 [GRIFFIN-358] Rewrite dataset preprocessing as SQL Queries
new b66f796 [GRIFFIN-358] Rewrite new measure hierarchy and new completeness measure
new 4373754 [GRIFFIN-358] New Profiling Measure
new 3f565f0 [GRIFFIN-358] New SparkSQL Measure
new 063e16c [GRIFFIN-358] New Duplication (Distinctness, Uniqueness) Measure
new 095864e [GRIFFIN-358] Changes to Metric Flush process
new e66a5cd [GRIFFIN-358] New Accuracy Measure
new 42c3c17 [GRIFFIN-358] Merge Measure constants
new 2c1cbea [GRIFFIN-358] Added CompletenessMeasureTest
new 1cbf6df [GRIFFIN-358] Added AccuracyMeasureTest
new 8c13b30 [GRIFFIN-358] Added SparkSqlMeasureTest
new 8cd2190 [GRIFFIN-358] Added DuplicationMeasureTest
new 889ca5d [GRIFFIN-358] Added ProfilingMeasureTest
new 64a00d1 [GRIFFIN-358] Fixed formatting
new 739d3f5 [GRIFFIN-358] Fixed breaking test cases
new 25ed0d1 [GRIFFIN-358] Added general documentation for new dimensions/measures and completeness measure configuration guide.
new aaa1bf3 [GRIFFIN-358] Update Duplication Measure to exclude null values
new 15bcfa4 [GRIFFIN-358] Added measure configuration guide for duplication and sparkSql measures.
new 89e38a8 [GRIFFIN-358] Added profiling measure configuration guide.
new b1f1de1 [GRIFFIN-358] Changed 'target' to 'ref' to clear terminology
new 86555b5 [GRIFFIN-358] Added accuracy measure configuration guide.
new c676998 [GRIFFIN-358] Allow users to run old "evaluate.rule" configs as well
new 3acd96b [GRIFFIN-358] Updated Configurations for pre proc and batch all measures
new 74f0d68 [GRIFFIN-358] Added test cases for Data pre proc
new e6a3f6b [GRIFFIN-358] Changes structure of Measure
new cb25879 [GRIFFIN-358] Added parallelization to MeasureExecutor
new 7943855 [GRIFFIN-358] Added code documentation for all new measures.
new 3586a30 [GRIFFIN-358] Fixed breaking test case
new 0a08b85 [GRIFFIN-358] Added sampling option to ProfilingMeasure
new c2a173f [GRIFFIN-358] Error handling and code formatting changes
new e3803d2 [GRIFFIN-358] Added SchemaConformance measure
new e1a6c03 [GRIFFIN-358] Changed Metric output format and fixed test cases
new 444c956 [GRIFFIN-358] Added documentation for SchemaConformanceMeasure
new 4908291 [GRIFFIN-358] Fix import
new 7a50813 Merge pull request #591 from chitralverma/fix-measures
The 35 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
griffin-doc/measure/dimensions.md | 135 +
.../measure-configuration-guide/accuracy.md | 245 ++
.../measure-configuration-guide/completeness.md | 212 +
.../measure-configuration-guide/duplication.md | 257 ++
.../measure-configuration-guide/profiling.md | 210 +
.../schema_conformance.md | 218 +
.../measure-configuration-guide/sparksql.md | 227 +
measure/sbin/griffin-tool.sh | 8 +-
.../main/resources/config-batch-all-measures.json | 164 +
.../{config-batch.json => config-batch-old.json} | 0
.../src/main/resources/config-batch-preproc.json | 48 +
measure/src/main/resources/crime_report.csv | 4617 ++++++++++++++++++++
measure/src/main/resources/crime_report_truth.csv | 2 +
measure/src/main/resources/env-batch.json | 4 +-
measure/src/main/resources/log4j.properties | 2 -
.../org/apache/griffin/measure/Application.scala | 2 +
.../org/apache/griffin/measure/Loggable.scala | 3 +-
.../configuration/dqdefinition/DQConfig.scala | 110 +-
.../measure/configuration/enums/DslType.scala | 8 +-
.../{ProcessType.scala => MeasureTypes.scala} | 19 +-
.../apache/griffin/measure/context/DQContext.scala | 20 +-
.../griffin/measure/context/MetricWrapper.scala | 43 +-
.../checkpoint/offset/OffsetCheckpointInZK.scala | 5 +-
.../measure/datasource/DataSourceFactory.scala | 10 +-
.../datasource/connector/DataConnector.scala | 52 +-
.../batch/ElasticSearchGriffinDataConnector.scala | 12 +-
.../apache/griffin/measure/execution/Measure.scala | 170 +
.../measure/execution/MeasureExecutor.scala | 266 ++
.../measure/execution/impl/AccuracyMeasure.scala | 249 ++
.../execution/impl/CompletenessMeasure.scala | 112 +
.../execution/impl/DuplicationMeasure.scala | 180 +
.../measure/execution/impl/ProfilingMeasure.scala | 263 ++
.../execution/impl/SchemaConformanceMeasure.scala | 154 +
.../measure/execution/impl/SparkSQLMeasure.scala | 124 +
.../griffin/measure/launch/batch/BatchDQApp.scala | 59 +-
.../measure/launch/streaming/StreamingDQApp.scala | 1 +
.../apache/griffin/measure/sink/ConsoleSink.scala | 13 +-
.../griffin/measure/sink/ElasticSearchSink.scala | 1 -
.../measure/step/builder/DQStepBuilder.scala | 1 -
.../step/builder/DataFrameOpsDQStepBuilder.scala | 40 -
.../step/builder/GriffinDslDQStepBuilder.scala | 2 +-
.../step/builder/dsl/transform/Expr2DQSteps.scala | 1 -
.../step/builder/preproc/PreProcParamMaker.scala | 69 -
.../griffin/measure/step/read/ReadStep.scala | 3 +-
.../measure/step/transform/TransformStep.scala | 7 +-
.../step/write/DataSourceUpdateWriteStep.scala | 3 +-
.../measure/step/write/MetricFlushStep.scala | 8 +-
.../measure/step/write/MetricWriteStep.scala | 21 +-
.../apache/griffin/measure/utils/JsonUtil.scala | 6 +-
.../apache/griffin/measure/utils/ParamUtil.scala | 1 -
.../test/resources/_accuracy-batch-griffindsl.json | 82 +-
.../resources/_completeness-batch-griffindsl.json | 45 +-
.../resources/_distinctness-batch-griffindsl.json | 62 +-
.../resources/_no_measure_or_rules_malformed.json | 24 +
.../resources/_profiling-batch-griffindsl.json | 65 +-
.../_profiling-batch-griffindsl_malformed.json | 12 +-
.../test/resources/_sparksql-batch-griffindsl.json | 59 +
measure/src/test/resources/crime_report_test.csv | 10 +
.../test/resources/duplicates_users_info_src.csv | 5 +
measure/src/test/resources/env-batch.json | 7 +
.../src/test/resources/env-streaming-mongo.json | 2 -
measure/src/test/resources/env-streaming.json | 2 -
.../invalidtype_completeness_batch_griffindal.json | 9 +-
.../missingrule_accuracy_batch_sparksql.json | 5 +-
measure/src/test/resources/log4j.properties | 11 +-
measure/src/test/resources/users_info_src.csv | 50 +
.../apache/griffin/measure/SparkSuiteBase.scala | 6 +-
.../dqdefinition/reader/ParamEnumReaderSpec.scala | 39 +-
.../dqdefinition/reader/ParamFileReaderSpec.scala | 33 +-
.../dqdefinition/reader/ParamJsonReaderSpec.scala | 11 +-
.../measure/context/MetricWrapperTest.scala | 16 +-
.../connector/DataConnectorPreProcTest.scala | 200 +
.../execution/impl/AccuracyMeasureTest.scala | 158 +
.../execution/impl/CompletenessMeasureTest.scala | 119 +
.../execution/impl/DuplicationMeasureTest.scala | 156 +
.../measure/execution/impl/MeasureTest.scala | 76 +
.../execution/impl/ProfilingMeasureTest.scala | 104 +
.../impl/SchemaConformanceMeasureTest.scala | 124 +
.../execution/impl/SparkSqlMeasureTest.scala | 137 +
.../griffin/measure/job/BatchDQAppTest.scala | 161 +-
.../org/apache/griffin/measure/job/DQAppTest.scala | 13 +-
.../apache/griffin/measure/sink/CustomSink.scala | 57 +-
.../griffin/measure/sink/CustomSinkTest.scala | 81 +-
.../AccuracyTransformationsIntegrationTest.scala | 1 +
84 files changed, 9723 insertions(+), 606 deletions(-)
create mode 100644 griffin-doc/measure/dimensions.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/accuracy.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/completeness.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/duplication.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/profiling.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/schema_conformance.md
create mode 100644 griffin-doc/measure/measure-configuration-guide/sparksql.md
create mode 100644 measure/src/main/resources/config-batch-all-measures.json
rename measure/src/main/resources/{config-batch.json => config-batch-old.json} (100%)
create mode 100644 measure/src/main/resources/config-batch-preproc.json
create mode 100644 measure/src/main/resources/crime_report.csv
create mode 100644 measure/src/main/resources/crime_report_truth.csv
copy measure/src/main/scala/org/apache/griffin/measure/configuration/enums/{ProcessType.scala => MeasureTypes.scala} (69%)
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/Measure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/MeasureExecutor.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/AccuracyMeasure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/CompletenessMeasure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/DuplicationMeasure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/ProfilingMeasure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/SchemaConformanceMeasure.scala
create mode 100644 measure/src/main/scala/org/apache/griffin/measure/execution/impl/SparkSQLMeasure.scala
delete mode 100644 measure/src/main/scala/org/apache/griffin/measure/step/builder/DataFrameOpsDQStepBuilder.scala
delete mode 100644 measure/src/main/scala/org/apache/griffin/measure/step/builder/preproc/PreProcParamMaker.scala
create mode 100644 measure/src/test/resources/_no_measure_or_rules_malformed.json
create mode 100644 measure/src/test/resources/_sparksql-batch-griffindsl.json
create mode 100644 measure/src/test/resources/crime_report_test.csv
create mode 100644 measure/src/test/resources/duplicates_users_info_src.csv
create mode 100644 measure/src/test/resources/users_info_src.csv
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/datasource/connector/DataConnectorPreProcTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/AccuracyMeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/CompletenessMeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/DuplicationMeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/MeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/ProfilingMeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/SchemaConformanceMeasureTest.scala
create mode 100644 measure/src/test/scala/org/apache/griffin/measure/execution/impl/SparkSqlMeasureTest.scala