Not sure if the image showed up in the original email, so I am re-attaching it below and in the email attachments.

[Inline image: workflow diagram]



From: Preetam Shingavi <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 19, 2020 at 10:19 AM
To: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Use case for completion measure

Hello everyone,

I am trying to think of a way to create a measure that would help me handle the following correlation scenario:

Consider the workflow below with 4 microservices: SystemA, B, C, and D. Each system sends a transaction to the next (shown with solid arrows) and also emits monitoring events (a.k.a. MEs), which include ids that help correlate downstream. Ignore SystemC and SystemD for the rest of the description below.

I need to find a way to compute a completion score for each system and for the whole workflow.

Completion score for SystemA: this is as simple as the count of unique combinations of ids in the ITEM_PUBLISHED MEs.
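
For SystemA, the logic I have in mind is roughly the following PySpark sketch (batch flavor; the path, the eventType values, and the id1/id2 column names are just placeholders for whatever the real MEs carry):

# Rough sketch only; schema and path are placeholders, not the real ME layout.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("systemA-completion").getOrCreate()

mes = spark.read.json("hdfs:///monitoring-events/systemA/")  # placeholder path

# Completion score for SystemA: count of unique (id1, id2) combinations
# across the ITEM_PUBLISHED monitoring events.
system_a_score = (
    mes.filter(F.col("eventType") == "ITEM_PUBLISHED")
       .select("id1", "id2")
       .distinct()
       .count()
)

print(f"SystemA completion score: {system_a_score}")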

Completion score for SystemB: I need to correlate 1 EVENT_RECEIVED -> 1 EXP_OUTPUT (expectedCount=2) -> 2 unique ITEM_PUBLISHED MEs (correlated by the set of ids). I am trying to see what the optimal way is to compute a completion score for this system, and then use the result of that score to derive the overall workflow score. In other words, how can I stream the result of one measure into another to deduce a further measure (in both batch and streaming modes)?
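
Roughly, the SystemB correlation I am describing looks like this in PySpark (again a batch sketch with placeholder column names: correlationId, eventType, expectedCount, id1, id2):

# Rough sketch only; column names are placeholders for the real ME fields.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("systemB-completion").getOrCreate()
mes = spark.read.json("hdfs:///monitoring-events/systemB/")  # placeholder path

# Expected output count per correlation id, taken from the EXP_OUTPUT MEs.
expected = (
    mes.filter(F.col("eventType") == "EXP_OUTPUT")
       .select("correlationId", "expectedCount")
)

# Unique ITEM_PUBLISHED MEs actually observed, per correlation id.
published = (
    mes.filter(F.col("eventType") == "ITEM_PUBLISHED")
       .select("correlationId", "id1", "id2")
       .distinct()
       .groupBy("correlationId")
       .agg(F.count("*").alias("publishedCount"))
)

# Per-correlation completion = publishedCount / expectedCount, capped at 1.0.
system_b = (
    expected.join(published, "correlationId", "left")
            .na.fill({"publishedCount": 0})
            .withColumn(
                "completion",
                F.least(F.col("publishedCount") / F.col("expectedCount"), F.lit(1.0)),
            )
)

system_b_score = system_b.agg(F.avg("completion")).first()[0]
print(f"SystemB completion score: {system_b_score}")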

[Inline image: workflow diagram]

My approach was to:

  1.  Create a measure for SystemA as profiling metrics: count(unique(id1, id2)). In the sink, send the result to another Kafka topic (stream) / new HDFS location (batch).
  2.  Create a custom measure for SystemB to correlate EVENT_RECEIVED to EXP_OUTPUT, find the expectedCount value, and match it against the # of ITEM_PUBLISHED MEs. In the sink, send the result to another Kafka topic (stream) / new HDFS location (batch).
  3.  Do the same for the other systems.
  4.  Create a measure that computes score metrics from the new Kafka stream / HDFS batch location to produce the workflow score (a rough sketch of this step follows below).
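
For step 4, here is a rough Structured Streaming sketch of reading the per-system scores back from Kafka and folding them into a single workflow score (the broker, topic name, and JSON record shape are placeholders, and it needs the spark-sql-kafka package on the classpath):

# Rough sketch only; topic, broker, and score record schema are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("workflow-score").getOrCreate()

# Assumed shape of the records the per-system measures wrote to the topic.
score_schema = StructType([
    StructField("system", StringType()),
    StructField("completion", DoubleType()),
])

scores = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
         .option("subscribe", "dq-system-scores")             # placeholder topic
         .load()
         .select(F.from_json(F.col("value").cast("string"), score_schema).alias("s"))
         .select("s.*")
)

# Simplest possible roll-up: average of the per-system completion scores.
workflow_score = scores.groupBy().agg(F.avg("completion").alias("workflowCompletion"))

query = (
    workflow_score.writeStream
                  .outputMode("complete")
                  .format("console")
                  .start()
)
query.awaitTermination()

The batch flavor would be the same aggregation with spark.read over the new HDFS location instead of readStream over Kafka.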

Any thoughts or inputs are highly appreciated. Thank you for going through this.

NOTE: I am working with a small team within Expedia Group, trying to solve workflow completion, accuracy, and similar DQ problems for a number of workflows. We have built a custom application today, but we see great value in Apache Griffin if we can make it work for our use cases. There are many features we would like to build on top of the existing ones, considering all the use cases we have today in our custom application, and my team will be happy to contribute to this project full-time if we are able to build a working prototype and convince our managers for all the good reasons 😊 (cost, scalability, availability, configurability, reprocessing, dashboard & reporting, etc.).

Thanks,
Preetam
