[ 
https://issues.apache.org/jira/browse/GOBBLIN-1806?focusedWorklogId=855551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855551
 ]

ASF GitHub Bot logged work on GOBBLIN-1806:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Apr/23 18:01
            Start Date: 07/Apr/23 18:01
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3667:
URL: https://github.com/apache/gobblin/pull/3667#discussion_r1160861533


##########
gobblin-metrics-libs/gobblin-metrics-base/src/main/avro/GaaSObservabilityEventExperimental.avsc:
##########
@@ -188,6 +188,38 @@
           }
         }
       ]
-    }]
+    },
+    {
+      "name": "datasetsWritten",
+      "type": [
+        "null",
+        {
+          "type": "array",
+          "items": {
+            "type": "record",
+            "name": "DatasetMetric",
+            "doc": "DatasetMetric contains bytes and records written by Gobblin writers for the dataset URN.",
+            "fields": [
+              {
+                "name": "datasetUrn",
+                "type": "string",
+                "doc": "URN of the dataset"
+              },
+              {
+                "name": "bytesWritten",
+                "type": "long",
+                "doc": "Number of bytes written for the dataset"

Review Comment:
   Which jobs is this applicable to? E.g., could it work for retention-release? 
What about pulling record-by-record from a CRM system and writing a subset 
of each record's fields to a relational DB?
   
   How would we measure the number of bytes "written" to the DB? Maybe 
approximate it from the char count of the stringified SQL statement?
   
   (The questions I'm unclear on are ones to anticipate from users seeking 
clarity from these "doc" strings.)
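
   To make the approximation idea concrete, here is a minimal sketch, not 
anything taken from the PR. The class name `SqlBytesEstimate` and the method 
`approxBytesWritten` are hypothetical. It assumes the statement is sent as 
UTF-8 text, so it counts encoded bytes rather than chars; the two differ 
whenever the statement contains non-ASCII characters.

```java
import java.nio.charset.StandardCharsets;

public class SqlBytesEstimate {
    /**
     * Hypothetical helper: approximates "bytes written" for one statement as
     * the UTF-8 encoded size of its SQL text. This is only a proxy; the bytes
     * the database actually stores depend on its own encoding and layout.
     */
    public static long approxBytesWritten(String sql) {
        return sql.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Illustrative statement; the table and column names are made up.
        String sql = "INSERT INTO contacts (id, name) VALUES (42, 'Zo\u00eb')";
        System.out.println("chars=" + sql.length()
            + " approxBytes=" + approxBytesWritten(sql));
    }
}
```

   If something like this were adopted, the "doc" string for `bytesWritten` 
would probably need to say that the value can be an estimate for non-file 
sinks.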





Issue Time Tracking
-------------------

    Worklog Id:     (was: 855551)
    Time Spent: 40m  (was: 0.5h)

> Create a GTE for recording bytes/records written for each dataset in a 
> Gobblin job
> ----------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1806
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1806
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Gobblin collects many writer metrics on the number of bytes and records 
> written to the sinks, but does not emit these metrics as part of a 
> GobblinTrackingEvent.
> We want to emit these in a GobblinTrackingEvent so that they can be ingested by 
> monitoring systems and GaaS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
