[ 
https://issues.apache.org/jira/browse/GOBBLIN-1806?focusedWorklogId=856174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-856174
 ]

ASF GitHub Bot logged work on GOBBLIN-1806:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Apr/23 16:41
            Start Date: 11/Apr/23 16:41
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3667:
URL: https://github.com/apache/gobblin/pull/3667#discussion_r1163069107


##########
gobblin-metrics-libs/gobblin-metrics-base/src/main/avro/GaaSObservabilityEventExperimental.avsc:
##########
@@ -208,12 +208,17 @@
               {
                 "name": "bytesWritten",
                 "type": "long",
-                "doc": "Number of bytes written for the dataset"
+                "doc": "Number of bytes written for the dataset, can be -1 if 
unsupported by the writer (e.g. jdbc writer)"
               },
               {
-                "name": "recordsWritten",
+                "name": "entitiesWritten",
                 "type": "long",
-                "doc": "Number of records written for the dataset"
+                "doc": "Number of entities written for the dataset by the 
Gobblin writer"

Review Comment:
   "(e.g. files or records)"



##########
gobblin-metrics-libs/gobblin-metrics-base/src/main/avro/GaaSObservabilityEventExperimental.avsc:
##########
@@ -208,12 +208,17 @@
               {
                 "name": "bytesWritten",
                 "type": "long",
-                "doc": "Number of bytes written for the dataset"
+                "doc": "Number of bytes written for the dataset, can be -1 if 
unsupported by the writer (e.g. jdbc writer)"
               },
               {
-                "name": "recordsWritten",
+                "name": "entitiesWritten",
                 "type": "long",
-                "doc": "Number of records written for the dataset"
+                "doc": "Number of entities written for the dataset by the 
Gobblin writer"
+              },
+              {
+                "name": "datasetCommitSucceeded",

Review Comment:
   nit: do we really want to repeat 'dataset', as in 
`datasetsWritten[*].datasetCommitSucceeded` (e.g. we don't name it 
`datasetBytesWritten`)?  maybe `wasCommitted`(?)



##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/DatasetTaskSummary.java:
##########
@@ -17,30 +17,26 @@
 
 package org.apache.gobblin.runtime;
 
+import lombok.Data;
+
+import org.apache.gobblin.metrics.DatasetMetric;
+
+
 /**
  * A class returned by {@link org.apache.gobblin.runtime.SafeDatasetCommit} to 
provide metrics for the dataset
  * that can be reported as a single event in the commit phase.
  */
+@Data
 public class DatasetTaskSummary {
   private final String datasetUrn;
   private final long recordsWritten;
   private final long bytesWritten;
+  private final boolean datasetCommitSucceeded;
 
-  public DatasetTaskSummary(String datasetUrn, long recordsWritten, long 
bytesWritten) {
-    this.datasetUrn = datasetUrn;
-    this.recordsWritten = recordsWritten;
-    this.bytesWritten = bytesWritten;
-  }
-
-  public String getDatasetUrn() {
-    return datasetUrn;
-  }
-
-  public long getRecordsWritten() {
-    return recordsWritten;
-  }
-
-  public long getBytesWritten() {
-    return bytesWritten;
+  /**
+   * Convert a {@link DatasetTaskSummary} to a {@link DatasetMetric}.
+   */
+  public static DatasetMetric toDatasetMetric(DatasetTaskSummary 
datasetTaskSummary) {

Review Comment:
   NBD, but why not an instance method?  (the invocation syntax inside 
`.map()` would remain unchanged)
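
A minimal sketch of the point above, not the PR's code: `Summary` and `Metric` are hypothetical stand-ins for `DatasetTaskSummary` and the Avro-generated `DatasetMetric`, with only two fields each. It shows that a `Type::method` reference inside `.map()` reads the same whether the converter is a static method taking the object or a no-argument instance method. The two converters have different names here only so they can coexist in one class; under a single shared name the method reference would be ambiguous.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MethodRefSketch {
  // Hypothetical stand-in for the Avro-generated DatasetMetric.
  static final class Metric {
    final String datasetUrn;
    final long bytesWritten;

    Metric(String datasetUrn, long bytesWritten) {
      this.datasetUrn = datasetUrn;
      this.bytesWritten = bytesWritten;
    }
  }

  // Hypothetical stand-in for DatasetTaskSummary.
  static final class Summary {
    final String datasetUrn;
    final long bytesWritten;

    Summary(String datasetUrn, long bytesWritten) {
      this.datasetUrn = datasetUrn;
      this.bytesWritten = bytesWritten;
    }

    // Static form, mirroring the shape of the method in the diff.
    static Metric toMetricStatic(Summary summary) {
      return new Metric(summary.datasetUrn, summary.bytesWritten);
    }

    // Instance form, as suggested in the review comment.
    Metric toMetric() {
      return new Metric(datasetUrn, bytesWritten);
    }
  }

  // Call site using the static converter.
  static List<Metric> viaStatic(List<Summary> summaries) {
    return summaries.stream().map(Summary::toMetricStatic).collect(Collectors.toList());
  }

  // Call site using the instance converter: same Type::method shape.
  static List<Metric> viaInstance(List<Summary> summaries) {
    return summaries.stream().map(Summary::toMetric).collect(Collectors.toList());
  }
}
```

Both call sites produce equivalent lists, so the choice between the two forms is mostly about where the conversion reads most naturally.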





Issue Time Tracking
-------------------

    Worklog Id:     (was: 856174)
    Time Spent: 2h  (was: 1h 50m)

> Create a GTE for recording bytes/records written for each dataset in a 
> Gobblin job
> ----------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1806
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1806
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Gobblin collects a lot of writer metrics on the number of bytes and records 
> written to the sinks, but does not emit these metrics as part of a 
> GobblinTrackingEvent.
> We want to emit these in a GobblinTrackingEvent so that they can be ingested by 
> monitoring systems and GaaS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
