[
https://issues.apache.org/jira/browse/GOBBLIN-1806?focusedWorklogId=855551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855551
]
ASF GitHub Bot logged work on GOBBLIN-1806:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/23 18:01
Start Date: 07/Apr/23 18:01
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3667:
URL: https://github.com/apache/gobblin/pull/3667#discussion_r1160861533
##########
gobblin-metrics-libs/gobblin-metrics-base/src/main/avro/GaaSObservabilityEventExperimental.avsc:
##########
@@ -188,6 +188,38 @@
}
}
]
- }]
+ },
+ {
+ "name": "datasetsWritten",
+ "type": [
+ "null",
+ {
+ "type": "array",
+ "items": {
+ "type": "record",
+ "name": "DatasetMetric",
+ "doc": "DatasetMetric contains bytes and records written by Gobblin writers for the dataset URN.",
+ "fields": [
+ {
+ "name": "datasetUrn",
+ "type": "string",
+ "doc": "URN of the dataset"
+ },
+ {
+ "name": "bytesWritten",
+ "type": "long",
+ "doc": "Number of bytes written for the dataset"
Review Comment:
Which jobs is this applicable to? E.g., could it work for retention-release?
What about pulling record-by-record from a CRM system and writing a subset
of each record's fields to a relational DB?
How would we measure the number of bytes "written" to the DB... maybe
approximate from the char count of the stringified SQL statement?
(The questions I'm unclear on are ones to anticipate from users seeking
clarity from these "doc" strings.)
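To make the SQL-statement approximation concrete, here is a minimal sketch, assuming the writer has the final statement text available as a String. The class and method names are hypothetical and not part of Gobblin. It counts UTF-8 encoded bytes rather than chars, since the two differ for non-ASCII data, and it says nothing about how many bytes the database actually stores.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Gobblin API: estimates bytes "written" to a
// relational DB from the text of the SQL statement that was issued.
final class SqlByteEstimator {
    private SqlByteEstimator() {}

    // Returns the UTF-8 encoded size of the statement text. This is only an
    // approximation of what reaches the DB; the on-disk size will differ.
    static long approximateBytesWritten(String sqlStatement) {
        return sqlStatement.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Illustrative statement; table and column names are made up.
        String sql = "INSERT INTO contacts (id, email) VALUES (42, 'a@b.example')";
        System.out.println(approximateBytesWritten(sql));
    }
}
```

If something like this were used, the "doc" string for bytesWritten could state that the value is an estimate for non-file sinks.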
Issue Time Tracking
-------------------
Worklog Id: (was: 855551)
Time Spent: 40m (was: 0.5h)
> Create a GTE for recording bytes/records written for each dataset in a
> Gobblin job
> ----------------------------------------------------------------------------------
>
> Key: GOBBLIN-1806
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1806
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Gobblin collects a lot of writer metrics on number of bytes and records
> written to the sinks, but does not emit these metrics as part of a
> GobblinTrackingEvent.
> We want to emit these in a GobblinTrackingEvent so that they can be ingested by
> monitoring systems and GaaS.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)