[ https://issues.apache.org/jira/browse/GOBBLIN-1806?focusedWorklogId=855580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855580 ]
ASF GitHub Bot logged work on GOBBLIN-1806:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/23 21:53
Start Date: 07/Apr/23 21:53
Worklog Time Spent: 10m
Work Description: Will-Lo commented on PR #3667:
URL: https://github.com/apache/gobblin/pull/3667#issuecomment-1500676439
Responding to the top-level comment, `would we wish to pre-aggregate that
within the event itself`: I think aggregation is needed because large pipelines
can have thousands of tasks. Serializing every task state into a single event
would produce large Kafka events, which we want to avoid. Also, I think most
clients don't care much about the inner details of each individual
task/mapper; handling those correctly is mainly the Gobblin framework's
concern.
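To illustrate the size argument, here is a minimal sketch of pre-aggregating per-task writer counts into per-dataset totals before building the event. It makes several assumptions. `TaskWriterMetrics`, `aggregate_by_dataset`, `build_event_metadata`, and the `datasetMetrics`/`bytesWritten`/`recordsWritten` keys are hypothetical names, not Gobblin APIs. The sketch also assumes the event carries string-valued metadata. The point is that the payload grows with the number of datasets, not with the number of tasks.

```python
from collections import defaultdict
from dataclasses import dataclass
import json


@dataclass(frozen=True)
class TaskWriterMetrics:
    """Hypothetical per-task writer counters (not a Gobblin class)."""
    dataset_urn: str
    bytes_written: int
    records_written: int


def aggregate_by_dataset(tasks):
    """Sum bytes/records per dataset, so the result has one entry per
    dataset regardless of how many tasks wrote to it."""
    totals = defaultdict(lambda: {"bytesWritten": 0, "recordsWritten": 0})
    for task in tasks:
        entry = totals[task.dataset_urn]
        entry["bytesWritten"] += task.bytes_written
        entry["recordsWritten"] += task.records_written
    return dict(totals)


def build_event_metadata(tasks):
    """Hypothetical event payload: the per-dataset totals serialized as a
    single JSON string value (assumes string-valued event metadata)."""
    return {"datasetMetrics": json.dumps(aggregate_by_dataset(tasks), sort_keys=True)}


if __name__ == "__main__":
    # 5000 tasks spread across 2 datasets collapse into 2 entries.
    tasks = [TaskWriterMetrics(f"db.table{i % 2}", 100, 10) for i in range(5000)]
    print(build_event_metadata(tasks))
```

With 5000 tasks writing to two datasets, the event carries two aggregated entries instead of 5000 serialized task states.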
Issue Time Tracking
-------------------
Worklog Id: (was: 855580)
Time Spent: 1h 50m (was: 1h 40m)
> Create a GTE for recording bytes/records written for each dataset in a
> Gobblin job
> ----------------------------------------------------------------------------------
>
> Key: GOBBLIN-1806
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1806
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Gobblin collects a lot of writer metrics on the number of bytes and records
> written to the sinks, but does not emit these metrics as part of a
> GobblinTrackingEvent.
> We want to emit them in a GobblinTrackingEvent so that they can be ingested by
> monitoring systems and GaaS.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)