pjain1 opened a new pull request #10407:
URL: https://github.com/apache/druid/pull/10407


   ### Description
   
   Currently there is no way to know how much data a task processes during 
ingestion. This PR adds the `ingest/events/processedBytes` metric, which emits 
the number of bytes read since the last emission. 
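
To make the "bytes read since the last emission" semantics concrete, here is a minimal sketch. `ProcessedBytesCounter` and its method names are hypothetical and are not the `InputStats` API from this PR. It only illustrates one way a monitor could report a delta on each emission rather than a running total.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for illustration only; not the actual InputStats class.
final class ProcessedBytesCounter {
    // Running total of bytes processed by the task.
    private final AtomicLong total = new AtomicLong();
    // Total observed at the previous emission; guarded by "this".
    private long lastEmitted = 0L;

    // Called by readers whenever they consume n more bytes.
    void incrementProcessedBytes(long n) {
        total.addAndGet(n);
    }

    // Running total since the task started.
    long getProcessedBytes() {
        return total.get();
    }

    // Returns the bytes processed since the previous call and remembers
    // the new baseline, so each emission reports only the increment.
    synchronized long takeDelta() {
        long current = total.get();
        long delta = current - lastEmitted;
        lastEmitted = current;
        return delta;
    }

    public static void main(String[] args) {
        ProcessedBytesCounter counter = new ProcessedBytesCounter();
        counter.incrementProcessedBytes(100);
        counter.incrementProcessedBytes(50);
        System.out.println(counter.takeDelta()); // prints 150
        counter.incrementProcessedBytes(25);
        System.out.println(counter.takeDelta()); // prints 25
    }
}
```

A monitor would call something like `takeDelta()` once per emission period and emit the result as the metric value.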
   
   - This PR adds an `InputStats` class, which is present in all task types and 
acts as a holder for task-level counts such as processed bytes. Standardized 
metrics can therefore be added across task types in the future and emitted 
through `InputStatsMonitor`, which is automatically initialized for all tasks.
   
   - This PR provides a convenience wrapper class named `CountableInputEntity`, 
which can wrap any `InputEntity` to count the number of bytes processed through 
it. New implementations can emit this metric simply by wrapping the base input 
entity in `CountableInputEntity` when creating an `InputEntityIteratingReader`.
   
   - Since Kafka and Kinesis do not use `InputEntity`, processed bytes are 
incremented directly in `SeekableStreamIndexTaskRunner`, which has access to 
`InputStats`.
   
   - This PR does not support Firehoses.
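
The wrapping idea in the list above can be sketched with plain `java.io` streams. `ByteCountingInputStream` is a hypothetical name and this is not the PR's `CountableInputEntity`. It assumes the wrapper's job is to count bytes as they are read from the underlying stream and add them to a shared counter.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper for illustration only; not the PR's CountableInputEntity.
final class ByteCountingInputStream extends FilterInputStream {
    // Shared counter, e.g. owned by a task-level stats holder.
    private final AtomicLong counter;

    ByteCountingInputStream(InputStream in, AtomicLong counter) {
        super(in);
        this.counter = counter;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            counter.incrementAndGet(); // one byte consumed
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates here, so bytes are
    // counted once. Skipped bytes are deliberately not counted.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            counter.addAndGet(n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        AtomicLong processedBytes = new AtomicLong();
        byte[] data = "hello druid".getBytes(StandardCharsets.UTF_8);
        try (InputStream in =
                 new ByteCountingInputStream(new ByteArrayInputStream(data), processedBytes)) {
            byte[] buf = new byte[4];
            while (in.read(buf) != -1) {
                // A real reader would parse the bytes here.
            }
        }
        System.out.println(processedBytes.get()); // prints 11
    }
}
```

Counting at the stream level keeps the parsing code unaware of metrics. A source that does not expose a stream, as described for Kafka and Kinesis above, would instead add each record's size to the counter directly.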
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   <hr>
   
   ##### Key changed/added classes in this PR
    * `InputStats`
    * `InputStatsMonitor`
    * `CountableInputEntity`
    * `AbstractBatchIndexTask`
    * `SeekableStreamIndexTask`
   




