> On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala,
> >  line 45
> > <https://reviews.apache.org/r/47620/diff/2/?file=1388611#file1388611line45>
> >
> >     This may result in some "alerts" from existing user dashboard. It would 
> > be better to state this in release notes.

Yeah, it was a bold move and you called me on it. I'll drop this change and the 
other one.


> On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, 
> > line 209
> > <https://reviews.apache.org/r/47620/diff/2/?file=1388612#file1388612line209>
> >
> >     Did you test it w/ ACG job? One perf issue we had before is that the 
> > addition of metrics code actually degrade the perf significantly. It might 
> > be good to include Chris' dummy consumer code to test the perf difference 
> > w/ and w/o the additional timer metrics.

As I recall, the perf degradation last time was because of the switch from 
milliseconds to nanoseconds and some unexpected behavior in our bucketing code. 
These new metrics are 1 per container and wouldn't be updated any more 
frequently than choose-ns. So, I feel pretty good about it.

In any case, I tested it with a job that has 512 tasks and 64 containers. ACG 
would have more inputs and probably higher throughput. So I can do that one too.


> On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, 
> > line 232
> > <https://reviews.apache.org/r/47620/diff/2/?file=1388612#file1388612line232>
> >
> >     Why are we updating pollNs even w/o polling? Can we move this line 
> > within the if {} block?

Because we want to count 0ns for the times when we don't poll so that the 
metric "Snapshot" accurately represents the proportion of time spent polling 
relative to process-ns, etc.


- Jake


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47620/#review134411
-----------------------------------------------------------


On May 23, 2016, 9:03 p.m., Jake Maes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47620/
> -----------------------------------------------------------
> 
> (Updated May 23, 2016, 9:03 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina 
> Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-951
>     https://issues.apache.org/jira/browse/SAMZA-951
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-951 - Improve event loop timing metrics
> * Exclude choose-ns from the active time, thereby removing the associated 
> poll() waiting from the event-loop-utilization metric 
> * Add poll-ns and deserialization-ns metrics to SystemConsumers to help users 
> understand the time spent during choose. To do this, I needed to extend 
> TimerUtils, which required a nano clock and an alternate constructor. 
> * Fix some small issues with naming conventions, etc
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 
> 6916c5c71e479d43a7435fa4987565d93ed437ac 
>   samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala 
> 32fc771ed31fc84516472fb027902506574f53ab 
>   
> samza-core/src/main/scala/org/apache/samza/system/SystemConsumersMetrics.scala
>  e7f012f5b1a15254078aa85310082c91ac8b56e0 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
> ad3744711cc5ef17b5d3a45eb7b567956d77453a 
>   samza-core/src/test/scala/org/apache/samza/system/TestSystemConsumers.scala 
> fbaa8eeb98857640a36a0c4f62c7b276b59390aa 
> 
> Diff: https://reviews.apache.org/r/47620/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> bin/check-all.sh
> Deployed in a test job. 
> 
> See the before/after screenshots. Note that they're zoomed in on time and 
> they don't have the same y-axis value range. Both screenshots were with the 
> same job which has way more containers than it needs, and is mostly idle.
> 
> 
> File Attachments
> ----------------
> 
> UtilizationBefore
>   
> https://reviews.apache.org/media/uploaded/files/2016/05/19/55101a6d-8c3d-430a-b415-78108d92d4bc__Utilization_Before.png
> UtilizationAfter
>   
> https://reviews.apache.org/media/uploaded/files/2016/05/19/de13bf3d-63e2-4ac6-82ce-60bc5cb6a973__Utilization_After.png
> 
> 
> Thanks,
> 
> Jake Maes
> 
>

Reply via email to