> On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala, > > line 45 > > <https://reviews.apache.org/r/47620/diff/2/?file=1388611#file1388611line45> > > > > This may result in some "alerts" from existing user dashboard. It would > > be better to state this in release notes.
Yeah, it was a bold move and you called me on it. I'll drop this change and the other one. > On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, > > line 209 > > <https://reviews.apache.org/r/47620/diff/2/?file=1388612#file1388612line209> > > > > Did you test it w/ ACG job? One perf issue we had before is that the > > addition of metrics code actually degrade the perf significantly. It might > > be good to include Chris' dummy consumer code to test the perf difference > > w/ and w/o the additional timer metrics. As I recall, the perf degradation last time was because of the switch from milliseconds to nanoseconds and some unexpected behavior in our bucketing code. These new metrics are 1 per container and wouldn't be updated any more frequently than choose-ns. So, I feel pretty good about it. In any case, I tested it with a job that has 512 tasks and 64 containers. ACG would have more inputs and probably higher throughput. So I can do that one too. > On May 23, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, > > line 232 > > <https://reviews.apache.org/r/47620/diff/2/?file=1388612#file1388612line232> > > > > Why are we updating pollNs even w/o polling? Can we move this line > > within the if {} block? Because we want to count 0ns for the times when we don't poll so that the metric "Snapshot" accurately represents the proportion of time spent polling relative to process-ns, etc. - Jake ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47620/#review134411 ----------------------------------------------------------- On May 23, 2016, 9:03 p.m., Jake Maes wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47620/ > ----------------------------------------------------------- > > (Updated May 23, 2016, 9:03 p.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina > Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure). > > > Bugs: SAMZA-951 > https://issues.apache.org/jira/browse/SAMZA-951 > > > Repository: samza > > > Description > ------- > > SAMZA-951 - Improve event loop timing metrics > * Exclude choose-ns from the active time, thereby removing the associated > poll() waiting from the event-loop-utilization metric > * Add poll-ns and deserialization-ns metrics to SystemConsumers to help users > understand the time spent during choose. To do this, I needed to extend > TimerUtils, which required a nano clock and an alternate constructor. > * Fix some small issues with naming conventions, etc > > > Diffs > ----- > > samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala > 6916c5c71e479d43a7435fa4987565d93ed437ac > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala > 32fc771ed31fc84516472fb027902506574f53ab > > samza-core/src/main/scala/org/apache/samza/system/SystemConsumersMetrics.scala > e7f012f5b1a15254078aa85310082c91ac8b56e0 > samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala > ad3744711cc5ef17b5d3a45eb7b567956d77453a > samza-core/src/test/scala/org/apache/samza/system/TestSystemConsumers.scala > fbaa8eeb98857640a36a0c4f62c7b276b59390aa > > Diff: https://reviews.apache.org/r/47620/diff/ > > > Testing > ------- > > ./gradlew build > bin/check-all.sh > Deployed in a test job. > > See the before/after screenshots. Note that they're zoomed in on time and > they don't have the same y-axis value range. Both screenshots were with the > same job which has way more containers than it needs, and is mostly idle. > > > File Attachments > ---------------- > > UtilizationBefore > > https://reviews.apache.org/media/uploaded/files/2016/05/19/55101a6d-8c3d-430a-b415-78108d92d4bc__Utilization_Before.png > UtilizationAfter > > https://reviews.apache.org/media/uploaded/files/2016/05/19/de13bf3d-63e2-4ac6-82ce-60bc5cb6a973__Utilization_After.png > > > Thanks, > > Jake Maes > >
