[
https://issues.apache.org/jira/browse/HADOOP-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411101#comment-13411101
]
Aaron T. Myers commented on HADOOP-8541:
----------------------------------------
bq. For 11, since memory usage seemed to be a major concern, I wanted to avoid
the object overhead from using Longs instead of the primitive type. This is
O(KBs) though, so I can change it if you think it's not readable. The usage of
the array is pretty simple though, there isn't any weird iteration, and it's
only flushed completely.
Makes sense, but you might want to add a comment to that effect above the
instance vars explaining the reasoning.
bq. For 13, I'd have to think about how to make this work. It's probably doable
(basically need to merge adjacent items instead of inserting), but I don't
think it'll yield that big of a performance boost. Again, I'll work on this if
you think it's worthwhile.
I don't think it'll be a performance boost, but the code might be a little
clearer. I personally found it a little odd when reviewing it that we would add
a bunch of items to the list only to potentially merge them moments later. I
don't feel strongly about this, though. We can leave it as you have it if you
think the current version is clear enough.
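To make the two styles concrete, here is a toy illustration using a run-length-encoded list of sorted longs. This is not the CKMS sample structure from the patch, just a sketch of the general "insert then merge" vs. "merge on insert" shapes being discussed:
{code}
import java.util.ArrayList;
import java.util.List;

public class MergeOnInsertSketch {
  static final class Run {
    long value; int count;
    Run(long v, int c) { value = v; count = c; }
  }

  // Current style: append every item as its own run, then merge equal
  // neighbors in a second pass. The temporary one-element runs exist only
  // to be merged moments later.
  static List<Run> insertThenMerge(long[] sortedItems) {
    List<Run> runs = new ArrayList<>();
    for (long v : sortedItems) runs.add(new Run(v, 1));
    List<Run> merged = new ArrayList<>();
    for (Run r : runs) {
      Run last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && last.value == r.value) last.count += r.count;
      else merged.add(new Run(r.value, r.count));
    }
    return merged;
  }

  // Suggested style: merge into the adjacent run at insertion time, so no
  // temporary one-element runs are ever created.
  static List<Run> mergeOnInsert(long[] sortedItems) {
    List<Run> runs = new ArrayList<>();
    for (long v : sortedItems) {
      Run last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
      if (last != null && last.value == v) last.count++;
      else runs.add(new Run(v, 1));
    }
    return runs;
  }
}
{code}
Both produce the same result; the second just avoids the intermediate allocations, which is the readability point above rather than a performance one.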
Some other minor stuff:
# You should use GenericTestUtils#assertExceptionContains to ensure that the
IOE you actually expect gets thrown, instead of some other unrelated IOE:
{code}
+ try {
+ estimator.snapshot();
+ fail("Expected IOException from empty window");
+ } catch (IOException e) {
+ }
{code}
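A minimal sketch of the suggested pattern follows. The helper here is a simplified stand-in for GenericTestUtils#assertExceptionContains (org.apache.hadoop.test.GenericTestUtils), and the exception message text is hypothetical, not taken from the patch:
{code}
import java.io.IOException;

public class AssertExceptionContainsSketch {
  // Simplified stand-in for GenericTestUtils#assertExceptionContains: fail
  // unless the throwable's message contains the expected substring.
  static void assertExceptionContains(String expected, Throwable t) {
    String msg = t.getMessage();
    if (msg == null || !msg.contains(expected)) {
      throw new AssertionError("Expected message containing \"" + expected
          + "\" but got: " + msg);
    }
  }

  // Demo: catch an IOException and verify it is the one we expect,
  // rather than swallowing any IOException as the quoted test does.
  static boolean demo() {
    try {
      // Stand-in for estimator.snapshot() on an empty window; the message
      // text here is hypothetical.
      throw new IOException("No samples to snapshot");
    } catch (IOException e) {
      assertExceptionContains("No samples", e);
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("demo passed: " + demo());
  }
}
{code}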
# I think you should add a comment explaining why this sleep is needed in the
tests and how its duration is calculated. You could also use
System#currentTimeMillis, since nanosecond precision isn't needed here:
{code}
+ long sleep = (start + (5000 * i) + 100) - (System.nanoTime() / 1000000);
+
+ Thread.sleep(sleep);
{code}
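One possible shape for that comment and the millisecond-based version is sketched below. The interpretation of the constants (5000 ms as the estimator's rollover interval, 100 ms as a cushion past the boundary) is an assumption based on the quoted patch, not confirmed by it:
{code}
public class SleepUntilSketch {
  // Sleep until just past the i-th 5-second window boundary. We target an
  // absolute deadline (start + 5000*i + 100) rather than a fixed duration
  // so that time spent on earlier work in the loop doesn't accumulate as
  // drift. Millisecond precision is enough here, so currentTimeMillis
  // replaces the nanoTime()/1000000 from the patch.
  static long sleepMillisUntil(long start, int i, long now) {
    long deadline = start + (5000L * i) + 100;
    return Math.max(0, deadline - now);  // never return a negative sleep
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    long sleep = sleepMillisUntil(start, 1, System.currentTimeMillis());
    System.out.println("would sleep ~" + sleep + " ms");
  }
}
{code}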
Other than the above, the patch looks great. The findbugs warning is
HADOOP-8585. +1 once the above issues are addressed, either by fixing them or
explaining why they're not worth fixing.
> Better high-percentile latency metrics
> --------------------------------------
>
> Key: HADOOP-8541
> URL: https://issues.apache.org/jira/browse/HADOOP-8541
> Project: Hadoop Common
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 2.0.0-alpha
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hadoop-8541-1.patch, hadoop-8541-2.patch,
> hadoop-8541-3.patch, hadoop-8541-4.patch, hadoop-8541-5.patch
>
>
> Based on discussion in HBASE-6261 and with some HDFS devs, I'd like to make
> better high-percentile latency metrics a part of hadoop-common.
> I've already got a working implementation of [1], an efficient algorithm for
> estimating quantiles on a stream of values. It allows you to specify
> arbitrary quantiles to track (e.g. 50th, 75th, 90th, 95th, 99th), along with
> tight error bounds. This estimator can be snapshotted and reset periodically
> to get a feel for how these percentiles are changing over time.
> I propose creating a new MutableQuantiles class that does this. [1] isn't
> completely without overhead (~1MB memory for reasonably sized windows), which
> is why I hesitate to add it to the existing MutableStat class.
> [1] Cormode, Korn, Muthukrishnan, and Srivastava. "Effective Computation of
> Biased Quantiles over Data Streams" in ICDE 2005.