[ https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner resolved ACCUMULO-4730.
------------------------------------
Resolution: Fixed
> Create an Entry length summarizer
> ---------------------------------
>
> Key: ACCUMULO-4730
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Keith Turner
> Assignee: Jared R
> Labels: newbie, pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> It would be very useful to have a built-in
> [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java]
> that computes summary information about field lengths, specifically key
> length, row length, family length, qualifier length, visibility length, and
> value length. Whatever stats are computed must be computable incrementally.
> For example, min, max, count, sum, and a log2 histogram can all be computed
> incrementally, and I think these would be good stats to start with. Count
> and sum can be used to compute the average. There is an example of computing
> a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and could produce
> summaries like the following.
> {noformat}
> count=XXX //do not need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX //only output non-zero exponents
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the
> [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers]
> package.
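The incremental statistics described above can be sketched in plain Java. This is a minimal, standalone illustration, not the Accumulo `Summarizer` API and not the committed implementation: the class name `LengthStats`, its methods, and the choice of bit length (bucket `i` holds lengths with `2^(i-1) <= len < 2^i`, bucket 0 holds length 0) as the log2 bucketing are all hypothetical. Every statistic is a min, max, or sum, so two partial summaries merge element-wise, which is what makes them computable incrementally.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: incremental length statistics for one field (e.g. "row"). */
public class LengthStats {
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;
  private long sum = 0;
  private long count = 0;
  // Assumed log2 histogram: index = bit length of the observed length.
  private final long[] logHist = new long[65];

  /** Bit length: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on. */
  public static int bitLength(long len) {
    return 64 - Long.numberOfLeadingZeros(len);
  }

  /** Fold one observed length into the running statistics. */
  public void accept(long len) {
    min = Math.min(min, len);
    max = Math.max(max, len);
    sum += len;
    count++;
    logHist[bitLength(len)]++;
  }

  /** Merge another partial summary into this one; order of merging does not matter. */
  public void merge(LengthStats other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    count += other.count;
    for (int i = 0; i < logHist.length; i++) {
      logHist[i] += other.logHist[i];
    }
  }

  public long getCount() {
    return count;
  }

  /** Emit stats with the prefix naming from the issue, e.g. "row.min". */
  public void summarize(String prefix, Map<String, Long> out) {
    if (count == 0) {
      return; // nothing observed, so min/max are undefined
    }
    out.put(prefix + ".min", min);
    out.put(prefix + ".max", max);
    out.put(prefix + ".sum", sum);
    for (int i = 0; i < logHist.length; i++) {
      if (logHist[i] != 0) { // only output non-zero buckets
        out.put(prefix + ".logHist." + i, logHist[i]);
      }
    }
  }

  public static void main(String[] args) {
    LengthStats part1 = new LengthStats();
    LengthStats part2 = new LengthStats();
    for (long len : new long[] {3, 5, 8}) part1.accept(len);
    for (long len : new long[] {1, 100}) part2.accept(len);
    part1.merge(part2); // same result as feeding all five lengths to one instance

    Map<String, Long> out = new TreeMap<>();
    out.put("count", part1.getCount()); // count is shared by all fields
    part1.summarize("row", out);
    System.out.println(out);
  }
}
```

Running `main` prints `{count=5, row.logHist.1=1, row.logHist.2=1, row.logHist.3=1, row.logHist.4=1, row.logHist.7=1, row.max=100, row.min=1, row.sum=117}`. The average from the issue would then be `row.sum / count`.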
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)