[ 
https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner resolved ACCUMULO-4730.
------------------------------------
    Resolution: Fixed

> Create an Entry length summarizer
> ---------------------------------
>
>                 Key: ACCUMULO-4730
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Jared R
>              Labels: newbie, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> It would be very useful to have a built-in 
> [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java]
>  that computes summary information about field lengths, specifically key 
> length, row length, family length, qualifier length, visibility length, and 
> value length.  Whatever stats are computed must be computable 
> incrementally.  For example, min, max, count, sum, and a log2 histogram can 
> all be computed incrementally.  I think these would be good stats to start with.  Count 
> and sum can be used to compute the average.  There is an example of computing 
> a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and possibly produce 
> summaries like the following.  
> {noformat}
> count=XXX     //do not need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX   //only output non zero exponents 
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the 
> [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers]
>  package.
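
As a rough illustration of the incremental stats described in the issue, here is a minimal standalone sketch in plain Java. It does not use the Accumulo Summarizer interface; the class name LengthStats and its methods are hypothetical and exist only for this example. It also assumes a floor-based log2 bucket, with zero-length fields counted in bucket 0, which may differ from the histogram example in the Summarizer javadoc.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not an Accumulo class): incremental length stats for one field. */
final class LengthStats {
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;
  private long sum = 0;
  // hist[i] counts lengths whose floor(log2(length)) is i; length 0 is counted in bucket 0 (assumption).
  private final long[] hist = new long[32];

  /** Fold one observed field length into the running stats. */
  void accept(int length) {
    min = Math.min(min, length);
    max = Math.max(max, length);
    sum += length;
    int bucket = length <= 0 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
    hist[bucket]++;
  }

  /** Combine stats computed over a different set of entries; every stat here merges associatively. */
  void merge(LengthStats other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    for (int i = 0; i < hist.length; i++) {
      hist[i] += other.hist[i];
    }
  }

  /** Emit prefix.min, prefix.max, prefix.sum and only the non-zero prefix.logHist.N counters. */
  void toMap(String prefix, Map<String, Long> out) {
    if (max < min) {
      return; // nothing was observed
    }
    out.put(prefix + ".min", min);
    out.put(prefix + ".max", max);
    out.put(prefix + ".sum", sum);
    for (int i = 0; i < hist.length; i++) {
      if (hist[i] > 0) {
        out.put(prefix + ".logHist." + i, hist[i]);
      }
    }
  }
}
```

One such object per field (key, row, family, qualifier, visibility, value) plus a single shared count would be enough to produce output shaped like the listing above. Because min, max, sum, and the histogram counters all merge by simple min/max/addition, partial results from separate files could be combined without rescanning the data.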



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
