Christopher Tubbs commented on ACCUMULO-4500:

I would like to suggest we avoid thinking about this feature in terms of 
histograms. Histograms are only one possible application of what was discussed 
on the mailing list thread. Granted, they were what started the conversation, 
but the concepts, APIs, and features we'd have to implement to support 
histograms should be much more generic, supporting a wider variety of use 
cases.

The basic functionality needed to support this feature, as discussed on the 
mailing list, could be generalized simply as "named counters", and I think, 
for simplicity's sake, we should limit the feature to a mapping of names 
(type: String) to counts (type: signed Long).
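
As a rough sketch of what "named counters" could look like (the class and 
method names below are hypothetical, for illustration only, and not an 
existing Accumulo API), the per-file data might be as simple as:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of any existing Accumulo API.
public class NamedCounters {
  // Mapping of counter name to signed count.
  private final Map<String, Long> counts = new HashMap<>();

  // Add delta (which may be negative) to the counter with the given name,
  // creating the counter if it does not exist yet.
  public void increment(String name, long delta) {
    counts.merge(name, delta, Long::sum);
  }

  // Read-only view of the name -> count mapping.
  public Map<String, Long> asMap() {
    return Collections.unmodifiableMap(counts);
  }
}
```

Under this sketch, a visibility histogram is just one use: the counter names 
would be the visibility expressions, and the counts the number of entries 
seen with each.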

Additionally, I was thinking about this today, and I think that when this 
information is exposed in the client API, it should be retrievable through a 
user-supplied aggregation/combiner function. The reasoning is that client code 
doesn't normally deal with things at the granularity of files, but rather at 
the granularity of tablets, ranges, and tables. That should probably be true 
for any new API to retrieve these data as well. If that's the case, there will 
need to be some mechanism to aggregate the data from multiple files for the 
requested range/tablet/table. A summation function would probably be the most 
common, but it is certainly not the only useful aggregation function.
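
To make the aggregation idea concrete, here is a minimal sketch, assuming the 
per-file counters are available as plain maps. The class and method names are 
hypothetical and not an existing Accumulo API; the point is only that the 
caller supplies the function used to combine counts sharing a name:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; not part of any existing Accumulo API.
public class CounterAggregator {
  // Merge the per-file counter maps covering a range/tablet/table into one
  // map, combining counts that share a name with the user-supplied function.
  public static Map<String, Long> aggregate(List<Map<String, Long>> perFile,
      BinaryOperator<Long> combiner) {
    Map<String, Long> result = new HashMap<>();
    for (Map<String, Long> fileCounts : perFile) {
      fileCounts.forEach((name, count) -> result.merge(name, count, combiner));
    }
    return result;
  }
}
```

With this shape, passing Long::sum gives the common summation case, while 
something like Long::max would instead report the largest per-file count for 
each name.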

> Implement visibility histograms as a table feature
> --------------------------------------------------
>                 Key: ACCUMULO-4500
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4500
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>            Reporter: Josh Elser
> Add support to quickly extract a histogram of all of the visibilities stored 
> in an Accumulo table.
> https://lists.apache.org/thread.html/df5e764362a95277344fd2731a432e9fafc60595e7d30015d9a56b9c@%3Cdev.accumulo.apache.org%3E

This message was sent by Atlassian JIRA
