[ https://issues.apache.org/jira/browse/METRON-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289914#comment-16289914 ]
ASF GitHub Bot commented on METRON-1350:
----------------------------------------

GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/867

METRON-1350: Add reservoir sampling functions to Stellar

## Contributor Comments

Sampling capabilities would fit very well with the profiler and enable algorithms that do not necessarily support our existing probabilistic sketches. We should add a reservoir sampler and utilities to merge and resample.

You can play with `SAMPLE_INIT`, `SAMPLE_ADD`, `SAMPLE_MERGE` and `SAMPLE_GET` via the REPL:
```
[Stellar]>>> ?SAMPLE_ADD
SAMPLE_ADD
Description: Add to a sample
Arguments:
    sampler - Sampler to use. If null, then a default Uniform sampler is created
    o - The value to add. If o is an Iterable, then each item is added.
Returns:
[Stellar]>>> s_10 := SAMPLE_INIT(10)
[Stellar]>>> sample := REDUCE( [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ], (s, x) -> SAMPLE_ADD(s, x), SAMPLE_INIT(5))
[Stellar]>>> SAMPLE_GET(sample)
[6, 8, 11, 4, 5]
[Stellar]>>> SAMPLE_ADD(s_10, [5, 2, 5, 7, 10 ])
org.apache.metron.statistics.sampling.UniformSampler@3d8d06c0
[Stellar]>>> SAMPLE_GET(SAMPLE_ADD(s_10, [5, 2, 5, 7, 10 ]))
[5, 2, 5, 7, 10, 5, 2, 5, 7, 10]
```

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.
Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR?
  If not, one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target branch (typically master)?

### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified and tested manually?
- [x] Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh
  ```
- [x] Have you written or updated unit tests and/or integration tests to verify your changes?
- [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and then verify the changes via `site-book/target/site/index.html`:
  ```
  cd site-book
  mvn site
  ```

#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron sampling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #867

----
commit 7e1a19e29f86a23140aa46291f0083409ddac40d
Author: cstella <ceste...@gmail.com>
Date:   2017-12-13T20:59:40Z

    METRON-1350: Add reservoir sampling functions to Stellar

----

> Add reservoir sampling functions to Stellar
> -------------------------------------------
>
>                 Key: METRON-1350
>                 URL: https://issues.apache.org/jira/browse/METRON-1350
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>
> Sampling capabilities would fit very well with the profiler and enable
> algorithms that do not necessarily support our existing probabilistic
> sketches. We should add a reservoir sampler and utilities to merge and
> resample.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
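
For readers unfamiliar with the technique named in the pull request, the following is a minimal Python sketch of uniform reservoir sampling (the classic "Algorithm R") with a simple merge step. It is not the code in the pull request: the class `UniformReservoir`, the function `merge`, and their parameters are hypothetical names chosen for illustration, and the merge strategy shown (drawing from each reservoir in proportion to the number of items it has seen) is an assumption about one reasonable approach, not a description of what `SAMPLE_MERGE` does.

```python
import random


class UniformReservoir:
    """Hypothetical fixed-size uniform sample of a stream (Algorithm R)."""

    def __init__(self, size, rng=None):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size          # maximum number of retained items
        self.seen = 0             # total number of items offered so far
        self.items = []           # the current sample
        self.rng = rng or random.Random()

    def add(self, value):
        """Offer one value; each value seen so far stays with equal probability."""
        self.seen += 1
        if len(self.items) < self.size:
            # Reservoir not yet full: keep everything, in arrival order.
            self.items.append(value)
        else:
            # Replace a random slot with probability size / seen.
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.items[j] = value
        return self

    def add_all(self, values):
        """Offer every value of an iterable, one at a time."""
        for value in values:
            self.add(value)
        return self

    def get(self):
        """Return a copy of the current sample."""
        return list(self.items)


def merge(a, b, rng=None):
    """Combine two reservoirs into one of size min(a.size, b.size).

    Each output slot is drawn from `a` or `b` with probability proportional
    to how many not-yet-represented stream items each side still stands for,
    so the result is a uniform sample of the combined stream.
    """
    rng = rng or random.Random()
    out = UniformReservoir(min(a.size, b.size), rng)
    pool_a, pool_b = a.get(), b.get()
    rng.shuffle(pool_a)
    rng.shuffle(pool_b)
    remaining_a, remaining_b = a.seen, b.seen
    for _ in range(min(out.size, a.seen + b.seen)):
        if rng.randrange(remaining_a + remaining_b) < remaining_a:
            out.items.append(pool_a.pop())
            remaining_a -= 1
        else:
            out.items.append(pool_b.pop())
            remaining_b -= 1
    out.seen = a.seen + b.seen
    return out
```

In this sketch, a reservoir that has not yet reached capacity simply keeps every value in arrival order, which is consistent with the last REPL call in the pull request returning all ten added values from a sampler of size 10. Once more values than the capacity have been offered, which values survive depends on the random draws.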