[
https://issues.apache.org/jira/browse/FLINK-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736042#comment-14736042
]
ASF GitHub Bot commented on FLINK-2533:
---------------------------------------
Github user ChengXiangLi commented on a diff in the pull request:
https://github.com/apache/flink/pull/1110#discussion_r39002862
--- Diff:
flink-java/src/main/java/org/apache/flink/api/java/sampling/PoissonSampler.java
---
@@ -93,29 +98,59 @@ public boolean hasNext() {
}
}
}
-
- private void moveToNextElement() {
- while (input.hasNext()) {
+
+ public int poisson_ge1(double p){
+ // sample 'k' from Poisson(p), conditioned to k
>= 1
+ double q = Math.pow(Math.E, -p);
+ // simulate a poisson trial such that k >= 1
+ double t = q + (1 - q)*random.nextDouble();
+ int k = 1;
+ // continue standard poisson generation trials
+ t = t * random.nextDouble();
+ while (t > q) {
+ k++;
+ t = t * random.nextDouble();
+ }
+ return k;
+ }
+
+ private void moveToNextElement(int num) {
+ // skip elements with replication factor zero
+ int elementCount = 0;
+ while (input.hasNext() && elementCount < num){
currentElement = input.next();
- currentCount =
poissonDistribution.sample();
- if (currentCount > 0) {
- break;
+ elementCount++;
+ }
+ }
+
+ private void samplingProcess(){
+ if (fraction <= THRESHOLD) {
+ double u =
Math.max(random.nextDouble(), EPSILON);
+ int gap = (int) (Math.log(u) /
-fraction);
+ moveToNextElement(gap);
+ if (input.hasNext()) {
+ currentElement = input.next();
+ currentCount =
poisson_ge1(fraction);
+ }
+ }
+ else {
--- End diff --
Format: else should follow the right curly brace in the same line.
> Gap based random sample optimization
> ------------------------------------
>
> Key: FLINK-2533
> URL: https://issues.apache.org/jira/browse/FLINK-2533
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Chengxiang Li
> Priority: Minor
>
> For random sampler with fraction, like BernoulliSampler and PoissonSampler,
> Gap based random sampler could exploit O(k) sample implementation instead of
> previous O\(n\) sample implementation, it should perform better while sample
> fraction is very small. [This
> blog|http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/]
> describes more detail about gap based random sampler.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)