[ https://issues.apache.org/jira/browse/DATAFU-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883575#comment-13883575 ]
Matthew Hayes commented on DATAFU-16:
-------------------------------------

Thanks for running the experiment, Jian! I expected there might be an issue with the "weighted reservoir sampling exponential jump algebraic" case. I think the exponential jump method only works with an accumulate-based model. In the algebraic case, using a combiner probably breaks the assumptions behind this approach.

> weighted reservoir sampling with exponential jumps UDF
> ------------------------------------------------------
>
>                 Key: DATAFU-16
>                 URL: https://issues.apache.org/jira/browse/DATAFU-16
>             Project: DataFu
>          Issue Type: New Feature
>         Environment: Mac, Linux
> pig-0.11
>            Reporter: jian wang
>            Priority: Minor
>         Attachments: ScoredExpJmpReservoir.java, ScoredReservoir.java, WeightedSamplingCorrectnessTests.java
>
>
> Create a weightedReservoirSampleWithExpJump UDF to implement the weighted
> reservoir sampling algorithm with exponential jumps. Investigation is tracked
> in https://github.com/linkedin/datafu/issues/80. This task is part of an
> experiment with different weighted sampling algorithms.
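
For readers following the thread, here is a minimal sketch of weighted reservoir sampling with exponential jumps over a single sequential stream, which is roughly the accumulate-style setting mentioned in the comment. It is not the attached ScoredExpJmpReservoir.java; the class name `ExpJumpReservoir` and its methods are hypothetical and exist only for illustration, and weights are assumed to be positive and of moderate size. Note that the jump length is drawn from the smallest key currently held in this one reservoir. That may be the kind of assumption a combiner disturbs, since each partial reservoir would jump against its own local threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical illustration class; not part of DataFu. */
final class ExpJumpReservoir<T> {
    private static final class Entry<T> {
        final T item;
        final double key;
        Entry(T item, double key) { this.item = item; this.key = key; }
    }

    private final int capacity;
    private final Random rng;
    // Min-heap on key, so peek() returns the current threshold entry.
    private final PriorityQueue<Entry<T>> heap =
        new PriorityQueue<>((a, b) -> Double.compare(a.key, b.key));
    // Total weight still to be skipped before the next replacement.
    private double weightToSkip;

    ExpJumpReservoir(int capacity, Random rng) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.rng = rng;
    }

    /** Uniform draw from the open interval (0, 1). */
    private double uniform() {
        double u;
        do { u = rng.nextDouble(); } while (u == 0.0);
        return u;
    }

    /** Draws the next jump from the smallest key in the reservoir. */
    private void drawJump() {
        weightToSkip = Math.log(uniform()) / Math.log(heap.peek().key);
    }

    void add(T item, double weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
        if (heap.size() < capacity) {
            // Fill phase: every item enters with key u^(1/w).
            heap.add(new Entry<>(item, Math.pow(uniform(), 1.0 / weight)));
            if (heap.size() == capacity) drawJump();
            return;
        }
        weightToSkip -= weight;
        if (weightToSkip > 0) return; // still inside the jump, no random draw needed
        // This item crosses the jump: it replaces the minimum-key entry and
        // gets a key drawn so that it exceeds the old threshold.
        double lower = Math.pow(heap.peek().key, weight);
        double r = lower + (1.0 - lower) * uniform();
        heap.poll();
        heap.add(new Entry<>(item, Math.pow(r, 1.0 / weight)));
        drawJump();
    }

    /** Returns the sampled items in no particular order. */
    List<T> sample() {
        List<T> out = new ArrayList<>();
        for (Entry<T> e : heap) out.add(e.item);
        return out;
    }
}
```

Calling `add(item, weight)` once per tuple and then `sample()` at the end gives the sample. Merging several such reservoirs built on separate partitions is not covered by this sketch.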