scwhittle commented on code in PR #33780:
URL: https://github.com/apache/beam/pull/33780#discussion_r1939007061
##########
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateBackedIterable.java:
##########
@@ -138,15 +157,19 @@ public boolean hasNext() {
public T next() {
T value = wrappedIterator.next();
try {
- elementCoder.registerByteSizeObserver(value, observerProxy);
- if (observerProxy.getIsLazy()) {
- // The observer will only be notified of bytes as the result
- // is used. We defer advancing the observer until hasNext in an
- // attempt to capture those bytes.
- observerNeedsAdvance = true;
- } else {
- observerNeedsAdvance = false;
- observerProxy.advance();
+ if (sampleElement() || elementCoder.isRegisterByteSizeObserverCheap(value)) {
Review Comment:
If the byte observer is cheap, doesn't that mean we scale the elements
below even though we observe every one of them?
Similarly, the scaling seems wrong if some values are cheap to observe and
others are not.
Maybe it would be better as:
if (observerIsCheap(value)) {
update w/o scaling
} else if (sampleElement()) {
update w/ scaling
}
Since we wouldn't call sampleElement() in the case where observing is cheap,
samplingToken would only increase for elements that might be sampled.
This also seems like it might be a problem in
OutputObjectAndByteCounter.java.
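To make the suggested branching concrete, here is a minimal, self-contained sketch. It is not the Beam implementation: the class `SampledByteCounter`, the fixed `samplePeriod`, and the `cheapToObserve` flag are all hypothetical stand-ins (the real sampling logic is likely different). It only illustrates that cheap elements are counted exactly, while expensive ones are sampled and scaled, and that the sampling counter advances only for expensive elements.

```java
/**
 * Hypothetical sketch of "cheap: exact, expensive: sampled and scaled".
 * None of these names are Beam APIs; the fixed sampling period is an
 * assumption made purely for illustration.
 */
public class SampledByteCounter {
  private final int samplePeriod; // observe 1 of every samplePeriod expensive elements
  private long expensiveSeen = 0; // advances only for expensive elements
  private double totalBytes = 0;

  public SampledByteCounter(int samplePeriod) {
    if (samplePeriod < 1) {
      throw new IllegalArgumentException("samplePeriod must be >= 1");
    }
    this.samplePeriod = samplePeriod;
  }

  /** Records one element of the given encoded size. */
  public void observe(long sizeBytes, boolean cheapToObserve) {
    if (cheapToObserve) {
      // Every cheap element is observed, so no scaling is applied.
      totalBytes += sizeBytes;
    } else if (sampleElement()) {
      // A sampled element stands in for samplePeriod elements.
      totalBytes += sizeBytes * (double) samplePeriod;
    }
  }

  /** Returns true for the first of every samplePeriod expensive elements. */
  private boolean sampleElement() {
    return (expensiveSeen++ % samplePeriod) == 0;
  }

  public double getTotalBytes() {
    return totalBytes;
  }
}
```

With this shape, a stream that mixes cheap and expensive values never scales the cheap ones, which is the concern raised above.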
##########
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/StateBackedIterableTest.java:
##########
@@ -269,7 +269,8 @@ public void testByteObservingStateBackedIterable() throws Exception {
.sum();
observer.advance();
// 5 comes from size and hasNext (see IterableLikeCoder)
- assertEquals(iterateBytes + 5, observer.total);
+ // observer receives scaled
+ assertTrue(iterateBytes + 5 >= observer.total);
Review Comment:
Is it fewer than 10 elements? Shouldn't it be exact in that case?
If it's off by some double-math errors, you could use a fuzzy equals, for example
https://guava.dev/releases/23.0/api/docs/com/google/common/math/DoubleMath.html#fuzzyEquals-double-double-double-
(but perhaps some testing assert variant exists).
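As a rough illustration of the tolerance-based check, the sketch below uses only the standard library. The class `FuzzyCompare` and its method are hypothetical and simplified: unlike Guava's `DoubleMath.fuzzyEquals`, this version does not special-case infinities or NaN. In a JUnit 4 test, `assertEquals(double expected, double actual, double delta)` would likely serve the same purpose.

```java
/**
 * Hypothetical helper illustrating a tolerance-based comparison.
 * Simplified: does not special-case infinities or NaN.
 */
public class FuzzyCompare {
  /** Returns true if a and b differ by at most tolerance. */
  public static boolean fuzzyEquals(double a, double b, double tolerance) {
    if (tolerance < 0) {
      throw new IllegalArgumentException("tolerance must be >= 0");
    }
    return Math.abs(a - b) <= tolerance;
  }
}
```

A test could then assert `FuzzyCompare.fuzzyEquals(iterateBytes + 5, observer.total, 1e-6)` instead of the `>=` comparison, assuming the scaled total is expected to match up to rounding error; the tolerance value here is an arbitrary choice.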