Github user joshelser commented on a diff in the pull request:
https://github.com/apache/accumulo/pull/33#discussion_r29155217
--- Diff: docs/src/main/asciidoc/chapters/iterator_design.txt ---
@@ -145,8 +146,16 @@ alter the internal state of the Iterator.
These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true,
both of these methods should return non-null objects. If `hasTop` returns false, it is undefined
-what these methods should return. Multiple calls to these methods should not alter the state
-of the Iterator like `hasTop`.
+what these methods should return. Like `hasTop`, multiple calls to these methods should not alter
+the state of the Iterator.
+
+When saving a Key or Value from a source iterator's `getTopKey` or `getTopValue` methods
+for use after calling `next` on the source iterator (e.g., when cacheing keys or values
+from the source iterator), it is important to copy the Key or Value into a new object
+because the source iterator may reuse the Key or Value objects for performance reasons.
--- End diff --
I'm a little concerned about recommending that the returned Key or Value
always be copied, as it will drastically increase the number of objects created
in the tserver and probably tank performance. At the same time, I don't think
I've ever done this myself (copied the Key/Value in an iterator), and I haven't
run into any of the issues you're warning against (maybe it only happens farther
"down" the stack, at the iterators reading off of disk?). How did you run into
this issue? Can we try to make this more specific?
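
To make the scenario in the new paragraph concrete, here is a minimal sketch of
the hazard as I understand it. `MutableKey` and `ReusingSource` are hypothetical
stand-ins written for this example, not Accumulo classes, and the sketch assumes
a source that hands back the same object from every `getTopKey` call. Whether
any real iterator in the stack behaves this way is exactly the open question.

```java
class ReuseDemo {
    // Hypothetical stand-in for a mutable key; NOT Accumulo's Key class.
    static final class MutableKey {
        private String row;
        MutableKey(String row) { this.row = row; }
        // Copy constructor: snapshots the other key's current contents.
        MutableKey(MutableKey other) { this.row = other.row; }
        void set(String row) { this.row = row; }
        String getRow() { return row; }
    }

    // Hypothetical source that reuses one key object across next() calls.
    static final class ReusingSource {
        private final String[] rows;
        private int index = 0;
        private final MutableKey top = new MutableKey("");

        ReusingSource(String... rows) {
            this.rows = rows;
            if (rows.length > 0) top.set(rows[0]);
        }
        boolean hasTop() { return index < rows.length; }
        MutableKey getTopKey() { return top; }  // same object every time
        void next() {
            index++;
            if (hasTop()) top.set(rows[index]); // overwrites in place
        }
    }

    // Keeps only a reference, so the saved key changes when the source advances.
    static String savedWithoutCopy() {
        ReusingSource source = new ReusingSource("row1", "row2");
        MutableKey saved = source.getTopKey();
        source.next();
        return saved.getRow();
    }

    // Copies before advancing, so the saved key keeps its original contents.
    static String savedWithCopy() {
        ReusingSource source = new ReusingSource("row1", "row2");
        MutableKey saved = new MutableKey(source.getTopKey());
        source.next();
        return saved.getRow();
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + savedWithoutCopy()); // row2
        System.out.println("with copy:    " + savedWithCopy());    // row1
    }
}
```

If the docs do keep this advice, a copy is only needed where a reference is
held across a call to `next`, which would limit the extra allocations to the
keys and values that are actually retained.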