I think requiring the caller to know to copy/clone the element to be allowed to call hasNext() multiple times is extremely non-intuitive. Having the caller know that it's dangerous / not allowed to hang onto an element without copying while continuing to iterate (e.g. when looking for the "largest" element) is fine, as this is used all over this project, as well as being the general contract with Writables in Hadoop.
I think the most straightforward fix is Ted's b) solution. Continue with re-use, but make sure that hasNext() is side-effect free by dropping AbstractIterator as our impl here. On Fri, Apr 12, 2013 at 12:37 PM, Sean Owen <[email protected]> wrote: > Yeah you have to have a clone() or copy constructor or something. Add > that? that solves the other problem. At least, that's the intent and > it is an option. > > If you need the value to be long-lived, one way or the other a new > value is getting created. There's no way around it. There's only an > optimization to be had when it need not be long lived. For example the > iterators over sequence file key/values offer exactly this choice and > the default is to *not* reuse. > > The reuse is a nice win in cases where you know it's going to be a benefit. > If this particular iterator will never possibly take advantage of this > optimization, there's no value in offering it, yes. > > You could offer the option, if it is used in cases where the > optimization works, but that's extra complexity. I think I'd just keep > an eye on performance impact and consider that option to have it both > ways as a possible solution if needed. > > On Fri, Apr 12, 2013 at 8:31 PM, Dan Filimon > <[email protected]> wrote: > > The problem with the 0th option is that the elements provide no way of > > cloning them. > > > > Also, reusing the value causes the problem in the first post, where the > > element advances even though it shouldn't. > > Calling next(), keeping the reference and expecting the value to not > change > > when calling hasNext() seems reasonable to me. > > > > It seems non-intuitive that it should be cloned, because calling > hasNext() > > could invalidate it. In my mind, the current element should only change > > when calling next(). > > > > > > > > On Fri, Apr 12, 2013 at 10:22 PM, Sean Owen <[email protected]> wrote: > > > >> I'm sure I did (at least much of) the AbstractIterator change so blame > >> me... but I think the pattern itself is just fine. It's used in many > >> places in the project. Reusing the value object is a big win in some > >> places. Allocating objects is fast but a trillion of them still adds > >> up. > >> > >> It does contain a requirement, and that is that the caller is supposed > >> to copy/clone the value if it will be used at all after the next > >> iterator operation. That's the 0th option, to just fix the caller > >> here. > >> > >> On Fri, Apr 12, 2013 at 7:49 PM, Ted Dunning <[email protected]> > >> wrote: > >> > The contract of computeNext is that there are no side effects visible > >> > outside (i.e. apparent functional style). This is required since > >> > computeNext is called from hasNext(). > >> > > >> > We are using a side-effecting style so we have a bug. > >> > > >> > We have two choices: > >> > > >> > a) use functional style. This will *require* that we allocate a new > >> > container element on every call to computeNext. This is best for the > >> user > >> > because they will have fewer surprising bugs due to reuse. If > allocation > >> > is actually as bad as some people think (I remain skeptical of that > >> without > >> > tests) then this is a bad move. If allocation of totally ephemeral > >> objects > >> > is as cheap as I think, then this would be a good move. > >> > > >> > b) stop using AbstractIterator and continue with the re-use style. > And > >> add > >> > a comment to prevent a bright spark from reverting this change. (I > >> suspect > >> > that the bright spark who did this in the first place was me so I can > be > >> > rude) > >> > -- -jake
