Hi Billie,

Thanks for the quick reply! This confirms some of the concerns we had when reviewing the documentation more closely. We've been logging out the isFullMajorCompaction value thinking this was likely the issue, so now we just need to design around it.
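For reference, a minimal sketch of the kind of guard this implies: only trust whole-row logic when the environment reports a full major compaction. To keep it compilable without the Accumulo jars, Scope and Env below are simplified stand-ins for Accumulo's IteratorScope and IteratorEnvironment; FullCompactionGuard, env and seesAllFiles are hypothetical names, not part of any library.

```java
// Sketch only: Scope and Env are simplified stand-ins for Accumulo's
// IteratorScope and IteratorEnvironment. Every other name is hypothetical.
class FullCompactionGuard {

    // Mirrors the three iterator scopes: scan, minor and major compaction.
    enum Scope { scan, minc, majc }

    // The two environment methods the guard relies on.
    interface Env {
        Scope getIteratorScope();
        boolean isFullMajorCompaction();
    }

    // Fixed-value Env, useful for trying the guard without a tablet server.
    static Env env(final Scope scope, final boolean fullMajc) {
        return new Env() {
            public Scope getIteratorScope() { return scope; }
            public boolean isFullMajorCompaction() { return fullMajc; }
        };
    }

    // True only when the iterator is reading all of the tablet's files.
    // The scope is checked first, so the full-compaction flag is consulted
    // only at majc scope. Entries still in tablet server memory are not
    // covered even then, unless ingest is paused and the table is flushed.
    static boolean seesAllFiles(Env env) {
        return env.getIteratorScope() == Scope.majc && env.isFullMajorCompaction();
    }

    public static void main(String[] args) {
        System.out.println("partial majc: " + seesAllFiles(env(Scope.majc, false)));
        System.out.println("full majc:    " + seesAllFiles(env(Scope.majc, true)));
        System.out.println("scan:         " + seesAllFiles(env(Scope.scan, false)));
    }
}
```

In a real iterator the result would presumably be computed once in init() from the IteratorEnvironment passed in, with the whole-row logic skipped (entries passed through unchanged) whenever it is false.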
Thanks again for the reply and for taking the time to clarify this, much appreciated!

All the best,
Bradley Barber

On Thu, Mar 25, 2021 at 10:37 AM Billie Rinaldi <bil...@apache.org> wrote:

> Yes, it is definitely possible for a major compaction to see only part of a
> row. Only during a full major compaction will an iterator see all of the
> tablet's files. Even then, the iterator would not see any k/v entries for
> the row that were still in memory in the tablet server. Ingest would need
> to be paused and the table would need to be flushed for a full major
> compaction to be guaranteed to see entire rows.
>
> The IteratorEnvironment passed into the iterator initialization has a
> method isFullMajorCompaction that allows an iterator to tell if a full
> major compaction is happening or not. Here is an example of its use:
>
> https://github.com/apache/accumulo/blob/1dc72fce2c781dee597c8c11876a3bc6c321c199/core/src/main/java/org/apache/accumulo/core/iterators/user/RowDeletingIterator.java#L98
>
> It seems like you are correct about reseeks not occurring during major
> compaction, but I would need to double check that.
>
> Billie
>
> On Thu, Mar 25, 2021 at 12:43 PM Bradley Barber <bbar...@phemi.com> wrote:
>
> > Hi all!
> >
> > I'm looking for details on major compaction. Some of my colleagues and I
> > have been working on an iterator which we are attaching at major
> > compaction scope. The logic of this iterator requires that it always see
> > entire rows - i.e. iterates over all KV entries which make up all
> > versions of a given row. From the Accumulo documentation, we had assumed
> > this was guaranteed for major compactions since tablets are partitioned
> > at row boundaries.
> >
> > However, we are seeing some intermittent (and fairly rare) occurrences of
> > incorrect behaviour from our iterator. Having reviewed and tested the
> > iterator logic, we are quite confident it works as intended.
> > Were we incorrect in thinking that only entire rows will take part in
> > major compactions? Are there instances where a major compaction within a
> > tablet will see only partial rows? On reviewing the documentation, it
> > seems this *may* be possible when a major compaction is called to merge a
> > subset of RFiles in a given tablet, but it's not very clear. Would anyone
> > be able to clarify this for us?
> >
> > Issues with our iterator logic may also occur if reseeks are performed
> > during a major compaction. However, from our reading of the available
> > documentation, we got the impression that reseeks do not occur during
> > major compaction and we can't see why they would be. Is this guaranteed
> > or are there cases where a reseek may in fact be called during major
> > compaction?
> >
> > Sorry for the long, involved questions but any clarification would help
> > us greatly and be very appreciated :)
> >
> > Hope you all are having a good week,
> > Bradley Barber