Yes, it is definitely possible for a major compaction to see only part of a row. Only during a full major compaction will an iterator see all of the tablet's files. Even then, the iterator would not see any key/value entries for the row that were still in memory in the tablet server. Ingest would need to be paused and the table flushed before a full major compaction could be guaranteed to see entire rows.
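As a rough sketch of how an iterator could guard whole-row logic, the check below only allows it when the compaction is a full major compaction. The types Env and Scope and the names FullCompactionGuard, seesAllFiles and env are hypothetical stand-ins so the example compiles without Accumulo on the classpath; in a real iterator the environment would be the IteratorEnvironment handed to init().

```java
// Stand-in types (assumptions) so this sketch compiles without Accumulo.
// In a real iterator, use org.apache.accumulo.core.iterators.IteratorEnvironment
// in place of Env, and its iterator scope enum in place of Scope.
class FullCompactionGuard {

    /** Hypothetical mirror of the scopes an iterator can run in. */
    enum Scope { scan, minc, majc }

    /** Hypothetical subset of IteratorEnvironment: only the two calls used here. */
    interface Env {
        Scope getIteratorScope();
        boolean isFullMajorCompaction();
    }

    /**
     * Returns true only for a full major compaction, i.e. when the iterator
     * reads every file of the tablet. This still says nothing about entries
     * held in tablet server memory, so it is a necessary condition for
     * seeing whole rows, not a sufficient one.
     */
    static boolean seesAllFiles(Env env) {
        // Check the scope first; the full-compaction flag is only meaningful at majc scope.
        return env.getIteratorScope() == Scope.majc && env.isFullMajorCompaction();
    }

    /** Builds a fixed environment for demonstration purposes. */
    static Env env(final Scope scope, final boolean full) {
        return new Env() {
            @Override public Scope getIteratorScope() { return scope; }
            @Override public boolean isFullMajorCompaction() { return full; }
        };
    }

    public static void main(String[] args) {
        System.out.println("full majc:    " + seesAllFiles(env(Scope.majc, true)));
        System.out.println("partial majc: " + seesAllFiles(env(Scope.majc, false)));
        System.out.println("scan:         " + seesAllFiles(env(Scope.scan, false)));
    }
}
```

An iterator using a guard like this would typically pass entries through unchanged whenever the check fails, and apply its row-level logic only when it succeeds.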
The IteratorEnvironment passed into the iterator initialization has a method isFullMajorCompaction that allows an iterator to tell if a full major compaction is happening or not. Here is an example of its use:

https://github.com/apache/accumulo/blob/1dc72fce2c781dee597c8c11876a3bc6c321c199/core/src/main/java/org/apache/accumulo/core/iterators/user/RowDeletingIterator.java#L98

It seems like you are correct about reseeks not occurring during major compaction, but I would need to double check that.

Billie

On Thu, Mar 25, 2021 at 12:43 PM Bradley Barber <bbar...@phemi.com> wrote:

> Hi all!
>
> I'm looking for details on major compaction. Some of my colleagues and I
> have been working on an iterator which we are attaching at major compaction
> scope. The logic of this iterator requires that it always see entire rows -
> ie. iterates over all KV entries which make up all versions of a given row.
> From the Accumulo documentation, we had assumed this was guaranteed for
> major compactions since tablets are partitioned at row boundaries.
>
> However, we are seeing some intermittent (and fairly rare) occurrences of
> incorrect behaviour from our iterator. Having reviewed and tested the
> iterator logic, we are quite confident it works as intended. Were we
> incorrect in thinking that only entire rows will take part in major
> compactions? Are there instances where a major compaction within a tablet
> will see only partial rows? On reviewing the documentation, it seems this
> *may* be possible when a major compaction is called to merge a subset of
> RFiles in a given tablet, but it's not very clear. Would anyone be able to
> clarify this for us?
>
> Issues with our iterator logic may also occur if reseeks are performed
> during a major compaction. However, from our reading of the available
> documentation, we got the impression that reseeks do not occur during major
> compaction and we can't see why they would be. Is this guaranteed or are
> there cases where a reseek may in fact be called during major compaction?
>
> Sorry for the long, involved questions but any clarification would help us
> greatly and be very appreciated :)
>
> Hope you all are having a good week,
> Bradley Barber