Yes, it is definitely possible for a major compaction to see only part of a
row. Only during a full major compaction will an iterator see all of the
tablet's files. Even then, the iterator would not see any k/v entries for
the row that were still in memory in the tablet server. Ingest would need
to be paused and the table would need to be flushed for a full major
compaction to be guaranteed to see entire rows.

The IteratorEnvironment passed into the iterator initialization has a
method isFullMajorCompaction that allows an iterator to tell if a full
major compaction is happening or not. Here is an example of its use:
https://github.com/apache/accumulo/blob/1dc72fce2c781dee597c8c11876a3bc6c321c199/core/src/main/java/org/apache/accumulo/core/iterators/user/RowDeletingIterator.java#L98
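
To make the check concrete, here is a minimal sketch of the decision an iterator could make in its init. It is only an illustration: Scope, Env and WholeRowGuard are hypothetical stand-ins I made up so the snippet compiles on its own, not the Accumulo classes. In a real iterator you would ask the IteratorEnvironment passed to init for its scope and for isFullMajorCompaction. The scope is checked first so the full-compaction question is only asked at major compaction scope.

```java
// Hypothetical stand-ins for Accumulo's iterator scope and IteratorEnvironment.
// They exist only so this sketch is self-contained; they are not the real API.
enum Scope { scan, minc, majc }

interface Env {
    Scope getIteratorScope();
    boolean isFullMajorCompaction();
}

final class WholeRowGuard {
    // True only when this is a major compaction that reads every file of the
    // tablet. Entries still in tablet server memory are NOT covered by this.
    static boolean readsAllFiles(Env env) {
        return env.getIteratorScope() == Scope.majc && env.isFullMajorCompaction();
    }

    // Small helper to build a stand-in environment for the demo in main.
    static Env env(Scope scope, boolean full) {
        return new Env() {
            @Override public Scope getIteratorScope() { return scope; }
            @Override public boolean isFullMajorCompaction() { return full; }
        };
    }

    public static void main(String[] args) {
        System.out.println("full majc:    " + readsAllFiles(env(Scope.majc, true)));
        System.out.println("partial majc: " + readsAllFiles(env(Scope.majc, false)));
        System.out.println("scan:         " + readsAllFiles(env(Scope.scan, false)));
    }
}
```

An iterator whose logic depends on whole rows could use a flag like this to pass entries through untouched whenever it is false, and apply its row-level logic only when it is true (keeping in mind the caveat above about data still in memory).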

It seems like you are correct about reseeks not occurring during major
compaction, but I would need to double check that.

Billie

On Thu, Mar 25, 2021 at 12:43 PM Bradley Barber <bbar...@phemi.com> wrote:

> Hi all!
>
> I'm looking for details on major compaction. Some of my colleagues and I
> have been working on an iterator which we are attaching at major compaction
> scope. The logic of this iterator requires that it always see entire rows,
> i.e. that it iterate over all KV entries which make up all versions of a given row.
> From the Accumulo documentation, we had assumed this was guaranteed for
> major compactions since tablets are partitioned at row boundaries.
>
> However, we are seeing some intermittent (and fairly rare) occurrences of
> incorrect behaviour from our iterator. Having reviewed and tested the
> iterator logic, we are quite confident it works as intended. Were we
> incorrect in thinking that only entire rows will take part in major
> compactions? Are there instances where a major compaction within a tablet
> will see only partial rows? On reviewing the documentation, it seems this
> *may* be possible when a major compaction is called to merge a subset of RFiles
> in a given tablet, but it's not very clear. Would anyone be able to clarify
> this for us?
>
> Issues with our iterator logic may also occur if reseeks are performed
> during a major compaction. However, from our reading of the available
> documentation, we got the impression that reseeks do not occur during major
> compaction and we can't see why they would be. Is this guaranteed or are
> there cases where a reseek may in fact be called during major compaction?
>
> Sorry for the long, involved questions but any clarification would help us
> greatly and be very appreciated :)
>
> Hope you all are having a good week,
> Bradley Barber
>
