Hi,

We currently have a use case where we want to be able to do flashback
queries (i.e. provide a CurrentSCN on a Phoenix connection to read
previous versions of records). In the same table where we're doing
this, we are also regularly adding and deleting records. What we want
to do is have versioning at row level, but allow permanently deleting
a row (i.e. if a row is deleted, all versions of all columns should
also be deleted, and no longer be visible in flashback queries).

The issue at hand is that if we set a cell's value to null (e.g.
"UPSERT INTO mytab (id, name) values (1, null)") then all the history
of that one column for that one record is lost after the next major
compaction.

This can be worked around by making use of the KEEP_DELETED_CELLS
attribute in the table definition (enabled by default, although as of
PHOENIX-1498 this will no longer be the case). However,
KEEP_DELETED_CELLS also translates to keeping all rows that have been
deleted. In other words, "DELETE FROM mytab WHERE id = 1" means that
the underlying record won't really be deleted, ever.

I see two ways of facilitating this kind of row-level versioning that
still supports row-level deletes:

1. Turn KEEP_DELETED_CELLS off, and ensure that all data types have an
explicit way of writing null values. This would require changing a
number (or all) of PDataTypes so that they all have an explicit null
representation that can be explicitly stored.

2. Turn KEEP_DELETED_CELLS on, and have a RegionCoprocessor that
implements preCompact to return a custom InternalScanner that omits
everything per row that is older than the most recent delete of the
row (a delete of a row would be interpreted as a delete marker on the
QueryConstants.EMPTY_COLUMN_NAME cell).

The first question: is this something that would be useful for Phoenix
in general, or is it too specific? I guess it falls under the case of
general row versioning (as opposed to full table content versioning),
which may or may not be interesting to others.

The second question: any comments on either of these approaches?

This is something that we need regardless, so if this sounds like it
may be interesting to add into Phoenix proper then I'll create a JIRA
ticket and patches. If not, then it looks like option 2 is the one
real option without needing to customize Phoenix.

Thanks in advance for any feedback on this!

- Gabriel

Reply via email to