I created an improvement Jira (PHOENIX-7544) to add an empty column to all column families. Let me know if you have any questions or concerns about this Jira. I would like to include it in the next release. Here is its description:
In Phoenix, a mutation is a set of cells with the same row key and timestamp. It corresponds to an update on an existing row, or to the insertion of the first image of the row, made by an UPSERT statement. A mutation always includes an empty column cell, which is an internal column/cell not visible to the user. This cell is used for multiple purposes.

First, it allows a table schema to include only PK columns. PK columns are stored in the row key, and a row in HBase is a set of cells with the same row key, so we need a cell to represent the row in HBase, and to form a cell we need a non-PK column. Second, although the empty column does not hold any value for data tables, it holds the row status for covered indexes. This row status indicates whether the index row is verified or unverified (in other words, committed or uncommitted). Third, it is used to find the last modification time (the row timestamp) of a row. Instead of reading the cells of all non-PK columns of a row to find the last modification time, we just need to read the empty column, since it is always included in row mutations. Phoenix compaction, TTLRegionScanner (used for masking expired rows) and PHOENIX_ROW_TIMESTAMP() rely on the empty column to determine the row timestamp.

In Phoenix, there is only one empty column for a given row. If the row has multiple column families, this empty column is stored in one of them. This creates challenges for Phoenix compaction. HBase TTL, and thus HBase major compaction, expires cells individually. Phoenix compaction is a customized HBase compaction that makes the unit of expiration the row instead of the cell, in order to eliminate partial row expiry (a data integrity issue in Phoenix). Compaction is done at the column family level in HBase; however, Phoenix compaction needs to fetch all cells from all column families for a given row to determine whether the row should expire.
This requires region level scans during major compaction in addition to the compaction scan done at the column family (that is, store) level. As a result, the same row is scanned multiple times, once for each column family, which causes performance issues. The reason that Phoenix major compaction needs to read cells from all column families today is to see if the time gap between mutations on a given row is more than the TTL value for the table. If so, the mutations beyond such a gap should expire. This gap analysis will not be necessary if every column family includes an empty column. That requires Phoenix to insert an empty column cell for each column family into every mutation.

Please note that Phoenix does this for only one of the column families today. This is why we propose including an empty column for each column family in Phoenix. Clearly, this proposal requires inserting these empty column cells not only for new mutations but also for the existing mutations on a table with multiple column families. This means the proposal requires upgrading such existing tables.

The suggested approach is an upgrade process using a Phoenix task (the SYSTEM.TASK table). For each table to be upgraded, we can add a specific task to insert empty cells into existing mutations. This task first disables major compaction on the table, then adds the empty cells, and finally enables major compaction again.

Please note that Phoenix compaction is an optional feature as it is disabled by default. Although this proposal is required only for the tables with multiple column families on which Phoenix compaction is enabled, we suggest including empty columns for all column families from now on, since it simplifies Phoenix by eliminating the logic to determine which column family should include the empty column.
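
To make the gap rule mentioned above concrete (mutations beyond a gap larger than the TTL expire), here is a minimal Python sketch. It is an illustration only, not Phoenix's compaction code: the function name, the plain integer millisecond timestamps, and the choice to measure the first gap from the compaction time (now_ms) are all assumptions made for this example.

```python
from typing import Iterable, List


def retained_timestamps(mutation_ts: Iterable[int], ttl_ms: int, now_ms: int) -> List[int]:
    """Return the mutation timestamps that survive the gap rule, newest first.

    Hypothetical helper: walks the timestamps from newest to oldest and stops
    at the first gap larger than ttl_ms. Everything at or beyond that gap is
    treated as expired.
    """
    kept: List[int] = []
    newer = now_ms  # assumption: the first gap is measured from "now"
    for ts in sorted(set(mutation_ts), reverse=True):
        if newer - ts > ttl_ms:
            break  # gap exceeds the TTL; this and all older mutations expire
        kept.append(ts)
        newer = ts
    return kept


if __name__ == "__main__":
    # Two recent mutations, then a gap of 700 ms, then two old mutations.
    print(retained_timestamps([100, 200, 900, 950], ttl_ms=100, now_ms=1000))
```

With a TTL of 100 ms, the two mutations at 900 and 950 are retained and the two older ones fall beyond the gap.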
Even though Phoenix compaction is an optional feature, it is actually required for tables with TTL since it prevents data integrity issues due to the cell level TTL feature in HBase.
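
As a simplified illustration of partial row expiry, the following Python sketch contrasts cell level expiry with row level expiry driven by the empty column timestamp. The dictionary layout, the column names and the "A:EMPTY" key are hypothetical, and the row level rule here ignores the gap analysis for brevity.

```python
from typing import Dict


def cell_level_expiry(row: Dict[str, int], ttl_ms: int, now_ms: int) -> Dict[str, int]:
    """Keep each cell only if its own timestamp is within the TTL."""
    return {col: ts for col, ts in row.items() if now_ms - ts <= ttl_ms}


def row_level_expiry(row: Dict[str, int], empty_column: str,
                     ttl_ms: int, now_ms: int) -> Dict[str, int]:
    """Keep or drop the whole row based on the empty column timestamp.

    Assumption: the empty column cell carries the row's last modification
    time, because it is written with every mutation.
    """
    if now_ms - row[empty_column] > ttl_ms:
        return {}
    return dict(row)


if __name__ == "__main__":
    # Hypothetical row: column -> timestamp of its latest cell (ms).
    # A:NAME was written long ago; B:CITY and the empty column were
    # written by a recent UPSERT.
    row = {"A:EMPTY": 950, "A:NAME": 100, "B:CITY": 950}
    print(cell_level_expiry(row, ttl_ms=100, now_ms=1000))
    print(row_level_expiry(row, "A:EMPTY", ttl_ms=100, now_ms=1000))
```

In this example the cell level rule drops A:NAME while the row is still live, leaving a partial row, whereas the row level rule keeps all three cells.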