[ 
https://issues.apache.org/jira/browse/PHOENIX-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir Ozdemir updated PHOENIX-7710:
-----------------------------------
    Summary: Supporting single cell storage format for mutable tables  (was: 
Extending single cell storage format for mutable tables)

> Supporting single cell storage format for mutable tables
> --------------------------------------------------------
>
>                 Key: PHOENIX-7710
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7710
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Priority: Major
>
> Phoenix uses two storage formats for immutable tables: one cell per column 
> (ONE_CELL_PER_COLUMN) and one cell per column family 
> (SINGLE_CELL_ARRAY_WITH_OFFSETS). Packing all columns of a row (within a 
> specific column family) into a single cell reduces storage, network, and 
> memory usage, which improves performance for many use cases.
> The single cell storage format is currently supported only for immutable 
> tables. Extending it to mutable tables as is would require Phoenix to read 
> existing rows and merge them with new, potentially partial, mutations to 
> generate full row mutations. While this might be acceptable for tables with 
> covered indexes (as rows are already read for generating index mutations), 
> it would be costly for other tables.
> Phoenix has been adding server-side functionality through the HBase 
> coprocessor architecture to better fit HBase to Phoenix use cases. A recent 
> example is the customization of HBase compaction, which was required to 
> eliminate data integrity issues when TTL is configured. HBase TTL operates 
> at the cell level and leads to partial row expiration. Partial row 
> expiration may result in data loss in Phoenix. To fix this, Phoenix 
> introduced a compaction scanner that preserves row integrity during TTL 
> processing (see PHOENIX-6888). 
> This Phoenix-level compaction can be leveraged to support the single cell 
> format for mutable tables without requiring row reads during mutations. To 
> achieve this, each mutation can be represented as a separate cell (per column 
> family), with each cell (within a column family) having a different 
> dynamically generated column qualifier. During flushes and compaction, these 
> cells can be merged into a single cell. During scans, HBase region scanners 
> will return all these cells (each with its own column qualifier), and Phoenix 
> custom filters can merge them into one cell before applying filtering. These 
> changes allow Phoenix to pack all columns of a column family for a given row 
> into a single cell.
> The single cell storage format for mutable tables does not have to follow 
> exactly the same implementation as the one for immutable tables. For 
> example, the empty column can also be packed together with other columns in 
> the new format for mutable tables, which is not the case for the immutable 
> format.
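
The merge step in the description above (one cell per mutation, each under its own column qualifier, folded into a single row image at flush, compaction, or scan time) can be illustrated with a small Java model. This is only a sketch under stated assumptions, not Phoenix code: the names `SingleCellMergeSketch`, `MutationCell`, and `merge`, the string qualifiers, and the "newest timestamp wins per column" rule are all hypothetical, and the issue does not specify the real cell encoding, qualifier generation, or delete handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only; every name here is hypothetical, not a Phoenix or HBase API. */
final class SingleCellMergeSketch {

    /** One mutation to a row within one column family, stored as its own cell. */
    static final class MutationCell {
        final long timestamp;
        final String qualifier;            // assumed to be unique per mutation
        final Map<String, String> columns; // only the columns this mutation set

        MutationCell(long timestamp, String qualifier, Map<String, String> columns) {
            this.timestamp = timestamp;
            this.qualifier = qualifier;
            this.columns = columns;
        }
    }

    /**
     * Folds the per-mutation cells of one row and column family into a single
     * row image. Cells are applied oldest first, so for each column the value
     * from the newest cell wins (an assumed rule for this sketch).
     */
    static Map<String, String> merge(List<MutationCell> cells) {
        List<MutationCell> ordered = new ArrayList<>(cells);
        ordered.sort(Comparator.comparingLong((MutationCell c) -> c.timestamp));
        Map<String, String> row = new TreeMap<>();
        for (MutationCell cell : ordered) {
            row.putAll(cell.columns);
        }
        return row;
    }

    public static void main(String[] args) {
        List<MutationCell> cells = new ArrayList<>();
        // A full-row write followed by a partial update that touches only column B.
        cells.add(new MutationCell(100L, "q1", Map.of("A", "a1", "B", "b1")));
        cells.add(new MutationCell(200L, "q2", Map.of("B", "b2")));
        System.out.println(merge(cells)); // prints {A=a1, B=b2}
    }
}
```

The point of the model is that the write path never reads the existing row: a partial update is simply appended as another cell, and the cost of combining cells is paid where the row is already being read (compaction and scans).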



--
This message was sent by Atlassian Jira
(v8.20.10#820010)