[
https://issues.apache.org/jira/browse/PHOENIX-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735131#comment-16735131
]
Ohad Shacham commented on PHOENIX-5090:
---------------------------------------
We added row-level conflict analysis in OMID-71.
As [~yonigo] said, this version keeps the modified cell information on the
client side.
This cell information is needed for two reasons: first, to add shadow cells
at post-commit, and second, to delete these cells when a transaction aborts.
The current implementation supports either ROW-level or CELL-level conflict
analysis; a combination of the two is not supported :) Is it needed? I mean,
do the semantics of SQL allow modifications of different cells in the same row
without a conflict? It can certainly be extended, but that may incur some
runtime penalty.
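
To make the ROW versus CELL distinction concrete, here is a minimal sketch of how the granularity changes which concurrent writes count as conflicting. It is a simplified model, not Omid's actual code: the names `Cell`, `conflict_keys` and `conflicts` are hypothetical, and it assumes a conflict is simply an overlap between the key sets of two concurrent transactions.

```python
# Simplified model of conflict-analysis granularity; NOT Omid's implementation.
from typing import Iterable, NamedTuple, Set, Tuple


class Cell(NamedTuple):
    row: str
    column: str


def conflict_keys(write_set: Iterable[Cell], level: str) -> Set[Tuple[str, ...]]:
    """Map written cells to the keys that are compared at commit time."""
    if level == "ROW":
        return {(c.row,) for c in write_set}
    if level == "CELL":
        return {(c.row, c.column) for c in write_set}
    raise ValueError(f"unknown conflict level: {level}")


def conflicts(tx_a: Iterable[Cell], tx_b: Iterable[Cell], level: str) -> bool:
    """Two concurrent transactions conflict if their key sets overlap."""
    return bool(conflict_keys(tx_a, level) & conflict_keys(tx_b, level))


# Two transactions touch different columns of the same row.
tx1 = [Cell("row1", "cf:a")]
tx2 = [Cell("row1", "cf:b")]
row_level = conflicts(tx1, tx2, "ROW")    # same row: conflict
cell_level = conflicts(tx1, tx2, "CELL")  # different cells: no conflict
```

In this model the ROW-level key set is never larger than the CELL-level one, which is why row-level analysis is the cheaper thing to keep on the client.
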
Regarding the memory limitations: in general, the cell-level information can be
discarded from the client side. However, this removes the shadow cell update
at post-commit and also the deletion that the client performs in case of an
abort. For the first, we can count on the fact that a client that reads a cell
without a shadow cell creates one. This does not happen asynchronously and is
beneficial only if cells are read many times. Another disadvantage is that the
commit information will stay in the
commit table and can only be discarded later on by a GC. For earlier GC, we can
add a counter in the commit table, for each transaction, that records how many
cells/rows the transaction wrote, decrement it whenever a client adds a shadow
cell, and delete the entry when it reaches zero. However, this requires
CheckAndMutate for the update, and I am sure this is not what we would like to
do. We can add a shadow cell at the ROW level, as [~yonigo] suggested, but this
might require additional HBase gets when looking for this shadow cell.
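
The per-transaction counter idea can be sketched as follows. This is an in-memory model under stated assumptions, not the Omid commit table: `CommitTable`, `commit`, `shadow_cell_added` and `contains` are hypothetical names, and the lock stands in for the atomic read-modify-write that would need something like CheckAndMutate on the real table.

```python
# Hypothetical in-memory model of a per-transaction counter in the commit table.
import threading
from typing import Dict


class CommitTable:
    def __init__(self) -> None:
        # The lock stands in for an atomic server-side update.
        self._lock = threading.Lock()
        # tx id -> number of written cells that still lack a shadow cell
        self._remaining: Dict[int, int] = {}

    def commit(self, tx_id: int, cells_written: int) -> None:
        with self._lock:
            self._remaining[tx_id] = cells_written

    def shadow_cell_added(self, tx_id: int) -> None:
        """Decrement atomically; drop the entry once the counter reaches zero."""
        with self._lock:
            left = self._remaining.get(tx_id)
            if left is None:
                return  # entry was already cleaned up
            if left <= 1:
                del self._remaining[tx_id]
            else:
                self._remaining[tx_id] = left - 1

    def contains(self, tx_id: int) -> bool:
        with self._lock:
            return tx_id in self._remaining


table = CommitTable()
table.commit(tx_id=42, cells_written=2)
table.shadow_cell_added(42)
still_there = table.contains(42)   # one cell is still missing its shadow cell
table.shadow_cell_added(42)
gone = not table.contains(42)      # counter reached zero, entry removed
```

The read, decrement and conditional delete must happen as one atomic step, which is where the cost of this approach comes from.
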
For the second, we can wait for the GC to clean these cells, but this will
happen only once the transaction id is lower than the low water mark. As far as
we saw, the HBase row delete operation deletes all of the row's columns with a
version *lower* than or equal to the transaction version, so we cannot use
it.
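
A small model of why a row-level delete cannot replace the per-cell cleanup on abort. It assumes, as described above, that a row delete at a given version removes every column version lower than or equal to it. The data layout and the names `row_delete` and `delete_exact_versions` are hypothetical and only illustrate the semantics; they are not HBase API calls.

```python
# Illustrative model of delete semantics; these are NOT HBase API calls.
from typing import Dict, List, Tuple

# column -> list of (version, value)
Row = Dict[str, List[Tuple[int, str]]]


def row_delete(row: Row, ts: int) -> Row:
    """Row delete at `ts`: drops every version with timestamp <= ts."""
    return {col: [(v, val) for v, val in versions if v > ts]
            for col, versions in row.items()}


def delete_exact_versions(row: Row, cells: List[Tuple[str, int]]) -> Row:
    """Per-cell cleanup: drops only the listed (column, version) pairs."""
    doomed = set(cells)
    return {col: [(v, val) for v, val in versions if (col, v) not in doomed]
            for col, versions in row.items()}


row: Row = {
    "cf:a": [(10, "committed"), (20, "aborted")],
    "cf:b": [(10, "committed")],
}

# The transaction with version 20 aborts after writing only cf:a.
after_row_delete = row_delete(row, ts=20)
after_exact = delete_exact_versions(row, [("cf:a", 20)])
```

The row delete also wipes the committed version 10 in both columns, while the per-cell cleanup removes only the aborted write. The per-cell cleanup, however, needs the list of written cells, which is exactly the information the client would otherwise discard.
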
> Discuss: Allow transactional writes without buffering the entire transaction
> on the client.
> -------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5090
> URL: https://issues.apache.org/jira/browse/PHOENIX-5090
> Project: Phoenix
> Issue Type: Wish
> Reporter: Lars Hofhansl
> Priority: Major
>
> Currently it is not possible to execute transactions in Phoenix that are too
> large to be buffered entirely on the client.
> Both Tephra and Omid support writing uncommitted data to HBase immediately
> and at full speed. The client still needs to keep track of the changed rows
> for:
> # Conflict detection
> # (for Omid) writing the shadow cells
> I'd like to do some brainstorming here.
> * It should *always* be enough to hold on to only the changed rows (and
> columns?) for _conflict resolution_ and free the rest from the client as
> soon as the uncommitted data is written to HBase.
> * For the shadow cells we only need to keep the changed rows, right?
> * There are situations where we can avoid the client-side buffering entirely
> (perhaps only for Tephra) when we declare a table or upsert not to
> participate in conflict resolution.
> [~tdsilva], [~ohads], [~yonigo], [~jamestaylor], [~vincentpoon], more, better
> ideas?