[ 
https://issues.apache.org/jira/browse/PHOENIX-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735131#comment-16735131
 ] 

Ohad Shacham commented on PHOENIX-5090:
---------------------------------------

We added row level conflict analysis in OMID-71.

As [~yonigo] said, this version keeps the modified cells information on the 
client side.

This cell information is needed for two reasons: first, to add shadow cells 
at post commit, and second, to delete these cells when a transaction aborts. 
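
A minimal sketch of these two uses of the client-side cell list. The class, 
the string-keyed in-memory store, and the "#shadow" suffix are hypothetical 
stand-ins for illustration only; this is not Omid's client code or its HBase 
layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Omid's API.
final class ClientWriteSetSketch {
    // Toy store: "cell@startTs" holds data, "cell@startTs#shadow" holds the commit timestamp.
    final Map<String, String> store = new HashMap<>();
    private final long startTs;
    // The cell information the client keeps until the transaction ends.
    private final List<String> writtenCells = new ArrayList<>();

    ClientWriteSetSketch(long startTs) {
        this.startTs = startTs;
    }

    // Uncommitted data goes to the store immediately; the client remembers the cell.
    void write(String cell, String value) {
        store.put(cell + "@" + startTs, value);
        writtenCells.add(cell);
    }

    // Post commit: add one shadow cell per written cell.
    void postCommit(long commitTs) {
        for (String cell : writtenCells) {
            store.put(cell + "@" + startTs + "#shadow", Long.toString(commitTs));
        }
        writtenCells.clear();
    }

    // Abort: delete exactly the cells this transaction wrote.
    void abort() {
        for (String cell : writtenCells) {
            store.remove(cell + "@" + startTs);
        }
        writtenCells.clear();
    }
}
```

Without the writtenCells list, neither postCommit nor abort knows which 
cells to touch, which is the trade-off discussed below.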

The current implementation supports either ROW level or CELL level conflict 
analysis, but not a combination of the two :) Is a combination needed? That 
is, do the semantics of SQL allow modifications of different cells in the 
same row without a conflict? It can certainly be extended, but that may incur 
some runtime penalty.
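
To make the difference concrete, here is a small sketch of how the two levels 
treat writes to different cells of the same row. The key format and all names 
are assumptions made for illustration; they are not Omid's actual conflict 
key computation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not Omid's classes.
final class ConflictGranularitySketch {
    enum Level { ROW, CELL }

    // Builds the key a transaction would register for conflict detection.
    static String conflictKey(Level level, String table, String row,
                              String family, String qualifier) {
        if (level == Level.ROW) {
            return table + "/" + row;
        }
        return table + "/" + row + "/" + family + ":" + qualifier;
    }

    // Two write sets conflict when they share at least one conflict key.
    static boolean conflicts(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    // Two transactions write different qualifiers of the same row.
    static boolean sameRowDifferentCellsConflict(Level level) {
        Set<String> tx1 = new HashSet<>();
        Set<String> tx2 = new HashSet<>();
        tx1.add(conflictKey(level, "T", "row1", "cf", "a"));
        tx2.add(conflictKey(level, "T", "row1", "cf", "b"));
        return conflicts(tx1, tx2);
    }

    public static void main(String[] args) {
        System.out.println("ROW:  " + sameRowDifferentCellsConflict(Level.ROW));
        System.out.println("CELL: " + sameRowDifferentCellsConflict(Level.CELL));
    }
}
```

Under ROW level the two writes collide on the same key; under CELL level they 
do not. A mixed mode would need both kinds of keys per write, which is where 
the extra runtime cost would come from.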

 

Regarding the memory limitations: in general, the cell level information can 
be discarded from the client side. However, this removes the shadow cell 
update at post commit, and also the deletion that the client performs in case 
of an abort. For the first, we can rely on the fact that a client that reads 
a cell without a shadow cell creates the shadow cell. This does not happen 
asynchronously and is beneficial only if cells are read many times. Another 
disadvantage is that the commit information stays in the commit table and can 
only be discarded later by a GC. For earlier GC, we can add a counter to the 
commit table, per transaction, that records how many cells/rows the 
transaction wrote, decrement it whenever a client adds a shadow cell, and 
delete the entry when it reaches zero. However, this requires CheckAndMutate 
for the update, and I am sure this is not what we would like to do. We can 
add a shadow cell at the ROW level, as [~yonigo] suggested, but this might 
require additional HBase gets when looking for this shadow cell.
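
A rough sketch of the counter idea, using an in-memory map in place of the 
commit table. All names here are hypothetical. The compare-and-swap loop 
stands in for the CheckAndMutate round trip each decrement would need against 
HBase, which is the cost mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the commit table, not Omid's implementation.
final class CommitTableCounterSketch {
    // Commit timestamp plus the number of written cells still lacking a shadow cell.
    static final class Entry {
        final long commitTs;
        final int remaining;

        Entry(long commitTs, int remaining) {
            this.commitTs = commitTs;
            this.remaining = remaining;
        }
    }

    private final ConcurrentMap<Long, Entry> commitTable = new ConcurrentHashMap<>();

    void commit(long startTs, long commitTs, int cellsWritten) {
        commitTable.put(startTs, new Entry(commitTs, cellsWritten));
    }

    // Called after some client adds a shadow cell for a cell written by startTs.
    // Returns true when this call removed the commit table entry.
    boolean shadowCellAdded(long startTs) {
        while (true) {
            Entry current = commitTable.get(startTs);
            if (current == null) {
                return false; // already cleaned up
            }
            if (current.remaining <= 1) {
                // Last missing shadow cell: conditionally delete the entry.
                if (commitTable.remove(startTs, current)) {
                    return true;
                }
            } else {
                // Conditional update, analogous to a CheckAndMutate.
                Entry next = new Entry(current.commitTs, current.remaining - 1);
                if (commitTable.replace(startTs, current, next)) {
                    return false;
                }
            }
            // Another client changed the entry in between; retry.
        }
    }

    boolean contains(long startTs) {
        return commitTable.containsKey(startTs);
    }
}
```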

 

For the second, we can wait for the GC to clean these cells, but this will 
happen only once the transaction id is lower than the low water mark. As far 
as we saw, the HBase row delete operation deletes all of the row's columns 
with a version *lower* than or equal to the transaction version, so we cannot 
use it. 
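
A toy model of why the row delete does not fit. It only mimics the "version 
lower than or equal to the given timestamp" behaviour described above with an 
in-memory map; the class and method names are made up and this is not the 
HBase client API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a single row, not HBase code.
final class RowDeleteSketch {
    // qualifier -> (version timestamp -> value)
    private final Map<String, TreeMap<Long, String>> columns = new TreeMap<>();

    void put(String qualifier, long version, String value) {
        columns.computeIfAbsent(qualifier, q -> new TreeMap<>()).put(version, value);
    }

    // Models a row delete at timestamp ts: every version <= ts in every column goes away.
    void deleteRow(long ts) {
        for (TreeMap<Long, String> versions : columns.values()) {
            versions.headMap(ts, true).clear();
        }
    }

    // Newest remaining value of a column, or null if nothing is left.
    String latest(String qualifier) {
        TreeMap<Long, String> versions = columns.get(qualifier);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.lastEntry().getValue();
    }
}
```

If transaction 5 committed a value and transaction 9 then wrote and aborted, 
calling deleteRow(9) in this model also removes the version written by 
transaction 5, so the committed data is lost along with the aborted write.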

 

 

> Discuss: Allow transactional writes without buffering the entire transaction 
> on the client.
> -------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5090
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5090
>             Project: Phoenix
>          Issue Type: Wish
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> Currently it is not possible to execute transactions in Phoenix that are 
> too large to be buffered entirely on the client.
> Both Tephra and Omid support writing uncommitted data to HBase immediately 
> and at full speed. The client still needs to keep track of the row changes 
> for:
> # Conflict detection
> # (for Omid) writing the shadow cells
> I'd like to do some brainstorming here.
> * It should *always* be enough to only hold on to the changed rows (and 
> columns?) only for _conflict resolution_ and free the rest from the client as 
> soon as the uncommitted data is written to HBase.
> * For the shadow cells we need only keep the rows changed, right?
> * There are situations where we can avoid the client side buffering entirely 
> (perhaps only for Tephra) when we declare a table or upsert not to 
> participate in conflict resolution.
> [~tdsilva], [~ohads], [~yonigo], [~jamestaylor], [~vincentpoon], more, better 
> ideas?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
