[ 
https://issues.apache.org/jira/browse/PHOENIX-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458377#comment-16458377
 ] 

Ohad Shacham commented on PHOENIX-4484:
---------------------------------------

[~jamestaylor], I think that I was wrong in this case and disabling the GC is 
not required. A general transaction might miss data if the low watermark 
exceeds the transaction timestamp during its run. This is caused by the GC, 
which removes all the versions of a key below the low watermark except for 
the last one. During index population, the transaction has the fence id and 
writes the data using auto commit (version and commit timestamp are the same), 
so it does not need to commit. 

It is true that this transaction might miss data if the low watermark exceeds 
the fence id. However, if it misses data of a key K, there must exist another 
record of K with a version higher than the fence and lower than the low 
watermark. Because every entry written after the fence is automatically added 
to the index (using the incremental mechanism), that entry of K is added to 
the index as well. So we do miss data, but every transaction that might be 
interested in this data started below the low watermark and will be aborted 
on commit, so we don't really care. 

To sum up, the fact that at the fence we enable the mechanism that updates the 
index with every mutation to the data table removes the need to disable the GC.
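
The argument above can be checked with a toy model. This is only a sketch under 
simplifying assumptions: timestamps are plain integers, a key's history is a 
list of version timestamps, and the function names (gc, snapshot_read, 
build_index) are hypothetical and are not Omid or Phoenix APIs.

```python
def gc(versions, low_watermark):
    """Drop every version below the low watermark except the newest of them."""
    below = [v for v in versions if v < low_watermark]
    kept_below = [max(below)] if below else []
    return sorted(kept_below + [v for v in versions if v >= low_watermark])


def snapshot_read(versions, ts):
    """Return the newest version visible at timestamp ts, or None."""
    visible = [v for v in versions if v <= ts]
    return max(visible) if visible else None


def build_index(versions, fence, low_watermark):
    """Model index creation for one key K (assumed, simplified flow).

    Writes after the fence reach the index through incremental maintenance;
    the population transaction reads at the fence after the GC has run.
    """
    index = {v for v in versions if v > fence}   # incremental mechanism
    surviving = gc(versions, low_watermark)      # GC with watermark past the fence
    seen = snapshot_read(surviving, fence)       # population read at the fence
    if seen is not None:
        index.add(seen)
    return index, seen


# K has version 5 (before the fence) and version 12 (after the fence, below
# the low watermark). The GC drops 5, so the population read at fence 10 sees
# nothing, but 12 is already in the index via the incremental mechanism.
index, seen = build_index([5, 12], fence=10, low_watermark=15)
```

In this model a version is lost to the population read only when a newer 
version sits between the fence and the low watermark, and that newer version 
is exactly the one the incremental mechanism has already indexed.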

 

> Write directly to HBase when creating an index for transactional table
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-4484
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4484
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Ohad Shacham
>            Assignee: Ohad Shacham
>            Priority: Major
>
> Today, when creating an index table for a non-empty data table, the writes 
> are performed using the transaction API. This both consumes client-side 
> memory, for storing the writeset, and performs conflict analysis upon commit. 
> This is redundant and can be replaced by direct writes to HBase. For this 
> reason, a new function should be added to the transaction abstraction layer 
> that writes directly to HBase in the Tephra case and adds shadow cells with 
> the fence id in the Omid case. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
