[jira] [Commented] (PHOENIX-1674) Snapshot isolation transaction support through Tephra

James Taylor (JIRA) Wed, 04 Mar 2015 22:59:21 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348277#comment-14348277
 ]


James Taylor commented on PHOENIX-1674:
---------------------------------------

Here are some random questions and comments about Tephra and how we might 
integrate it into Phoenix, [~ghelmling]:

1. What are the options for the scope of the snapshot isolation?
a) single snapshot isolation across all statements executed in a txn
- transaction ID retrieved on first statement execution on connection.
- commits made by other txns after our txn ID is allocated would not be seen 
until after our commit/rollback
b) nested: snapshot per statement with conflict resolution spanning multiple 
txn IDs.
- could each stmt be executed with its own txn ID? You'd see committed data 
when a stmt is run after another client had committed the data.
- conflict resolver could be told all row keys updated across all statements 
(across multiple txn IDs). 
- **Is this type of snapshot isolation possible in Tephra?**
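
To make the difference between (a) and (b) concrete, here is a minimal sketch of 
a single versioned cell. It is a toy model, not Tephra or Phoenix code: the 
class SnapshotScopeSketch and its methods are hypothetical names, and it 
assumes a snapshot is just "everything committed up to a timestamp".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Tephra or Phoenix code.
class SnapshotScopeSketch {
    // One committed version of a single cell.
    static final class Version {
        final long commitTs;
        final String value;

        Version(long commitTs, String value) {
            this.commitTs = commitTs;
            this.value = value;
        }
    }

    // Versions are appended in increasing commit timestamp order.
    private final List<Version> versions = new ArrayList<>();
    private long clock = 0;

    // Commit a new value and return its commit timestamp.
    long commit(String value) {
        clock++;
        versions.add(new Version(clock, value));
        return clock;
    }

    // Take a snapshot: everything committed so far is visible to it.
    long snapshot() {
        return clock;
    }

    // Latest value committed at or before the snapshot, or null if none.
    String read(long snapshotTs) {
        String result = null;
        for (Version v : versions) {
            if (v.commitTs <= snapshotTs) {
                result = v.value;
            }
        }
        return result;
    }
}
```

Under option (a) every statement in the txn would call read() with the one 
snapshot taken at the first statement, so a value committed by another client 
afterwards stays invisible. Under option (b) each statement would call 
snapshot() again and so would see that later commit.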

2. What SQL isolation level is supported by Tephra?
a) write/write conflicts are clearly detected
b) what about a read of a piece of data that is then subsequently changed by 
another txn before ours is committed?
c) **How would you define Tephra's behavior in terms of ANSI standard SQL 
isolation levels?**
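
For reference, write/write conflict detection as described in (a) can be 
sketched as below. This is a toy model under stated assumptions, not Tephra's 
implementation: WriteConflictSketch, begin() and tryCommit() are hypothetical 
names, and the check is simply "did anyone who committed after I started write 
one of my row keys?".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; class and method names are hypothetical, not Tephra's API.
class WriteConflictSketch {
    // Write set of a transaction that has already committed.
    static final class Committed {
        final long commitTs;
        final Set<String> rowKeys;

        Committed(long commitTs, Set<String> rowKeys) {
            this.commitTs = commitTs;
            this.rowKeys = rowKeys;
        }
    }

    private final List<Committed> committed = new ArrayList<>();
    private long clock = 0;

    // Start a transaction and return its start timestamp.
    long begin() {
        return ++clock;
    }

    // Try to commit; returns false on a write/write conflict.
    boolean tryCommit(long startTs, Set<String> writeSet) {
        for (Committed c : committed) {
            // Only transactions that committed after we started can conflict.
            if (c.commitTs > startTs) {
                for (String key : writeSet) {
                    if (c.rowKeys.contains(key)) {
                        return false;
                    }
                }
            }
        }
        committed.add(new Committed(++clock, new HashSet<>(writeSet)));
        return true;
    }
}
```

Note that this sketch tracks only write sets. A read that is later invalidated 
by another txn, as in (b), would go undetected here; whether Tephra behaves the 
same way is exactly what the question above is asking.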

3. How would row deletion be handled?
a) Support undelete of family delete marker at HBase level.
   - nothing necessary for Phoenix
b) Phoenix defines its own delete marker. This would be a known cf:cq and only 
be set in this single KV for a row (Phoenix already has an "empty" key value 
defined for every row to allow for rows with only row key columns). This option 
is less ideal because it limits the ability of non-Phoenix clients to 
scan over the data and it would confuse other HBase features such as 
minVersion/maxVersion.
   - Tephra would need to support a different mechanism to determine if a row 
is deleted (as an empty value for Phoenix means either a null value or may just 
be our empty key value for schemas in which all columns are contained in the 
row key). Either a pluggable approach or perhaps a Cell tag (but note this 
would be only on a single known cf:cq in a row). **Is this feasible?**
   - Phoenix would need to change the Scan projection logic to always project 
the empty key value cf:cq so we'd know if a row was deleted or not (which would 
be a perf hit when this cf is not required for the scan).
   - Phoenix has existing logic that would need to be tweaked when the empty 
key value needs to be moved (if all columns in the cf are dropped). Somewhat of 
a corner case.

4. Timestamp handling
a) Does Tephra manage setting the Cell timestamp transparently when using 
TransactionAwareHTable and the coprocessor?
b) Would the Phoenix coprocessors coexist well with the Tephra coprocessors? 
It'd likely be more efficient to insert the Tephra filter *after* our skip scan 
filter (which would prevent the Tephra filter from being run on a large 
percentage of rows). Our skip scan filter handles point lookups. **Would it be 
possible for Phoenix to order the Filters itself?**
c) The Phoenix client "locks" the timestamp by asking the RS hosting the 
SYSTEM.CATALOG table for the "current server time" when it checks if the schema 
for the table being queried is up-to-date.
   - Instead, we could use the txn ID that we have for the current open txn. 
We'd still need the original RPC we do to verify that we have the latest 
metadata (which is somewhat unfortunate).
   - UPSERT VALUES and DELETE would do the same instead of using 
LATEST_TIMESTAMP
d) Another alternative might be to set the CURRENT_SCN on the connection to 
match the txn ID, as that would essentially force the timestamps of mutations 
to be set to the txn ID.
   - An issue may arise in reading own writes (see below), as with a 
CURRENT_SCN, Phoenix would not see its own writes.

5. Reading own writes
Currently Phoenix cannot read its own writes. With Tephra in place we could 
overcome this by:
- submitting any pending txn data
- passing our txnID and making sure Tephra doesn't filter based on this (just 
remove it from the list for the filter?)
- reading at txnID+1.
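
The second bullet can be illustrated with a small visibility check. This is 
only a sketch of the idea, not Tephra's filter: VisibilitySketch, isVisible() 
and its parameters (a read pointer plus a set of excluded in-flight txn IDs) 
are assumptions made for illustration.

```java
import java.util.Set;

// Toy model only; names and parameters are hypothetical, not Tephra's API.
class VisibilitySketch {
    // Decide whether a cell written by cellTxId is visible to a reader.
    // readPointer: highest txn ID the reader's snapshot may see.
    // excluded:    txn IDs that were in flight when the snapshot was taken.
    // ownTxId:     the reader's own txn ID.
    static boolean isVisible(long cellTxId, long readPointer,
                             Set<Long> excluded, long ownTxId) {
        // Own writes are exempt from the exclusion list.
        if (cellTxId == ownTxId) {
            return true;
        }
        return cellTxId <= readPointer && !excluded.contains(cellTxId);
    }
}
```

With this shape, the reader's own txn ID can stay in the excluded set for every 
other client while still being visible to the txn that wrote it.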

> Snapshot isolation transaction support through Tephra
> -----------------------------------------------------
>
>                 Key: PHOENIX-1674
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1674
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>
> Tephra (http://tephra.io/ and https://github.com/caskdata/tephra) is one 
> option for getting transaction support in Phoenix. Let's use this JIRA to 
> discuss the way in which this could be integrated along with the pros and 
> cons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
