[jira] [Commented] (PHOENIX-4552) Add support for a ROW_TIMESTAMP that doesn't affect the primary key

James Taylor (JIRA) Tue, 23 Jan 2018 10:28:43 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336193#comment-16336193
 ]


James Taylor commented on PHOENIX-4552:
---------------------------------------

Nice example, [~spetschu]. It sparked my memory on why we only support 
ROW_TIMESTAMP for immutable data (in which it wouldn't make a difference if the 
ROW_TIMESTAMP column is in the row key since by definition rows don't overlap). 
Phoenix only returns a single version of a row. Changing this is definitely in 
the category of "large T-shirt" in terms of scope. See PHOENIX-590 for some 
discussion. There's a patch there which solves part of the issue - essentially 
"exploding" the Cells received back into multiple rows (i.e. one row per unique 
timestamp across all versions). Let me add an example on that JIRA so the 
complexity of the issue is more clear. One thorny issue too is that there's no 
SQL way of getting the timestamp of a row (though we have an under-the-covers 
way of getting that, we'd need a way of surfacing that in SQL).

There might be a middle ground, though, but I suspect there will be many edge 
cases. If we still continue to return only the single latest version and 
restrict the filtering done on ROW_TIMESTAMP to use only greater than or 
greater than or equal then it might be more feasible. This would do the time 
range filtering and then only return columns larger than this. The primary 
problem here is that there might be *older* column values that make up the row 
which would be filtered out.



> Add support for a ROW_TIMESTAMP that doesn't affect the primary key
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-4552
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4552
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Stephen Petschulat
>            Priority: Minor
>
> By declaring a ROW_TIMESTAMP constraint on a Phoenix table, it does two 
> things 1) expose the hbase native timestamp as this column and 2) prepend 
> your primary key with this timestamp as well.
> It would be useful to have a similar feature that only exposes the hbase 
> native timestamp. This would allow explicit setting of the timestamp when 
> upserting data while allowing multiple hbase versions. It is possible to then 
> query for that specific key and version(s).
> Potential approach:
> {code:sql}
> CREATE TABLE COMMENTS (
>    COMMENT_ID INT NOT NULL,
>    REVISION_NUM BIGINT NOT NULL ROW_TIMESTAMP,    // NEW use of keyword
>    COMMENT_BODY TEXT
>    CONSTRAINT PK PRIMARY KEY(COMMENT_ID))
> UPSERT INTO COMMENTS (123, 1, 'edit 1 comment')
> UPSERT INTO COMMENTS (123, 2, 'edit 2 of comment')
> UPSERT INTO COMMENTS (123, 3, 'edit 3 of comment')
> {code}
>  
> Current behavior of ROW_TIMESTAMP would create a new primary for each upsert, 
> so querying by primary key is no longer straightforward when you don't know 
> the version number at query time. 
> {code:sql}
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123  // => returns most recent 
> version 'edit 3 of comment'
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 AND REVISION_NUM = 1   // => 
> returns explicit version 'edit 1 comment'
> {code}
>  
> It can also be useful to return multiple versions (related: PHOENIX-590)
> {code:sql}
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 AND REVISION_NUM < 3   // => 
> returns 2 rows
> {code}
>  
> Or just the highest version less than or equal to a particular version 
> (allowing snapshot queries):
> {code:sql}
> // set CurrentSCN=2 on connection
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 // => returns 'edit 2 of 
> comment'
> {code}
> CurrentSCN already allows this type of snapshot query but not against an 
> explicitly set timestamp with multiple versions. The primary key injection 
> prevents this. The above query would behave similar to:
> {code:java}
> scan 'COMMENTS', {TIMERANGE => [0, <maxversionid+1>]}
> {code}
>  This returns the highest versioned value for each key that is less than a 
> specified maximum version number.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (PHOENIX-4552) Add support for a ROW_TIMESTAMP that doesn't affect the primary key

Reply via email to