[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252562#comment-14252562
]
Jonathan Leech commented on PHOENIX-914:
----------------------------------------
Conceptually, the timestamp of the version of a row at a particular point in
time is the max of all of the cell timestamps. This still works whether all the
cells are written in every update or written individually, via Phoenix or
HBase.
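As a minimal sketch of that rule (plain Java with hypothetical names, not the Phoenix/HBase API): the version timestamp of a row is just the max over its cell timestamps, however the cells were written.

```java
import java.util.Map;

public class RowVersion {
    // The effective timestamp of a row version is the max of its cell
    // timestamps; it does not matter whether the cells were written in one
    // update or individually.
    static long rowTimestamp(Map<String, Long> cellTimestamps) {
        return cellTimestamps.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("empty row"));
    }

    public static void main(String[] args) {
        // Cells written at different times still yield one row timestamp.
        Map<String, Long> cells = Map.of("NVALUE", 1000L, "DATA", 1500L, "SVALUE", 1200L);
        System.out.println(rowTimestamp(cells)); // prints 1500
    }
}
```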
If Phoenix is re-writing every cell of a row for every update, that may not be
very efficient for a variety of reasons: storage, I/O, replication, CPU. There
are implications for data retention / expiration, where a row could partially
age out depending on the TTL of the column family and the times the individual
columns were written, but I think that is already a strength of HBase and
should be preserved in Phoenix. The behavior could be made configurable in
Phoenix.
It seems to me a pseudocolumn would definitely work, and I can also see use
cases for a function returning the timestamp of a column, as Dominik suggested,
even without a range query. A function could also set the timestamps for
inserts and updates at the cell level.
For range queries, e.g. between t1 and t2, I would want what the row looked
like at t1 (whether or not there were updates at t1), and a row for each
timestamp where there were changes between t1 and t2.
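Those semantics can be sketched in plain Java (a hypothetical helper, not a Phoenix API): given the sorted timestamps at which the row changed, return the latest change at or before t1 (the state as of t1), plus one entry per change in (t1, t2].

```java
import java.util.ArrayList;
import java.util.List;

public class VersionsBetween {
    // changeTimes: ascending timestamps at which the row changed.
    // Returns the timestamps of the versions visible between t1 and t2:
    // the state as of t1 (latest change at or before t1, if any), plus one
    // entry per change in the half-open-on-the-left interval (t1, t2].
    static List<Long> visibleVersions(List<Long> changeTimes, long t1, long t2) {
        List<Long> out = new ArrayList<>();
        Long asOfT1 = null;
        for (long t : changeTimes) {
            if (t <= t1) {
                asOfT1 = t;              // candidate for the "as of t1" state
            } else if (t <= t2) {
                out.add(t);              // a change within the range
            }
        }
        if (asOfT1 != null) {
            out.add(0, asOfT1);          // the row as it looked at t1 comes first
        }
        return out;
    }
}
```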
The equivalent in Oracle is Flashback Query; the syntax is SELECT ... VERSIONS
BETWEEN TIMESTAMP t1 AND t2, or, for a single point in time, SELECT ... AS OF
TIMESTAMP t. Its pseudocolumn is ora_rowscn, which can be converted to a
timestamp via another function. You can also do AS OF SCN and VERSIONS BETWEEN
SCN scn1 AND scn2.
For Phoenix, an epoch time in millis maps well to the SCN concept. Support in
Phoenix SQL for either timestamps or longs, similar to how Oracle does it,
makes sense. In the extreme, HBase doesn't require the timestamps to really be
times, and a scheme more like Oracle's SCN could be put in place. The benefit
would be sub-millisecond precision in the timestamps, and it could also aid in
transaction support.
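One way such an SCN-like scheme could look (a sketch under my own assumptions, not anything Phoenix implements): pack the epoch millis into the high bits of the 64-bit HBase timestamp and a per-millisecond sequence number into the low bits. The result is a monotonically increasing long with sub-millisecond resolution that is still convertible back to a wall-clock time.

```java
public class ScnCodec {
    // Hypothetical packing: high bits hold epoch millis, low 20 bits hold a
    // sequence number. 64 - 20 = 44 bits of millis covers dates far beyond
    // the year 2500; 20 bits allows ~1M distinct versions per millisecond.
    static final int SEQ_BITS = 20;

    static long toScn(long epochMillis, int seq) {
        return (epochMillis << SEQ_BITS) | (seq & ((1 << SEQ_BITS) - 1));
    }

    // Converting back to a timestamp, analogous to Oracle's scn_to_timestamp.
    static long toMillis(long scn) {
        return scn >>> SEQ_BITS;
    }

    public static void main(String[] args) {
        long scn1 = toScn(1419000000000L, 0);
        long scn2 = toScn(1419000000000L, 1);
        // Two writes in the same millisecond still get distinct, ordered SCNs.
        System.out.println(scn1 < scn2);
        System.out.println(toMillis(scn2) == 1419000000000L);
    }
}
```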
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Jeffrey Zhong
> Attachments: PHOENIX-914.patch
>
>
> For many applications, one of the columns of a table can be (and must be)
> naturally mapped to the HBase timestamp. What this gives us is an
> optimization in StoreScanner, where HFiles with timestamps outside the time
> range of a Scan operator will be skipped. Let us say that we have time-series
> data (EVENTS) and a custom compaction that creates series of HFiles with
> contiguous, non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL /*+ TIMESTAMP */,
>     DATA VARCHAR,
>     SVALUE VARCHAR,
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
>     TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true, VERSIONS=1, DATA_BLOCK_ENCODING='NONE';
> Note the TIME TIMESTAMP NOT NULL /*+ TIMESTAMP */ - this is a hint to
> Phoenix that the column TIME must be mapped to the HBase timestamp.
> The query: select all events of type 'X' for the last 7 days:
> SELECT * FROM EVENTS WHERE METRICID = 'X' AND TIME < NOW() AND TIME > NOW() -
> 7*24*3600000; (this may not be correct SQL syntax, of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps the TIME column to the HBase timestamp
> 2. Phoenix is smart enough to map the WHERE clause on the TIME attribute to
> the Scan time range
> Although this:
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.getConnection(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case, it may not be efficient from a performance point of
> view, because a new Connection object and a new Statement are created for
> every INSERT/UPSERT; besides this, we still need optimization 2 (see above).
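The second optimization above amounts to translating SQL bounds on TIME into HBase's half-open [min, max) scan time range. A minimal sketch of that translation in plain Java (a hypothetical helper, not the actual Phoenix planner):

```java
public class TimeRangeMapper {
    // HBase's Scan.setTimeRange(min, max) takes a half-open interval
    // [min, max). Map SQL bounds on TIME to that interval:
    //   TIME >  a  ->  min = a + 1      TIME <  b  ->  max = b
    //   TIME >= a  ->  min = a          TIME <= b  ->  max = b + 1
    static long[] toScanTimeRange(long lower, boolean lowerInclusive,
                                  long upper, boolean upperInclusive) {
        long min = lowerInclusive ? lower : lower + 1;
        long max = upperInclusive ? upper + 1 : upper;
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long now = 1419000000000L;
        // WHERE TIME > NOW() - 7*24*3600000 AND TIME < NOW()
        long[] range = toScanTimeRange(now - 7L * 24 * 3600000, false, now, false);
        System.out.println(range[0] + " .. " + range[1]);
        // A real implementation would then call scan.setTimeRange(range[0],
        // range[1]), letting StoreScanner skip HFiles whose timestamp range
        // falls entirely outside it.
    }
}
```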
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)