[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252562#comment-14252562
]
Jonathan Leech commented on PHOENIX-914:
----------------------------------------
Conceptually, the timestamp of the version of a row at a particular point in
time is the max of all of the cell timestamps. This still works whether all the
cells are written in every update or written individually, via Phoenix or
HBase.
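As a minimal sketch of that rule (plain Java with hypothetical names, not the Phoenix/HBase API): the version timestamp of a row is just the max over its cell timestamps, however the cells were written.

```java
import java.util.Map;

public class RowVersion {
    // The effective timestamp of a row version is the max of its cell
    // timestamps; it does not matter whether the cells were written in one
    // update or individually.
    static long rowTimestamp(Map<String, Long> cellTimestamps) {
        return cellTimestamps.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("empty row"));
    }

    public static void main(String[] args) {
        // Cells written at different times still yield one row timestamp.
        Map<String, Long> cells = Map.of("NVALUE", 1000L, "DATA", 1500L, "SVALUE", 1200L);
        System.out.println(rowTimestamp(cells)); // prints 1500
    }
}
```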
If Phoenix is re-writing every cell of a row for every update, that may not be
very efficient for a variety of reasons: storage, I/O, replication, CPU. There
are implications for data retention / expiration, where a row could partially
age out depending on the TTL of the column family and the times the individual
columns were written, but I think that is already a strength of HBase and
should be preserved in Phoenix. The behavior could be made configurable in
Phoenix.
It seems to me a pseudocolumn would definitely work, and I can also see use
cases for a function returning the timestamp of a column, as Dominik suggested,
even without a range query. A function could also set the timestamps for
inserts and updates at the cell level.
For range queries, e.g. between t1 and t2, I would want what the row looked
like at t1 (whether or not there were updates at t1), and a row for each
timestamp where there were changes between t1 and t2.
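Those semantics can be sketched in plain Java (a hypothetical helper, not a Phoenix API): given the sorted timestamps at which the row changed, return the latest change at or before t1 (the state as of t1), plus one entry per change in (t1, t2].

```java
import java.util.ArrayList;
import java.util.List;

public class VersionsBetween {
    // changeTimes: ascending timestamps at which the row changed.
    // Returns the timestamps of the versions visible between t1 and t2:
    // the state as of t1 (latest change at or before t1, if any), plus one
    // entry per change in the half-open-on-the-left interval (t1, t2].
    static List<Long> visibleVersions(List<Long> changeTimes, long t1, long t2) {
        List<Long> out = new ArrayList<>();
        Long asOfT1 = null;
        for (long t : changeTimes) {
            if (t <= t1) {
                asOfT1 = t;              // candidate for the "as of t1" state
            } else if (t <= t2) {
                out.add(t);              // a change within the range
            }
        }
        if (asOfT1 != null) {
            out.add(0, asOfT1);          // the row as it looked at t1 comes first
        }
        return out;
    }
}
```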
The equivalent in Oracle is Flashback Query; the syntax is SELECT ... VERSIONS
BETWEEN TIMESTAMP t1 AND t2, or, for a single point in time, SELECT ... AS OF
TIMESTAMP t. Its pseudocolumn is ora_rowscn, which can be converted to a
timestamp via another function. You can also do AS OF SCN and VERSIONS BETWEEN
SCN scn1 AND scn2.
For Phoenix, an epoch time in millis maps well to the SCN concept. Support in
Phoenix SQL for either timestamps or longs, similar to how Oracle does it,
makes sense. In the extreme, HBase doesn't require the timestamps to really be
times, and a scheme more like Oracle's SCN could be put in place. The benefit
would be sub-millisecond precision in the timestamps, and it could also aid in
transaction support.
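One way such an SCN-like scheme could look (a sketch under my own assumptions, not anything Phoenix implements): pack the epoch millis into the high bits of the 64-bit HBase timestamp and a per-millisecond sequence number into the low bits. The result is a monotonically increasing long with sub-millisecond resolution that is still convertible back to a wall-clock time.

```java
public class ScnCodec {
    // Hypothetical packing: high bits hold epoch millis, low 20 bits hold a
    // sequence number. 64 - 20 = 44 bits of millis covers dates far beyond
    // the year 2500; 20 bits allows ~1M distinct versions per millisecond.
    static final int SEQ_BITS = 20;

    static long toScn(long epochMillis, int seq) {
        return (epochMillis << SEQ_BITS) | (seq & ((1 << SEQ_BITS) - 1));
    }

    // Converting back to a timestamp, analogous to Oracle's scn_to_timestamp.
    static long toMillis(long scn) {
        return scn >>> SEQ_BITS;
    }

    public static void main(String[] args) {
        long scn1 = toScn(1419000000000L, 0);
        long scn2 = toScn(1419000000000L, 1);
        // Two writes in the same millisecond still get distinct, ordered SCNs.
        System.out.println(scn1 < scn2);
        System.out.println(toMillis(scn2) == 1419000000000L);
    }
}
```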
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Jeffrey Zhong
> Attachments: PHOENIX-914.patch
>
>
> For many applications, one of the columns of a table can be (and must be)
> naturally mapped to the HBase timestamp. What this gives us is an
> optimization in StoreScanner, where HFiles with timestamps outside the time
> range of a Scan operator will be skipped. Let us say that we have time-series
> data (EVENTS) and a custom compaction that creates series of HFiles with
> contiguous, non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL /*+ TIMESTAMP */,
>     DATA VARCHAR,
>     SVALUE VARCHAR,
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
>     TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true, VERSIONS=1, DATA_BLOCK_ENCODING='NONE';
> Note the TIME TIMESTAMP NOT NULL /*+ TIMESTAMP */ - this is a hint to
> Phoenix that the column TIME must be mapped to the HBase timestamp.
> The query: select all events of type 'X' for the last 7 days:
> SELECT * FROM EVENTS WHERE METRICID = 'X' AND TIME < NOW() AND TIME > NOW() -
> 7*24*3600000; (this may not be correct SQL syntax, of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps the TIME column to the HBase timestamp
> 2. Phoenix is smart enough to map the WHERE clause on the TIME attribute to
> the Scan time range
> Although this:
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.getConnection(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case, it may not be efficient from a performance point of
> view, because a new Connection object and a new Statement are created for
> every INSERT/UPSERT; besides this, we still need optimization 2 (see above).
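The second optimization above amounts to translating SQL bounds on TIME into HBase's half-open [min, max) scan time range. A minimal sketch of that translation in plain Java (a hypothetical helper, not the actual Phoenix planner):

```java
public class TimeRangeMapper {
    // HBase's Scan.setTimeRange(min, max) takes a half-open interval
    // [min, max). Map SQL bounds on TIME to that interval:
    //   TIME >  a  ->  min = a + 1      TIME <  b  ->  max = b
    //   TIME >= a  ->  min = a          TIME <= b  ->  max = b + 1
    static long[] toScanTimeRange(long lower, boolean lowerInclusive,
                                  long upper, boolean upperInclusive) {
        long min = lowerInclusive ? lower : lower + 1;
        long max = upperInclusive ? upper + 1 : upper;
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long now = 1419000000000L;
        // WHERE TIME > NOW() - 7*24*3600000 AND TIME < NOW()
        long[] range = toScanTimeRange(now - 7L * 24 * 3600000, false, now, false);
        System.out.println(range[0] + " .. " + range[1]);
        // A real implementation would then call scan.setTimeRange(range[0],
        // range[1]), letting StoreScanner skip HFiles whose timestamp range
        // falls entirely outside it.
    }
}
```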
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)