Vladimir Rodionov created PHOENIX-914:
-----------------------------------------
Summary: Native HBase timestamp support to optimize date range
queries in Phoenix
Key: PHOENIX-914
URL: https://issues.apache.org/jira/browse/PHOENIX-914
Project: Phoenix
Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Vladimir Rodionov
For many applications one of the column of a table can be (and must be)
naturally mapped
to HBase timestamp. What it gives us is the optimization on StoreScanner where
HFiles with timestamps out of range of
a Scan operator will be omitted. Let us say that we have time-series type of
data (EVENTS) and custom compaction, where we create
series of HFiles with continuous non-overlapping timestamp ranges.
CREATE TABLE IF NOT EXISTS ODS.EVENTS (
METRICID VARCHAR NOT NULL,
METRICNAME VARCHAR,
SERVICENAME VARCHAR NOT NULL,
ORIGIN VARCHAR NOT NULL,
APPID VARCHAR,
IPID VARCHAR,
NVALUE DOUBLE,
TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
DATA VARCHAR,
SVALUE VARCHAR
CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to
Phoenix that the column
TIME must be mapped to HBase timestamp.
The Query:
Select all events of type 'X' for last 7 days
SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() -
7*24*3600000; (this may be not correct SQL syntax of course)
These types of queries will be efficiently optimized if:
1. Phoenix maps TIME column to HBase timestamp
2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange
Although this :
Properties props = new Properties();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
Connection conn = DriverManager.connect(myUrl, props);
conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
conn.commit();
will work in my case- it may not be efficient from performance point of view
because for every INSERT/UPSERT
new Connection object and new Statement is created, beside this we still need
the optimization 2. (see above).
--
This message was sent by Atlassian JIRA
(v6.2#6252)