Vladimir Rodionov created PHOENIX-914:
-----------------------------------------

             Summary: Native HBase timestamp support to optimize date range queries in Phoenix
                 Key: PHOENIX-914
                 URL: https://issues.apache.org/jira/browse/PHOENIX-914
             Project: Phoenix
          Issue Type: Improvement
    Affects Versions: 4.0.0
            Reporter: Vladimir Rodionov


For many applications, one of the columns of a table can (and should) be naturally mapped to the HBase timestamp. What this gives us is an optimization in StoreScanner: HFiles whose timestamp ranges fall outside the time range of a Scan operator are skipped. Say we have time-series data (EVENTS) and a custom compaction that produces a series of HFiles with contiguous, non-overlapping timestamp ranges.
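To illustrate the pruning this enables, here is a simplified, self-contained model (illustrative code, not actual HBase internals) of the check a store scanner can perform: each HFile records the min/max timestamp of its cells, and a file needs to be read only when that range overlaps the scan's time range.

```java
// Simplified model of HFile pruning by timestamp range (not HBase code).
public class TimeRangePruning {
    static final class FileRange {
        final long minTs, maxTs; // inclusive min/max cell timestamps in the file
        FileRange(long minTs, long maxTs) { this.minTs = minTs; this.maxTs = maxTs; }
    }

    // Scan time range is [scanMin, scanMax), matching HBase's TimeRange convention
    // (inclusive lower bound, exclusive upper bound).
    static boolean overlaps(FileRange f, long scanMin, long scanMax) {
        return f.maxTs >= scanMin && f.minTs < scanMax;
    }

    // Count how many files actually need to be scanned for a given time range.
    public static int filesToRead(java.util.List<FileRange> files,
                                  long scanMin, long scanMax) {
        int count = 0;
        for (FileRange f : files) {
            if (overlaps(f, scanMin, scanMax)) count++;
        }
        return count;
    }
}
```

With the custom compaction described above producing non-overlapping file ranges, a scan over a narrow time window touches only a small subset of the files.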

CREATE TABLE IF NOT EXISTS ODS.EVENTS (
    METRICID  VARCHAR NOT NULL,
    METRICNAME VARCHAR,
    SERVICENAME VARCHAR NOT NULL,
    ORIGIN VARCHAR NOT NULL,
    APPID VARCHAR,
    IPID VARCHAR,
    NVALUE DOUBLE,
    TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
    DATA VARCHAR,
    SVALUE VARCHAR,
    CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';

Make note of TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ above: this is the hint telling Phoenix that the TIME column must be mapped to the HBase timestamp.

The Query:

Select all events of type 'X' for the last 7 days:

SELECT * FROM EVENTS WHERE METRICID = 'X' AND TIME < NOW() AND TIME > NOW() - 7*24*3600000;

(this may not be correct SQL syntax, of course)

Queries of this type will be optimized efficiently if:

1. Phoenix maps the TIME column to the HBase timestamp.

2. Phoenix is smart enough to map a WHERE clause on the TIME attribute to the Scan's time range.
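As a sketch of optimization 2 (illustrative code only, not Phoenix internals): the bounds on TIME in the WHERE clause would translate into the [min, max) interval passed to HBase's Scan.setTimeRange, whose lower bound is inclusive and upper bound exclusive.

```java
// Sketch of rewriting "TIME < now AND TIME > now - 7 days" into an
// HBase scan time range. Names here are illustrative, not Phoenix APIs.
public class TimeRangeRewrite {
    static final long DAY_MS = 24L * 3600 * 1000;

    // Returns {min, max} suitable for Scan.setTimeRange(min, max).
    // TIME > lowerExclusive maps to min = lowerExclusive + 1, because
    // HBase's TimeRange lower bound is inclusive; TIME < now maps
    // directly to the exclusive upper bound.
    public static long[] lastSevenDays(long nowMs) {
        long lowerExclusive = nowMs - 7 * DAY_MS;
        return new long[] { lowerExclusive + 1, nowMs };
    }
}
```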


Although this :

Properties props = new Properties();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
Connection conn = DriverManager.getConnection(myUrl, props);

conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
conn.commit();

will work in my case, it may not be efficient from a performance point of view, because a new Connection object and a new Statement are created for every INSERT/UPSERT; besides that, we still need optimization 2 (see above).




--
This message was sent by Atlassian JIRA
(v6.2#6252)
