[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152678#comment-14152678
]
Jeffrey Zhong commented on PHOENIX-914:
---------------------------------------
{quote}
How will the user know the relationship between the timestamp in the row key
and the cell timestamp(s)? Is it "close enough" that it wouldn't matter if the
timestamp in the row key matched the cell timestamp? Otherwise, how will the
user know how to specify the cell timestamp range to query against?
{quote}
That's the problem if we duplicate the timestamp value for the "real" timestamp
column with native hbase timestamp value. If we choose the JIRA proposal:
"real" column way, then the table with the special column might not work for
future transaction as limitations.
Using pseudo column way, a user could provide a "conservative" timestamp value
for pre filtering if they want the optimization.
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
>
> For many applications one of the column of a table can be (and must be)
> naturally mapped
> to HBase timestamp. What it gives us is the optimization on StoreScanner
> where HFiles with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of
> data (EVENTS) and custom compaction, where we create
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
> METRICID VARCHAR NOT NULL,
> METRICNAME VARCHAR,
> SERVICENAME VARCHAR NOT NULL,
> ORIGIN VARCHAR NOT NULL,
> APPID VARCHAR,
> IPID VARCHAR,
> NVALUE DOUBLE,
> TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
> DATA VARCHAR,
> SVALUE VARCHAR
> CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
> TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to
> Phoenix that the column
> TIME must be mapped to HBase timestamp.
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() -
> 7*24*3600000; (this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan
> timerange
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view
> because for every INSERT/UPSERT
> new Connection object and new Statement is created, beside this we still need
> the optimization 2. (see above).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)