[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254971#comment-14254971
]
James Taylor commented on PHOENIX-914:
--------------------------------------
Thinking about this a bit more, and I believe we could make ROW_TIMESTAMP
read/write which would solve both use cases and prevent having to duplicate the
WHERE clause expressions. I think it may only make sense to allow this feature
for tables in which IMMUTABLE_ROWS=true - it just makes more sense IMO.
So you could set it like the from an UPSERT VALUES:
{code}
UPSERT INTO server_metrics(host, ROW_TIMESTAMP)
VALUES('SF-12', ?); -- for binding Cell timestamp to external value
UPSERT INTO server_metrics(host, ROW_TIMESTAMP)
VALUES('SF-12', CURRENT_TIME()); -- for binding Cell timestamp to HBase
server ts
{code}
In this case, on the implementation side, we'd use the value of the
ROW_TIMESTAMP pk column to set the Cell timestamp on the Put in PRowImpl
instead of using LATEST_TIMESTAMP.
We could also support setting the ROW_TIMESTAMP in an UPSERT SELECT, which
would be a bit harder, but not too bad:
{code}
UPSERT INTO server_metrics(host, ROW_TIMESTAMP, my_metric_value)
SELECT host, ROW_TIMESTAMP, my_other_metric_value
FROM other_metrics_table WHERE host LIKE 'NY%'
{code}
In this case, we'd again use the ROW_TIMESTAMP value in the PK of
other_metrics_table to set the Cell timestamp for the Puts into sever_metrics.
The final case would be where an UPSERT SELECT is done in which the source and
target table is the same. First pass, we could just prevent the optimization of
running this server-side for tables with a ROW_TIMESTAMP, and then we could
have a follow up JIRA which would push through the knowledge that each Cell
timestamp of the target table should be set to the appropriate value from the
source table.
Thoughts?
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Jeffrey Zhong
> Attachments: PHOENIX-914.patch
>
>
> For many applications one of the column of a table can be (and must be)
> naturally mapped
> to HBase timestamp. What it gives us is the optimization on StoreScanner
> where HFiles with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of
> data (EVENTS) and custom compaction, where we create
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
> METRICID VARCHAR NOT NULL,
> METRICNAME VARCHAR,
> SERVICENAME VARCHAR NOT NULL,
> ORIGIN VARCHAR NOT NULL,
> APPID VARCHAR,
> IPID VARCHAR,
> NVALUE DOUBLE,
> TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
> DATA VARCHAR,
> SVALUE VARCHAR
> CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
> TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to
> Phoenix that the column
> TIME must be mapped to HBase timestamp.
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() -
> 7*24*3600000; (this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan
> timerange
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view
> because for every INSERT/UPSERT
> new Connection object and new Statement is created, beside this we still need
> the optimization 2. (see above).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)