[
https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156162#comment-14156162
]
James Taylor commented on PHOENIX-914:
--------------------------------------
[~vrodionov] - you might need transactions if you want secondary indexes that
are guaranteed to always be consistent with the data table. If you're ok with
retrying your data writes until they succeed, or temporarily making an index
inactive in this scenario, then it's less important.
[~jeffreyz] - I agree that we need a good way to take advantage of this HBase
optimization. The pseudo column might work, but you'd have to restrict it's
usage in a query - maybe you'd only be able to use it in a range expression in
the WHERE clause. It'd also be awkward, as you'd have both a filter for your
timestamp row key column and another one for the pseudo column. If you can live
with Phoenix managing the timestamp row key, it's much cleaner - tell me more
about the cases where the timestamp would be different (and why it would not be
close enough).
I think this needs a design plus a use case and some examples.
> Native HBase timestamp support to optimize date range queries in Phoenix
> -------------------------------------------------------------------------
>
> Key: PHOENIX-914
> URL: https://issues.apache.org/jira/browse/PHOENIX-914
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
>
> For many applications one of the column of a table can be (and must be)
> naturally mapped
> to HBase timestamp. What it gives us is the optimization on StoreScanner
> where HFiles with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of
> data (EVENTS) and custom compaction, where we create
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
> METRICID VARCHAR NOT NULL,
> METRICNAME VARCHAR,
> SERVICENAME VARCHAR NOT NULL,
> ORIGIN VARCHAR NOT NULL,
> APPID VARCHAR,
> IPID VARCHAR,
> NVALUE DOUBLE,
> TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
> DATA VARCHAR,
> SVALUE VARCHAR
> CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID,
> TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to
> Phoenix that the column
> TIME must be mapped to HBase timestamp.
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() -
> 7*24*3600000; (this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan
> timerange
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view
> because for every INSERT/UPSERT
> new Connection object and new Statement is created, beside this we still need
> the optimization 2. (see above).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)