[ https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156830#comment-14156830 ]

Jeffrey Zhong commented on PHOENIX-914:
---------------------------------------

{quote}
It'd also be awkward, as you'd have both a filter for your timestamp row key 
column and another one for the pseudo column
{quote}
That's the downside of this approach, and it is less intuitive for users. But 
it also expresses the fact that there really are two timestamps: one is the 
application data timestamp and the other is the HBase "version" timestamp. The 
good side of the pseudo-column approach is that it is less intrusive to 
internals, is aimed more at advanced users, and simply exposes a possible 
optimization to end users. After the above long discussion, I think we can go 
either way, or both, depending on the particular use cases.
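To make the two-filter shape concrete, here is a minimal sketch of such a 
query (hedged: HBASE_TS is a made-up pseudo-column name used only for 
illustration, not an existing Phoenix identifier; the table follows the 
ODS.EVENTS DDL quoted below):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class TwoTimestampQuery {
    public static void main(String[] args) throws Exception {
        long now = System.currentTimeMillis();
        long weekAgo = now - 7L * 24 * 3600 * 1000;
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // One filter on the application data timestamp (the TIME row key
            // column) and a second on the hypothetical HBASE_TS pseudo column
            // exposing the HBase "version" timestamp.
            PreparedStatement stmt = conn.prepareStatement(
                "SELECT METRICNAME, NVALUE FROM ODS.EVENTS"
              + " WHERE METRICID = 'X'"
              + " AND TIME > ? AND TIME < ?"
              + " AND HBASE_TS > ? AND HBASE_TS < ?");
            stmt.setTimestamp(1, new Timestamp(weekAgo));
            stmt.setTimestamp(2, new Timestamp(now));
            stmt.setLong(3, weekAgo);
            stmt.setLong(4, now);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getDouble(2));
                }
            }
        }
    }
}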

{quote}
tell me more about the cases where the timestamp would be different
{quote}
The use case is general. When people send metrics data to Phoenix, the metrics 
data are normally cached/batched for a while (say, a minute) on the client 
side, or aggregated on the client side for a bit. A specific example is 
AMBARI-5707, where Phoenix & HBase are used as the backend storage system for 
Ambari's new metrics system.
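For illustration, a minimal sketch of that batching pattern (hedged: column 
choices follow the ODS.EVENTS DDL quoted below and the values are 
placeholders; the point is only that TIME carries the metric's own collection 
time, which lags behind the HBase cell timestamp assigned at write time):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.List;

public class BatchedMetricsWriter {
    // A buffered data point: a value plus the time it was collected.
    public static class Metric {
        final String metricId; final double value; final long collectedAt;
        Metric(String id, double v, long t) { metricId = id; value = v; collectedAt = t; }
    }

    // Flushed roughly once a minute. Each row keeps its collection time in
    // TIME (the application data timestamp), while HBase stamps the cells
    // with the later server-side write time, so the two timestamps differ.
    public static void flush(Connection conn, List<Metric> buffer) throws Exception {
        PreparedStatement stmt = conn.prepareStatement(
            "UPSERT INTO ODS.EVENTS (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME, NVALUE)"
          + " VALUES (?, 'ams', 'collector-1', 'app-1', 'ip-1', ?, ?)");
        for (Metric m : buffer) {
            stmt.setString(1, m.metricId);
            stmt.setTimestamp(2, new Timestamp(m.collectedAt));
            stmt.setDouble(3, m.value);
            stmt.executeUpdate();
        }
        conn.commit(); // one commit for the whole minute's worth of metrics
    }
}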




> Native HBase timestamp support to optimize date range queries in Phoenix 
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-914
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-914
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>
> For many applications, one of the columns of a table can be (and must be) 
> naturally mapped to the HBase timestamp. What this gives us is an 
> optimization in StoreScanner where HFiles with timestamps outside the range 
> of a Scan operator will be omitted. Let us say that we have time-series type 
> data (EVENTS) and a custom compaction that creates a series of HFiles with 
> continuous, non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID  VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>     DATA VARCHAR,
>     SVALUE VARCHAR
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true, VERSIONS=1, DATA_BLOCK_ENCODING='NONE';
> Make note of   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the hint 
> to Phoenix that the column TIME must be mapped to the HBase timestamp. 
> The Query:
> Select all events of type 'X' for the last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 
> 7*24*3600000; (this may not be correct SQL syntax, of course)
> These types of queries will be efficiently optimized if (see the sketch 
> after this list):
> 1. Phoenix maps the TIME column to the HBase timestamp
> 2. Phoenix is smart enough to map a WHERE clause on the TIME attribute to 
> the Scan time range
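> A minimal sketch of what optimization 2 amounts to on the HBase client side 
> (hedged: Scan.setTimeRange is the real HBase client API, but the class below 
> and the idea that Phoenix would emit it are illustrative only):
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.client.Scan;
>
> public class TimeRangePushdown {
>     // Equivalent of: TIME > NOW() - 7*24*3600000 AND TIME < NOW().
>     // With the time range set, StoreScanner can skip entire HFiles whose
>     // timestamp ranges fall outside the queried window.
>     public static Scan lastSevenDays() throws IOException {
>         long now = System.currentTimeMillis();
>         Scan scan = new Scan();
>         scan.setTimeRange(now - 7L * 24 * 3600 * 1000, now);
>         return scan;
>     }
> }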
> Although this:
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.getConnection(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case, it may not be efficient from a performance point of 
> view because for every INSERT/UPSERT a new Connection object and a new 
> Statement are created; besides this, we still need optimization 2 (see 
> above). 
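> To spell out the inefficiency, a hedged sketch of the per-timestamp pattern 
> the SCN workaround forces (myUrl and the table are placeholders carried over 
> from the snippet above):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.util.Properties;
> import org.apache.phoenix.util.PhoenixRuntime;
>
> public class PerTimestampUpsert {
>     // CURRENT_SCN is fixed when a connection is opened, so every distinct
>     // event timestamp forces a fresh Connection and Statement.
>     public static void write(String myUrl, long[] eventTimestamps) throws Exception {
>         for (long ts : eventTimestamps) {
>             Properties props = new Properties();
>             props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
>             try (Connection conn = DriverManager.getConnection(myUrl, props)) {
>                 conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
>                 conn.commit();
>             }
>         }
>     }
> }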



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)