[
https://issues.apache.org/jira/browse/PHOENIX-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839329#comment-16839329
]
Julian commented on PHOENIX-967:
--------------------------------
I have a business use case for which I wish to perform a query to find only the
latest event status rows over a given timeframe, but for which the traditional
time series approach taken in Phoenix-914 is not a good fit. I find the
solution for Phoenix-914 is not at a generic approach to using hbase
timestamps, but rather focuses on a given problem. I'm looking rather looking
for a way to use the general hbase features via sql.
The Use Case is as follows -
1) We receive events on basis of an incident number, which is the primary row
key.
2) We keep updating the row as we receive more events for that incident with
the latest status information. Thus it is important we maintain a single row
with the current status which also contains the incident created time, last
updated time etc, whether the it is still open or closed etc. Versions may also
be interesting in future to see the history of the incident, but right now we
are only interested in most recent version.
3) Some of the End User queries need to find any incidents that have been
updated in the last 7 days, thus are time based, however they also need to be
able to query a specific incident for example by it's incident number without
triggering long scans. Note, two years of incidents are in the table, the
majority of which will not have been updated since being closed, so querying on
last 7 days is a limited data set of the full data.
In the current phoenix implementation, there seems no way for me to query the
latest version of rows over a time range in Phoenix without using PHOENIX-914
feature, but I can't due to the primary key requirements (we don't want time in
the key). Rather I am looking simply for a way to specify the TIMERANGE, in a
similar way as I can when performing an HBASE timerange scan directly on Hbase.
and get only the latest version in my case.
In addition to the above, I am also not seemingly able to set the row / column
timestamps to the event timestamp in the way I want to also, as seems this is
only possibly by making a connection and setting a connection level property
but that would need to be for every event (not handy) or use the PHOENIX-914
features but that means putting the timestamp into the primary key which I do
not want. So nothing is fitting here well.
It's a shame, as the Use Case is not that complex, hbase wise but also
potentially phoenix wise. Ultimately I am unable to use Phoenix for this use
case and we have to look at other solutions.
Note, secondary indexing we are testing, however this is not delivering the
performance results we need whilst the time range scan via hbase api is working
perfectly. This however is not End User friendly and we were hoping to enable
SQL via Hive on Phoenix on Hbase for this use case for some end user queries.
> Consider (some of) SQL 2011 temporal support
> --------------------------------------------
>
> Key: PHOENIX-967
> URL: https://issues.apache.org/jira/browse/PHOENIX-967
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Andrew Purtell
> Priority: Major
>
> Language extensions for temporal data were added in SQL 2011 (*1), providing
> among other things the ability to define period definitions as metadata to
> table, system-versioned tables (CREATE TABLE ... WITH SYSTEM VERSIONING), the
> ability to query by system version time, and syntax for conditions involving
> periods: CONTAINS, OVERLAPS, EQUALS, PRECEDES, SUCCEEDS, IMMEDIATELY
> PRECEDES, and IMMEDIATELY SUCCEEDS.
> While not proposing support for all of SQL 2011 temporal features, it seems
> that system versioning, the new temporal comparison operators, and the new
> SELECT syntax for versioned queries map pretty naturally to multiversoned
> HBase table schema and timerange filtering. And at least one commercial
> database supports temporal queries using an older nonstandard syntax (*2).
> E.g. from *1:\\
> - {{SELECT ... FOR SYSTEM_TIME AS OF TIMESTAMP <timestamp>...}}
> - {{SELECT ... FOR SYSTEM_TIME BETWEEN TIMESTAMP <timestamp-start> AND
> TIMESTAMP <timestamp-end> ...}}
> I didn't pay to download the 2011 standard doc so can't say if TIMESTAMP is
> required or if we can stand in any type that can be serialized to a long
> integer.
> And from *2: \\
> - {{SELECT ... AS OF TIMESTAMP <timestamp> ...}}
> - {{SELECT ... VERSIONS BETWEEN TIMESTAMP <timestamp-start> AND
> <timestamp-end> ...}}
> *1:
> http://www.sigmod.org/publications/sigmod-record/0612/publications/1209/pdfs/07.industry.kulkarni.pdf
> *2:
> http://www.oracle.com/technetwork/database/storage/total-recall-whitepaper-171749.pdf
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)