Nicolas,
If you want to implement the OpenTSDB datamodel you can define a base table
and define views for each metric. For example the base table DDL could be
CREATE SEQUENCE metric_id_seq CACHE 100;
CREATE TABLE metric_table
(
metricid INTEGER NOT NULL,
eventTime TIMESTAMP NOT NULL,
CONSTRAINT pk PRIMARY KEY(metricid, eventTime)
)
APPEND_ONLY_SCHEMA = true,
UPDATE_CACHE_FREQUENCY = 30000,
AUTO_PARTITION_SEQ = metric_id_seq;
CREATE VIEW IF NOT EXISTS metric1
(
tag1 VARCHAR NOT NULL,
val1 DOUBLE,
CONSTRAINT pk PRIMARY KEY(tag1)
)
AS SELECT * FROM metric_table;
The APPEND_ONLY_SCHEMA attribute means columns can only be added to the
table and never removed. This allows Phoenix to save an RPC if the
metadata of the metric1 view matches that of the CREATE TABLE DDL
statement.
The AUTO_PARTITION_SEQ attribute populates the metricid column
automatically from the sequence metric_id_seq when each view is created.
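For example, a write through the view would then look like the
following (the tag value 'host=server1' is just an illustration, and I
believe the metricid column can be omitted since it is fixed per view
by AUTO_PARTITION_SEQ):

UPSERT INTO metric1 (eventTime, tag1, val1)
VALUES (CURRENT_TIME(), 'host=server1', 1.0);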
If you want to add a tag/value pair to metric1, execute the following
DDL and the new tag2 and val2 columns will be added to the view.
CREATE VIEW IF NOT EXISTS metric1
(
tag1 VARCHAR NOT NULL, tag2 VARCHAR NOT NULL,
val1 DOUBLE, val2 DOUBLE,
CONSTRAINT pk PRIMARY KEY(tag1, tag2)
)
AS SELECT * FROM metric_table;
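After that, queries against the view see all the tag columns; for
instance, a sketch of a per-metric query (tag values again hypothetical):

SELECT eventTime, tag1, tag2, val1, val2
FROM metric1
WHERE tag1 = 'host=server1';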
The AUTO_PARTITION_SEQ and APPEND_ONLY_SCHEMA attributes were
introduced in Phoenix 4.8.
Thanks,
Thomas
On Mon, Jan 9, 2017 at 2:28 PM, Nicolas DOUSSINET <[email protected]>
wrote:
> Hi Phoenix,
>
> I have used Phoenix for a year and HBase for two years, and I really think
> Phoenix leverages HBase well. But I'm still surprised that the
> column-oriented storage isn't fully exploited. The dynamic column feature
> allows you to upsert or select a column not declared in the CREATE TABLE
> statement, but you cannot create a block of variable columns. Why haven't
> you implemented a feature like this?
>
> CREATE TABLE (
> eventTime TIMESTAMP NOT NULL,
> iotID INTEGER NOT NULL,
> consumption BIGINT,
> maxConsumption BIGINT,
> CONSTRAINT pk PRIMARY KEY (eventTime day_qualifier_column, iotID))
> SALT_BUCKETS = 20;
>
> Phoenix would then create an HBase table with one row per iotID and day,
> and all the other columns blocked together in the same row, with a
> column-qualifier suffix for the time since the beginning of the day (in
> an optimized way, like OpenTSDB). In the same way, hour_qualifier_column
> would create one row per iotID and hour, with all the other columns
> blocked in the same row and a suffix for the time since the beginning of
> the hour.
>
> Of course, if a row has, for example, 20 column blocks (20 x consumption
> and maxConsumption), the SQL SELECT statement would return 20 rows (like
> LATERAL VIEW explode in HiveQL).
>
> It would be something like the OpenTSDB model:
> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html#data-table-schema
>
> Maybe this could be optimized with a data block encoding like FAST_DIFF,
> because only the suffix changes across a range of column qualifiers in
> HBase, and I think the native aggregation coprocessor would work on it.
> (I think the future immutable data packing feature will not use the
> native coprocessor aggregation.)
>
> I think this would improve performance for SQL analysis (like OLAP on
> time series).
>
> This is for time series use cases.
>
> You could say that I could model this in rows instead of columns, but if
> I use SALT_BUCKETS, FAST_DIFF encoding won't be optimal (because of the
> datetime in the row key).
> You could say that I could use a row timestamp, but what about huge time
> series (few row keys and many versions)?
>
> => We have compared row-wise vs column-wise layouts (static declaration
> in Phoenix, 48 time slots), and for significant aggregations the column
> layout was 2x faster.
>
> Thank you in advance for your answer.
>
> Best regards,
>
> Nicolas DOUSSINET
>