Nicolas,
If you want to implement the OpenTSDB datamodel you can define a base table
and define views for each metric. For example the base table DDL could be
CREATE SEQUENCE metric_id_seq CACHE 100;
CREATE TABLE metric_table
(
metricid INTEGER NOT NULL,
eventTime TIMESTAMP NOT NULL,
CONSTRAINT pk PRIMARY KEY(metricid, eventTime)
)
APPEND_ONLY_SCHEMA = true,
UPDATE_CACHE_FREQUENCY = 30000,
AUTO_PARTITION_SEQ = metric_id_seq;
CREATE VIEW IF NOT EXISTS metric1
(
tag1 VARCHAR NOT NULL,
val1 DOUBLE,
CONSTRAINT pk PRIMARY KEY(tag1)
)
AS SELECT * FROM metric_table;
The APPEND_ONLY_SCHEMA attribute means columns can only be added to the
table and never removed. This allows Phoenix to save an RPC if the
metadata of the metric1 view matches that of the CREATE TABLE DDL
statement.
The AUTO_PARTITION_SEQ attribute populates the metricid column
automatically from the sequence metric_id_seq when each view is created.
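For example, a write through the view would then look like the
following (the tag value 'host=server1' is just an illustration, and I
believe the metricid column can be omitted since it is fixed per view
by AUTO_PARTITION_SEQ):

UPSERT INTO metric1 (eventTime, tag1, val1)
VALUES (CURRENT_TIME(), 'host=server1', 1.0);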
If you want to add a tag/value pair to metric1, execute the following
DDL and the new tag2 and val2 columns will be added to the view.
CREATE VIEW IF NOT EXISTS metric1
(
tag1 VARCHAR NOT NULL, tag2 VARCHAR NOT NULL,
val1 DOUBLE, val2 DOUBLE,
CONSTRAINT pk PRIMARY KEY(tag1, tag2)
)
AS SELECT * FROM metric_table;
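After that, queries against the view see all the tag columns; for
instance, a sketch of a per-metric query (tag values again hypothetical):

SELECT eventTime, tag1, tag2, val1, val2
FROM metric1
WHERE tag1 = 'host=server1';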
The AUTO_PARTITION_SEQ and APPEND_ONLY_SCHEMA attributes were
introduced in Phoenix 4.8.
Thanks,
Thomas
On Mon, Jan 9, 2017 at 2:28 PM, Nicolas DOUSSINET <[email protected]>
wrote:
> Hi Phoenix,
>
> I have used Phoenix for a year and HBase for two years, and I really think
> Phoenix leverages HBase well. But I'm still surprised that the
> column-oriented storage isn't fully exploited. The dynamic column feature
> allows you to upsert or select a column not declared in the CREATE TABLE
> statement, but you cannot create a block of variable columns. Why haven't
> you implemented a feature like this?
>
> CREATE TABLE (
> eventTime TIMESTAMP NOT NULL,
> iotID INTEGER NOT NULL,
> consumption BIGINT,
> maxConsumption BIGINT,
> CONSTRAINT pk PRIMARY KEY (eventTime day_qualifier_column, iotID))
> SALT_BUCKETS = 20;
>
> Phoenix would then create an HBase table with one row per iotID and day,
> and all the other columns blocked together in the same row, with a
> column-qualifier suffix for the time since the beginning of the day (in
> an optimized way, like OpenTSDB). In the same way, hour_qualifier_column
> would create one row per iotID and hour, with all the other columns
> blocked in the same row and a suffix for the time since the beginning of
> the hour.
>
> Of course, if a row has, for example, 20 column blocks (20 x consumption
> and maxConsumption), the SQL SELECT statement would return 20 rows (like
> LATERAL VIEW explode in HiveQL).
>
> It would be something like the OpenTSDB model:
> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html#data-table-schema
>
> Maybe this could be optimized with a data block encoding like FAST_DIFF,
> because only the suffix changes across a range of column qualifiers in
> HBase, and I think the native aggregation coprocessor would work on it.
> (I think the future immutable data packing feature will not use the
> native coprocessor aggregation.)
>
> I think this would improve performance for SQL analysis (like OLAP on
> time series).
>
> This is for time series use cases.
>
> You could say that I could model this in rows instead of columns, but if
> I use SALT_BUCKETS, FAST_DIFF encoding won't be optimal (because of the
> datetime in the row key).
> You could say that I could use a row timestamp, but what about huge time
> series (few row keys and many versions)?
>
> => We have compared row-wise vs column-wise layouts (static declaration
> in Phoenix, 48 time slots), and for significant aggregations the column
> layout was 2x faster.
>
> Thank you in advance for your answer.
>
> Best regards,
>
> Nicolas DOUSSINET
>