Nicolas, FYI, the description provided by Thomas is how we support Argus[1] on top of Phoenix. This Phoenix-based time series implementation will be open sourced soon in Argus. In our performance measurements[2], we've found Phoenix to be on par with or faster than OpenTSDB, with approximately the same amount of data stored on disk. Having a wide versus narrow table doesn't make much difference given the FAST_DIFF encoding and compression done by HBase. On top of this, the new tiered compaction strategy available in HBase[3] is very promising too.
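
If you want to experiment with those storage settings, they can be passed straight through Phoenix DDL as HBase properties; a minimal sketch (the table and column names are placeholders, not the Argus schema):

CREATE TABLE ts_sample (
    metricid INTEGER NOT NULL,
    eventTime TIMESTAMP NOT NULL,
    val DOUBLE,
    CONSTRAINT pk PRIMARY KEY (metricid, eventTime)
) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'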
Thanks,
James

[1] https://github.com/salesforce/Argus
[2] http://www.slideshare.net/HBaseCon/apache-phoenix-use-cases-and-new-features
[3] https://www.youtube.com/watch?v=IeGbDlFmnSg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=15

On Tue, Jan 10, 2017 at 10:06 AM, Thomas D'Silva <[email protected]> wrote:
> Nicolas,
>
> If you want to implement the OpenTSDB data model, you can define a base
> table and then a view for each metric. For example, the base table DDL
> could be
>
> CREATE SEQUENCE metric_id_seq CACHE 100
>
> CREATE TABLE metric_table
> (
>     metricid INTEGER NOT NULL,
>     eventTime TIMESTAMP NOT NULL,
>     CONSTRAINT pk PRIMARY KEY (metricid, eventTime)
> )
> APPEND_ONLY_SCHEMA = true,
> UPDATE_CACHE_FREQUENCY = 30000,
> AUTO_PARTITION_SEQ = metric_id_seq
>
> CREATE VIEW IF NOT EXISTS metric1
> (
>     tag1 VARCHAR NOT NULL,
>     val1 DOUBLE,
>     CONSTRAINT pk PRIMARY KEY (tag1)
> )
> AS SELECT * FROM metric_table
>
> The APPEND_ONLY_SCHEMA attribute means columns can only be added to the
> table, never removed. It allows Phoenix to save an RPC when the metadata
> of the metric1 view already matches that of the DDL statement.
>
> The AUTO_PARTITION_SEQ attribute populates the metricid column
> automatically based on the value of the sequence metric_id_seq.
>
> If you want to add a tag/value pair to metric1, execute the following
> DDL and the new tag2 and val2 columns will be added to the view:
>
> CREATE VIEW IF NOT EXISTS metric1
> (
>     tag1 VARCHAR NOT NULL,
>     tag2 VARCHAR NOT NULL,
>     val1 DOUBLE,
>     val2 DOUBLE,
>     CONSTRAINT pk PRIMARY KEY (tag1, tag2)
> )
> AS SELECT * FROM metric_table
>
> The AUTO_PARTITION_SEQ and APPEND_ONLY_SCHEMA attributes were introduced
> in Phoenix 4.8.
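>
> As a quick usage sketch (the tag and value literals here are made up),
> rows upserted through the view get their metricid populated
> automatically from the sequence, so clients never need to supply it:
>
> UPSERT INTO metric1 (eventTime, tag1, val1)
> VALUES (NOW(), 'host=web01', 99.9)
>
> SELECT eventTime, val1
> FROM metric1
> WHERE tag1 = 'host=web01'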
>
> Thanks,
>
> Thomas
>
> On Mon, Jan 9, 2017 at 2:28 PM, Nicolas DOUSSINET <[email protected]> wrote:
> >
> > Hi Phoenix,
> >
> > I have been using Phoenix for a year and HBase for two, and I really
> > think Phoenix leverages HBase well. But I am still surprised that the
> > column-oriented storage isn't exploited fully. The dynamic column
> > feature lets you upsert or select a column that was not declared in
> > the CREATE TABLE statement, but you cannot create a block of variable
> > columns. Why not a feature like this?
> >
> > CREATE TABLE iot_table (
> >     eventTime TIMESTAMP NOT NULL,
> >     iotID INTEGER NOT NULL,
> >     consumption BIGINT,
> >     maxConsumption BIGINT,
> >     CONSTRAINT pk PRIMARY KEY (eventTime day_qualifier_column, iotID))
> > SALT_BUCKETS = 20;
> >
> > With this, Phoenix would create an HBase table with one row per iotID
> > and day, and all the other columns stored in blocks in that same row,
> > each with a column-qualifier suffix encoding the time elapsed since
> > the beginning of the day (the optimized layout, like OpenTSDB). In the
> > same way, hour_qualifier_column would create one row per iotID and
> > hour, with the suffix encoding the time since the beginning of the hour.
> >
> > Of course, if a row holds, for example, 20 column blocks (20 x
> > consumption and maxConsumption), the SQL SELECT statement would return
> > 20 rows (like LATERAL VIEW explode in HiveQL).
> >
> > It would be something like the OpenTSDB model:
> > http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html#data-table-schema
> >
> > Maybe this could be optimized with a data block encoding like
> > FAST_DIFF, because only the suffix changes across a range of column
> > qualifiers in HBase, and I think the native aggregation coprocessor
> > would work on it. (I think the future immutable data packing feature
> > will not use the native coprocessor aggregation.)
> >
> > I think this would improve performance for SQL analysis (like OLAP on
> > time series).
> >
> > This is for time series use cases.
> >
> > You could say that I could model this in rows rather than columns, but
> > if I use SALT_BUCKETS, FAST_DIFF encoding won't be optimal (because of
> > the datetime in the row key).
> > You could say that I could use ROW_TIMESTAMP, but what about huge time
> > series (few row keys and many versions)?
> >
> > => We have compared row-oriented vs column-oriented layouts (static
> > column declaration in Phoenix, 48 time slots; see the sketch below),
> > and for significant aggregations the column-oriented layout was 2x
> > faster.
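> >
> > For reference, a sketch of the kind of static column-per-slot
> > declaration I mean (names are hypothetical; only the first three of
> > the 48 half-hour slots are shown):
> >
> > CREATE TABLE iot_wide (
> >     eventDay DATE NOT NULL,
> >     iotID INTEGER NOT NULL,
> >     consumption_00 BIGINT, maxConsumption_00 BIGINT,
> >     consumption_01 BIGINT, maxConsumption_01 BIGINT,
> >     consumption_02 BIGINT, maxConsumption_02 BIGINT,
> >     CONSTRAINT pk PRIMARY KEY (eventDay, iotID))
> > SALT_BUCKETS = 20;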
> >
> > Thank you in advance for your answer.
> >
> > Best regards,
> >
> > Nicolas DOUSSINET