Nicolas, FYI, the description provided by Thomas is how we support Argus[1] on top of Phoenix. This Phoenix-based time series implementation will be open sourced soon in Argus. In our performance measurements[2], we've found Phoenix to be on par with or faster than OpenTSDB, with approximately the same amount of data stored on disk. Having a wide versus narrow table doesn't make much difference given the FAST_DIFF encoding and compression done by HBase. On top of this, the new tiered compaction strategy available in HBase[3] is very promising too.
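
If you want to experiment with those storage settings, they can be passed straight through Phoenix DDL as HBase properties; a minimal sketch (the table and column names are placeholders, not the Argus schema):

CREATE TABLE ts_sample (
    metricid INTEGER NOT NULL,
    eventTime TIMESTAMP NOT NULL,
    val DOUBLE,
    CONSTRAINT pk PRIMARY KEY (metricid, eventTime)
) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'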
Thanks,
James

[1] https://github.com/salesforce/Argus
[2] http://www.slideshare.net/HBaseCon/apache-phoenix-use-cases-and-new-features
[3] https://www.youtube.com/watch?v=IeGbDlFmnSg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=15

On Tue, Jan 10, 2017 at 10:06 AM, Thomas D'Silva <[email protected]> wrote:
> Nicolas,
>
> If you want to implement the OpenTSDB data model, you can define a base
> table and then a view for each metric. For example, the base table DDL
> could be
>
> CREATE SEQUENCE metric_id_seq CACHE 100
>
> CREATE TABLE metric_table
> (
>     metricid INTEGER NOT NULL,
>     eventTime TIMESTAMP NOT NULL,
>     CONSTRAINT pk PRIMARY KEY (metricid, eventTime)
> )
> APPEND_ONLY_SCHEMA = true,
> UPDATE_CACHE_FREQUENCY = 30000,
> AUTO_PARTITION_SEQ = metric_id_seq
>
> CREATE VIEW IF NOT EXISTS metric1
> (
>     tag1 VARCHAR NOT NULL,
>     val1 DOUBLE,
>     CONSTRAINT pk PRIMARY KEY (tag1)
> )
> AS SELECT * FROM metric_table
>
> The APPEND_ONLY_SCHEMA attribute means columns can only be added to the
> table, never removed. It allows Phoenix to save an RPC when the metadata
> of the metric1 view already matches that of the DDL statement.
>
> The AUTO_PARTITION_SEQ attribute populates the metricid column
> automatically based on the value of the sequence metric_id_seq.
>
> If you want to add a tag/value pair to metric1, execute the following
> DDL and the new tag2 and val2 columns will be added to the view:
>
> CREATE VIEW IF NOT EXISTS metric1
> (
>     tag1 VARCHAR NOT NULL,
>     tag2 VARCHAR NOT NULL,
>     val1 DOUBLE,
>     val2 DOUBLE,
>     CONSTRAINT pk PRIMARY KEY (tag1, tag2)
> )
> AS SELECT * FROM metric_table
>
> The AUTO_PARTITION_SEQ and APPEND_ONLY_SCHEMA attributes were introduced
> in Phoenix 4.8.
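>
> As a quick usage sketch (the tag and value literals here are made up),
> rows upserted through the view get their metricid populated
> automatically from the sequence, so clients never need to supply it:
>
> UPSERT INTO metric1 (eventTime, tag1, val1)
> VALUES (NOW(), 'host=web01', 99.9)
>
> SELECT eventTime, val1
> FROM metric1
> WHERE tag1 = 'host=web01'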
>
> Thanks,
>
> Thomas
>
> On Mon, Jan 9, 2017 at 2:28 PM, Nicolas DOUSSINET <[email protected]> wrote:
> >
> > Hi Phoenix,
> >
> > I have been using Phoenix for a year and HBase for two, and I really
> > think Phoenix leverages HBase well. But I am still surprised that the
> > column-oriented storage isn't exploited fully. The dynamic column
> > feature lets you upsert or select a column that was not declared in
> > the CREATE TABLE statement, but you cannot create a block of variable
> > columns. Why not a feature like this?
> >
> > CREATE TABLE iot_table (
> >     eventTime TIMESTAMP NOT NULL,
> >     iotID INTEGER NOT NULL,
> >     consumption BIGINT,
> >     maxConsumption BIGINT,
> >     CONSTRAINT pk PRIMARY KEY (eventTime day_qualifier_column, iotID))
> > SALT_BUCKETS = 20;
> >
> > With this, Phoenix would create an HBase table with one row per iotID
> > and day, and all the other columns stored in blocks in that same row,
> > each with a column-qualifier suffix encoding the time elapsed since
> > the beginning of the day (the optimized layout, like OpenTSDB). In the
> > same way, hour_qualifier_column would create one row per iotID and
> > hour, with the suffix encoding the time since the beginning of the hour.
> >
> > Of course, if a row holds, for example, 20 column blocks (20 x
> > consumption and maxConsumption), the SQL SELECT statement would return
> > 20 rows (like LATERAL VIEW explode in HiveQL).
> >
> > It would be something like the OpenTSDB model:
> > http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html#data-table-schema
> >
> > Maybe this could be optimized with a data block encoding like
> > FAST_DIFF, because only the suffix changes across a range of column
> > qualifiers in HBase, and I think the native aggregation coprocessor
> > would work on it. (I think the future immutable data packing feature
> > will not use the native coprocessor aggregation.)
> >
> > I think this would improve performance for SQL analysis (like OLAP on
> > time series).
> >
> > This is for time series use cases.
> >
> > You could say that I could model this in rows rather than columns, but
> > if I use SALT_BUCKETS, FAST_DIFF encoding won't be optimal (because of
> > the datetime in the row key).
> > You could say that I could use ROW_TIMESTAMP, but what about huge time
> > series (few row keys and many versions)?
> >
> > => We have compared row-oriented vs column-oriented layouts (static
> > column declaration in Phoenix, 48 time slots; see the sketch below),
> > and for significant aggregations the column-oriented layout was 2x
> > faster.
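> >
> > For reference, a sketch of the kind of static column-per-slot
> > declaration I mean (names are hypothetical; only the first three of
> > the 48 half-hour slots are shown):
> >
> > CREATE TABLE iot_wide (
> >     eventDay DATE NOT NULL,
> >     iotID INTEGER NOT NULL,
> >     consumption_00 BIGINT, maxConsumption_00 BIGINT,
> >     consumption_01 BIGINT, maxConsumption_01 BIGINT,
> >     consumption_02 BIGINT, maxConsumption_02 BIGINT,
> >     CONSTRAINT pk PRIMARY KEY (eventDay, iotID))
> > SALT_BUCKETS = 20;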
> >
> > Thank you in advance for your answer.
> >
> > Best regards,
> >
> > Nicolas DOUSSINET