Hi Phoenix,

I have been using Phoenix for 1 year and HBase for 2 years, and I really think Phoenix leverages HBase well. But I'm still surprised that the column-oriented storage isn't fully exploited. The dynamic column feature allows you to upsert or select a column that was not declared in the CREATE TABLE statement, but you cannot create a block of variable columns. Why haven't you introduced a feature like this?

    CREATE TABLE (
        eventTime TIMESTAMP NOT NULL,
        iotID INTEGER NOT NULL,
        consumption BIGINT,
        maxConsumption BIGINT
        CONSTRAINT pk PRIMARY KEY (eventTime day_qualifier_column, iotID))
    SALT_BUCKETS = 20;

With this statement, Phoenix would create an HBase table with 1 row per iotID and day, and store all other columns as blocks in the same row, with a column qualifier suffix encoding the time elapsed since the beginning of the day (in an optimized way, like OpenTSDB). In the same way, hour_qualifier_column would create 1 row per iotID and hour, with all other columns stored as blocks in the same row and a suffix encoding the time elapsed since the beginning of the hour.

Of course, if a row holds, for example, 20 column blocks (20 x consumption and maxConsumption), the SQL SELECT statement would return 20 lines (like LATERAL VIEW explode in HiveQL).

It would be something like the OpenTSDB model:
http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html#data-table-schema

This could probably be optimized with a data block encoding like FAST_DIFF, because only the suffix changes across a range of column qualifiers in HBase, and I think the native aggregation coprocessor would work on it. (I think the future immutable data packing feature will not use the native coprocessor aggregation.)

I think this would improve performance for SQL analysis (like OLAP on time series). This is for time series use cases.

You could say that I could model the data in rows rather than in columns, but if I use SALT_BUCKETS, FAST_DIFF encoding won't be optimal (because of the datetime in the row key).

You could also say that I could use the row timestamp, but what about huge time series (few row keys and many versions)?

We have compared the row-based and column-based models (static declaration in Phoenix, 48 time slots), and for significant aggregations the column-based model was 2x better.

Thank you in advance for your answer.

Best regards,
Nicolas DOUSSINET
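
P.S. To make the layout I have in mind more concrete, here is a minimal Python sketch of how one event could be mapped to a day-level row key and an offset-suffixed column qualifier. It only illustrates the idea: the function names, the "column:offset" qualifier format and the use of UTC days are my own assumptions, not existing Phoenix, HBase or OpenTSDB APIs, and salting is left out.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def split_event_time(event_time: datetime) -> tuple[int, int]:
    """Split a timezone-aware timestamp into (start of UTC day, offset in seconds)."""
    epoch = int(event_time.timestamp())
    day_base = epoch - epoch % SECONDS_PER_DAY
    return day_base, epoch - day_base


def to_cell(iot_id: int, event_time: datetime, column: str):
    """Return (row_key, qualifier) for one value (hypothetical layout).

    The row key holds only the day and the device, so all events of one
    device on one day land in the same row. The qualifier carries the
    column name plus the offset since the beginning of the day.
    """
    day_base, offset = split_event_time(event_time)
    row_key = (day_base, iot_id)
    qualifier = f"{column}:{offset}"
    return row_key, qualifier
```

Two events of the same device on the same day then share one row key and differ only by the qualifier suffix, which is the property I would expect FAST_DIFF to benefit from.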
