Hi, I was going through the design document & need some clarification.
1) Can we support creating agg tables while creating main table ? 2) Need some help in understanding "Support rollup table for timeseries data", i) Do we have any provision to specify timezone of timestamp column ? ii) Assuming granularity is till hour level, below 4 agg table will be created *agg table 1 : *order_time, year, month, day, hour, country, sex, sum(quantity), max(quantity), count(user_id), sum(price), avg(price) group by order_time, year, month, day, hour, country, sex -- here year, month, day & hour are derived from order_time timestamp column *agg table 2 : *order_time, year, month, day, country, sex, sum(quantity), max(quantity), count(user_id), sum(price), avg(price) group by order_time, year, month, day, country, sex *agg table 3 :* order_time, year, month, country, sex, sum(quantity), max(quantity), count(user_id), sum(price), avg(price) group by order_time, year, month, country, sex *agg table 4 : *order_time, year, country, sex, sum(quantity), max(quantity), count(user_id), sum(price), avg(price) group by order_time, year, country, sex Please correct me if my understanding is wrong & provide more insight if possible. 3) All operations like load, Segment LCM, IUD on Agg tables should be restricted except select & compaction, right? 4) For new loads in parent table, we are creating new segments in agg table. Can we have segment to segment mapping with parent & agg table, so that operations like delete segments, update, delete will not be restricted in parent table. Or Can we have a table level flag whether to restrict delete segments, update, delete operations on parent table if agg tables exist 5) I assume filters also will be pushed to agg tables if all conditions match agg table columns eg., CREATE TABLE agg_sales TBLPROPERTIES (parent_table=”xx”) AS SELECT c1, c2, count(distinct c3) FROM source GROUP BY c1, c2 SELECT c1, c2, count(distinct c3) FROM source where c1 = 'a' and c2 = 'x' GROUP BY c1, c2 -- This will use agg table SELECT c1, c2, count(distinct c3) FROM source where c1 = 'a' and c2 = 'x' and c3 = 'y' GROUP BY c1, c2 -- This will not use agg table Whether below query use agg table ? SELECT c1, c2, count(distinct c3) as x FROM source GROUP BY c1, c2 having x > 10 Whether having clause will be converted to filter on count(distinct c3) column on agg table or whether spark handles having clause ? 6) Instead of user identifying & creating agg tables, can we try to use any ML or stats to understand queries/tables & suggest aggregates ? eg., based on column stats ---- Regards, Naresh P R On Fri, Oct 13, 2017 at 8:03 PM, Jacky Li <jacky.li...@qq.com> wrote: > Hi community, > > In traditional data warehouse, pre-aggregate table or cube is a common > technology to improve OLAP query performance. To take carbondata support > for OLAP to next level, I’d like to propose pre-aggregate table support in > carbondata. > > Please refer to CARBONDATA-1516 <https://issues.apache.org/ > jira/browse/CARBONDATA-1516> and the design document attached in the JIRA > ticket (https://issues.apache.org/jira/browse/CARBONDATA-1516 < > https://issues.apache.org/jira/browse/CARBONDATA-1516>) > > This design is still in initial phase, proposed usage and SQL syntax are > subject to change. Please provide your comment to improve this feature. > Any suggestion on the design from community is welcomed. > > Regards, > Jacky Li