One more suggestion: "align by device" is clearer than "group by device".
-----------------------------------
Xiangdong Huang (黄向东)
School of Software, Tsinghua University


On Fri, Feb 7, 2020 at 2:56 PM, Xiangdong Huang <saint...@gmail.com> wrote:

> -1 for (2), permanently; I don't think I will ever vote +1 for it.
>
> If we do it that way, there is no chance to replace applications that
> use a relational DB to manage time series data.
>
> (3) is the friendliest for developers who use relational DBs: when they
> write SQL like "select c1, c2, c3 FROM", they naturally expect the
> result set to have 3 columns.
>
> Of course, for users who use an RDB and want a table like "Time,
> DeviceId, s1, s2", their applications can guarantee that the data type
> of s2 is constant.
> If there are many data types in s2, RDB users may use the "text" or
> "varchar2" format directly.
>
> Considering that, I think the choice is: if all data in a column has
> the same data type, use the correct data type. Otherwise use String.
>
> (1) Well, it can be an option. But my suggestion is: if all data in a
> column has the same data type, do not change its column name.
>
> Best,
> -----------------------------------
> Xiangdong Huang (黄向东)
> School of Software, Tsinghua University
>
>
> On Fri, Feb 7, 2020 at 2:29 PM, Jialin Qiao <qiaojia...@apache.org> wrote:
>
>> Hi,
>>
>> In IOTDB-243 [1], we want to allow creating measurements with the same
>> name but different types in the same storage group.
>>
>> For example:
>> root.sg1.d1.s1, int32
>> root.sg1.d1.s2, int32
>> root.sg1.d2.s1, boolean
>> root.sg1.d2.s2, int32
>>
>> This may cause trouble in group by device queries. How do we organize
>> the result (table schema)? I thought of three ways:
>>
>> (1) Time, Device, s1_int, s1_boolean, s2_int32
>>
>> * advantage:
>> - No ambiguity.
>> - The number of columns is acceptable.
>>
>> * disadvantage:
>> - In most cases, the datatype indicator is redundant and weird.
>> - Difficult to use parallelization among devices in the query.
>>
>> (2) Time, d1, s1, s2 Time, d2, s1, s2
>>
>> * advantage:
>> - No ambiguity.
>> - This could leverage parallelization among devices in the query.
>>
>> * disadvantage:
>> - The number of columns may be large.
>>
>> (3) Time, DeviceId, s1, s2
>>
>> This may require a lot of work in the QueryDataSet, and users need to
>> read each value carefully according to the measurement type of that
>> device. Otherwise, it may cause a RuntimeException in the JDBC client.
>>
>> * advantage:
>> - The number of columns is minimal.
>>
>> * disadvantage:
>> - May cause ambiguity: a column of one table has more than one type,
>>   which also conflicts with the Spark connector and the Hive connector.
>> - Difficult to use parallelization in the query.
>>
>> _______________
>>
>> From my perspective, I prefer (1) ≈ (2) > (3).
>>
>> What's your opinion?
>>
>> [1] https://issues.apache.org/jira/browse/IOTDB-243
>>
>> Thanks,
>> -----------------------------------
>> Jialin Qiao (乔嘉林)
>> School of Software, Tsinghua University
>>
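
For illustration only, here is a minimal Python sketch of the rule proposed in the reply ("if all data has the same data type in a column, use the correct data type, otherwise use String"), applied to the four example series from the original mail. It is not IoTDB code: the names SCHEMA, resolve_column_types and header, and the "text" label used for the String fallback, are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical schema copied from the example in the thread:
# (device, measurement) -> data type name.
SCHEMA = {
    ("root.sg1.d1", "s1"): "int32",
    ("root.sg1.d1", "s2"): "int32",
    ("root.sg1.d2", "s1"): "boolean",
    ("root.sg1.d2", "s2"): "int32",
}


def resolve_column_types(schema, fallback="text"):
    """Pick one column type per measurement for a (Time, Device, s1, s2) table.

    If every device agrees on the type of a measurement, keep that type;
    otherwise fall back to a string-like type (assumed label: "text").
    """
    types_by_measurement = defaultdict(set)
    for (_device, measurement), data_type in schema.items():
        types_by_measurement[measurement].add(data_type)
    return {
        measurement: next(iter(types)) if len(types) == 1 else fallback
        for measurement, types in sorted(types_by_measurement.items())
    }


def header(schema):
    """Column names stay unchanged; only the column type may be widened."""
    return ["Time", "Device"] + list(resolve_column_types(schema))


print(resolve_column_types(SCHEMA))  # {'s1': 'text', 's2': 'int32'}
print(header(SCHEMA))                # ['Time', 'Device', 's1', 's2']
```

In this sketch s1 is widened to a string column because d1 and d2 disagree on its type, while s2 keeps int32, so the result keeps the minimal column count of option (3) without renaming any column.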