-1 for (2). I don't think I will ever vote +1 for it.

If we do it like that, we have no chance of winning over applications
that currently use a relational DB to manage time series data.

(3) is the friendliest option for developers who are used to relational
DBs: when they write SQL like "select c1, c2, c3 FROM", they naturally
expect the result set to have exactly 3 columns.

Of course, for RDB users who want a table like "Time, DeviceId, s1, s2",
their applications can guarantee that the data type of s2 stays constant.
If s2 holds several data types, RDB users would simply declare the column
as "text" or "varchar2".

Considering that, I think the choice is: if all data in a column has the
same data type, use that data type. Otherwise, use String.
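
To make the rule concrete, here is a minimal sketch of how the column type
could be chosen. The class, enum, and method names are hypothetical and are
not IoTDB's API; TEXT stands in for "String".

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not IoTDB's actual API.
class ColumnTypeSketch {

    // Illustrative subset of data types; TEXT plays the role of "String".
    enum DataType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT }

    // Returns the shared type if every device uses the same one, otherwise TEXT.
    static DataType resolveColumnType(Collection<DataType> typesAcrossDevices) {
        if (typesAcrossDevices.isEmpty()) {
            throw new IllegalArgumentException("no types given for column");
        }
        Set<DataType> distinct = EnumSet.copyOf(typesAcrossDevices);
        return distinct.size() == 1 ? distinct.iterator().next() : DataType.TEXT;
    }

    public static void main(String[] args) {
        // s1 is int32 on d1 but boolean on d2; s2 is int32 on both devices.
        System.out.println("s1 -> "
            + resolveColumnType(List.of(DataType.INT32, DataType.BOOLEAN)));
        System.out.println("s2 -> "
            + resolveColumnType(List.of(DataType.INT32, DataType.INT32)));
    }
}
```

With the schema from the original mail, s1 would fall back to the string
type and s2 would keep int32.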

(1) Well, it can be an option. But my suggestion is: if all data in a
column has the same data type, do not change its column name.
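
A rough sketch of that naming rule for option (1), again with hypothetical
names and plain strings for the types: a measurement gets a type suffix only
when its type differs between devices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not IoTDB's actual API.
class HeaderNamingSketch {

    // typesByMeasurement: measurement name -> distinct types seen across devices.
    static List<String> buildHeader(Map<String, Set<String>> typesByMeasurement) {
        List<String> header = new ArrayList<>(List.of("Time", "Device"));
        for (Map.Entry<String, Set<String>> entry : typesByMeasurement.entrySet()) {
            if (entry.getValue().size() == 1) {
                // Same type on every device: keep the plain column name.
                header.add(entry.getKey());
            } else {
                // Conflicting types: one suffixed column per type.
                for (String type : entry.getValue()) {
                    header.add(entry.getKey() + "_" + type);
                }
            }
        }
        return header;
    }

    public static void main(String[] args) {
        // Schema from the original mail: s1 is int32 on d1 and boolean on d2,
        // s2 is int32 on both devices.
        Map<String, Set<String>> types = new LinkedHashMap<>();
        types.put("s1", new LinkedHashSet<>(List.of("int32", "boolean")));
        types.put("s2", new LinkedHashSet<>(List.of("int32")));
        System.out.println(buildHeader(types));
    }
}
```

For the example schema, only s1 is split into suffixed columns; s2 keeps
its original name.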

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Fri, Feb 7, 2020 at 2:29 PM, Jialin Qiao <qiaojia...@apache.org> wrote:

> Hi,
>
> In IOTDB-243 [1], we want to allow creating measurements with the same
> name but different types in the same storage group.
>
> For example,
> root.sg1.d1.s1, int32
> root.sg1.d1.s2, int32
> root.sg1.d2.s1, boolean
> root.sg1.d2.s2, int32
>
> This may cause trouble in group-by-device queries. How should we organize
> the result (table schema)? I can think of three ways:
>
> (1) Time, Device, s1_int32, s1_boolean, s2_int32
>
> * advantage:
> - No ambiguity
> - The number of columns is acceptable.
>
> * disadvantage:
> - In most cases, the datatype indicator is redundant and weird.
> - Difficult to use parallelization among devices in the query.
>
> (2) Time, d1, s1, s2 Time, d2, s1, s2
>
> * advantage:
> - No ambiguity
> - This could leverage the parallelization among devices in the query.
>
> * disadvantage:
> - The number of columns may be large.
>
> (3) Time, DeviceId, s1, s2
>
> This may require a lot of work in the QueryDataSet, and users need to read
> each value carefully according to the measurement type of its device.
> Otherwise, it may cause a RuntimeException in the JDBC client.
>
> * advantage:
> - The number of columns is minimal.
>
> * disadvantage:
> - May cause ambiguity: a single column of the table can have more than one
> type, which also conflicts with the Spark connector and the Hive connector.
> - Difficult to use parallelization in the query.
>
> _______________
>
> From my perspective, I prefer (1) ≈ (2) > (3).
>
> What's your opinion?
>
> [1] https://issues.apache.org/jira/browse/IOTDB-243
>
> Thanks,
> —————————————————
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
