One more suggestion: "align by device" is clearer than "group by
device".

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University


On Fri, Feb 7, 2020 at 2:56 PM, Xiangdong Huang <saint...@gmail.com> wrote:

> -1 for (2), and I think I will never vote +1 for it.
>
> If we do it that way, there is no chance of replacing applications that
> use a relational DB to manage time-series data.
>
> (3) is the friendliest option for developers who use relational DBs,
> because when they write SQL like "select c1, c2, c3 FROM", they naturally
> expect the result set to have 3 columns.
>
> Of course, users who use an RDB and want a table like "Time, DeviceId,
> s1, s2" have applications that can guarantee the data type of s2 is
> constant.
> If s2 holds several data types, RDB users may simply store it as "text"
> or "varchar2".
>
> Considering that, I think the rule should be: if all data in a column
> have the same data type, use that data type; otherwise, use String.
>
> As for (1), it can be an option. But my suggestion is: if all data in a
> column have the same data type, do not change its column name.
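A minimal Java sketch of the two rules above follows. It assumes the schema is available as a device → (measurement → data type) map and uses "text" as a stand-in for the String fallback. The class and method names are hypothetical, and this is not IoTDB code. `resolveColumnTypes` applies the rule for option (3), and `columnNames` applies the suggestion for option (1). Both are run on the example series from the original mail below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not IoTDB code: derives align-by-device column
// names and types from a device -> (measurement -> data type) schema.
class AlignByDeviceSchema {

    // Assumed fallback string type for columns that mix data types.
    static final String FALLBACK_TYPE = "text";

    // Collects every data type seen for each measurement name across devices.
    // TreeMap/TreeSet keep the output order deterministic (sorted by name/type).
    static Map<String, Set<String>> typesByMeasurement(Map<String, Map<String, String>> schema) {
        Map<String, Set<String>> types = new TreeMap<>();
        for (Map<String, String> measurements : schema.values()) {
            for (Map.Entry<String, String> m : measurements.entrySet()) {
                types.computeIfAbsent(m.getKey(), k -> new TreeSet<>()).add(m.getValue());
            }
        }
        return types;
    }

    // Option (3): keep the real type when every device agrees on it,
    // otherwise fall back to the string type.
    static Map<String, String> resolveColumnTypes(Map<String, Map<String, String>> schema) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : typesByMeasurement(schema).entrySet()) {
            Set<String> types = e.getValue();
            columns.put(e.getKey(), types.size() == 1 ? types.iterator().next() : FALLBACK_TYPE);
        }
        return columns;
    }

    // Option (1), refined: add a "_type" suffix only to measurement names
    // that actually have more than one data type.
    static List<String> columnNames(Map<String, Map<String, String>> schema) {
        List<String> header = new ArrayList<>(List.of("Time", "Device"));
        for (Map.Entry<String, Set<String>> e : typesByMeasurement(schema).entrySet()) {
            if (e.getValue().size() == 1) {
                header.add(e.getKey());
            } else {
                for (String type : e.getValue()) {
                    header.add(e.getKey() + "_" + type);
                }
            }
        }
        return header;
    }

    public static void main(String[] args) {
        // Example schema from the original mail (devices root.sg1.d1 and root.sg1.d2).
        Map<String, Map<String, String>> schema = new LinkedHashMap<>();
        schema.put("d1", Map.of("s1", "int32", "s2", "int32"));
        schema.put("d2", Map.of("s1", "boolean", "s2", "int32"));

        System.out.println(resolveColumnTypes(schema)); // {s1=text, s2=int32}
        System.out.println(columnNames(schema));        // [Time, Device, s1_boolean, s1_int32, s2]
    }
}
```

Under these rules, s2 keeps its plain name and its int32 type. Only the ambiguous s1 is either split by type or widened to a string column.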
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>
> On Fri, Feb 7, 2020 at 2:29 PM, Jialin Qiao <qiaojia...@apache.org> wrote:
>
>> Hi,
>>
>> In IOTDB-243 [1], we want to allow creating measurements with the same
>> name but different data types in the same storage group.
>>
>> For example,
>> root.sg1.d1.s1, int32
>> root.sg1.d1.s2, int32
>> root.sg1.d2.s1, boolean
>> root.sg1.d2.s2, int32
>>
>> This may cause trouble in GROUP BY DEVICE queries: how should we organize
>> the result set (its table schema)? I have thought of three ways:
>>
>> (1) Time, Device, s1_int32, s1_boolean, s2_int32
>>
>> * advantage:
>> - No ambiguity
>> - The number of columns is acceptable.
>>
>> * disadvantage:
>> - In most cases, the data type suffix is redundant and looks weird.
>> - Difficult to parallelize the query across devices.
>>
>> (2) Time, d1, s1, s2, Time, d2, s1, s2
>>
>> * advantage:
>> - No ambiguity
>> - This could leverage parallelization across devices in the query.
>>
>> * disadvantage:
>> - The number of columns may be large.
>>
>> (3) Time, DeviceId, s1, s2
>>
>> This may require a lot of work in QueryDataSet, and users need to read
>> each value carefully according to the measurement's data type on that
>> device; otherwise, it may cause a RuntimeException in the JDBC client.
>>
>> * advantage:
>> - The number of columns is minimal.
>>
>> * disadvantage:
>> - May cause ambiguity: one column of the table can have more than one
>> data type, which also conflicts with the Spark and Hive connectors.
>> - Difficult to parallelize the query.
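The following small plain-Java simulation illustrates the client-side hazard described for (3). The class names are hypothetical, and it does not use the IoTDB JDBC API. The s1 column mixes d1's int32 values with d2's boolean values, so a safe reader has to choose the type per device. A reader that assumes one type for the whole column fails with a ClassCastException, which is a RuntimeException.

```java
import java.util.List;
import java.util.Map;

// Hypothetical simulation of an option (3) result set, not the IoTDB JDBC API:
// the s1 column mixes int32 values (from d1) and boolean values (from d2).
class MixedColumnReader {

    // One result row: Time, DeviceId, s1.
    static final class Row {
        final long time;
        final String device;
        final Object s1;

        Row(long time, String device, Object s1) {
            this.time = time;
            this.device = device;
            this.s1 = s1;
        }
    }

    // Per-device data type of s1, which the client has to know or look up.
    static final Map<String, String> S1_TYPE = Map.of("d1", "int32", "d2", "boolean");

    // Safe reader: interprets s1 according to the row's device.
    static String readS1(Row row) {
        switch (S1_TYPE.get(row.device)) {
            case "int32":
                return "int " + (Integer) row.s1;
            case "boolean":
                return "boolean " + (Boolean) row.s1;
            default:
                throw new IllegalStateException("unknown s1 type for " + row.device);
        }
    }

    // Naive reader that assumes the whole column is int32.
    static int readS1AsInt(Row row) {
        return (Integer) row.s1; // ClassCastException for d2's boolean values
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1L, "d1", 42), new Row(1L, "d2", true));
        for (Row row : rows) {
            System.out.println(readS1(row)); // "int 42", then "boolean true"
        }
        try {
            for (Row row : rows) {
                readS1AsInt(row);
            }
        } catch (ClassCastException e) {
            System.out.println("naive read failed: " + e.getClass().getSimpleName());
        }
    }
}
```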
>>
>> _______________
>>
>> From my perspective, I prefer (1) ≈ (2) > (3).
>>
>> What's your opinion?
>>
>> [1] https://issues.apache.org/jira/browse/IOTDB-243
>>
>> Thanks,
>> —————————————————
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>>
>
