Re: [DISCUSS] Table schema of group by device

Jialin Qiao Tue, 11 Feb 2020 01:50:18 -0800

Hi,

If we use text when a column has multiple types, I'm ok with (3).
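
To make that concrete, here is a rough sketch of how the column types for
(3) could be resolved. It is plain Python, not IoTDB code; the function name
`resolve_column_types` and the type strings are made up for illustration.
It uses the example series from the original mail below:

```python
from typing import Dict, List, Tuple

TEXT = "text"  # fallback type name; an assumption for this sketch


def resolve_column_types(series: List[Tuple[str, str]]) -> Dict[str, str]:
    """Pick one column type per measurement for a (Time, Device, s1, ...) table.

    `series` holds (full_path, data_type) pairs such as ("root.sg1.d1.s1", "int32").
    A measurement keeps its declared type when every device agrees on it,
    and falls back to TEXT otherwise.
    """
    column_types: Dict[str, str] = {}
    for path, data_type in series:
        measurement = path.rsplit(".", 1)[-1]  # last path segment, e.g. "s1"
        seen = column_types.setdefault(measurement, data_type)
        if seen != data_type:
            # Devices disagree on the type, so the shared column becomes text.
            column_types[measurement] = TEXT
    return column_types


example = [
    ("root.sg1.d1.s1", "int32"),
    ("root.sg1.d1.s2", "int32"),
    ("root.sg1.d2.s1", "boolean"),
    ("root.sg1.d2.s2", "int32"),
]
schema = resolve_column_types(example)
print(schema)  # {'s1': 'text', 's2': 'int32'}
```

So s2 keeps int32 and only s1, where d1 and d2 disagree, is returned as text.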


Thanks,
—————————————————
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


魏祥威 <526213...@qq.com> wrote on Sun, Feb 9, 2020 at 5:30 PM:

> Hi,
>
>
> I agree with the opinion of Xiangdong Huang.
>
>
> (3) is the most friendly for users who are using a relational DB, and if
> they want a relational query (group by device query), their applications
> should guarantee the consistency of the data types.
>
> Best,
> Xiangwei Wei
>
> ------------------ Original Message ------------------
> From: "Xiangdong Huang" <saint...@gmail.com>;
> Sent: Friday, February 7, 2020, 2:58 PM
> To: "dev" <dev@iotdb.apache.org>;
>
> Subject: Re: [DISCUSS] Table schema of group by device
>
>
>
> One more suggestion: "align by device" is clearer than "group by
> device".
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> Xiangdong Huang <saint...@gmail.com> wrote on Fri, Feb 7, 2020 at 2:56 PM:
>
> > -1 for (2), forever; I think I will never vote +1 for it...
> >
> > If you do it like that, there is no chance to replace those applications
> > which are using a relational DB to manage time series data.
> >
> > (3) is the most friendly for those developers who are using a relational
> > DB, because when they write SQL like "select c1, c2, c3 FROM", they
> > naturally expect the result set to have 3 columns...
> >
> > Of course, for users who are using an RDB and want a table like "Time,
> > DeviceId, s1, s2", their applications can guarantee that the data type of
> > the data in s2 is constant.
> > If there are many data types in s2, the RDB users may use the "text" or
> > "varchar2" format directly.
> >
> > Considering that, I think the choice is: if all data in a column has the
> > same data type, use the correct data type. Otherwise use String.
> >
> > (1) Well, it can be an option. But my suggestion is: if all data in a
> > column has the same data type, do not change its column name.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 黄向东
> > 清华大学 软件学院
> >
> >
> > Jialin Qiao <qiaojia...@apache.org> wrote on Fri, Feb 7, 2020 at 2:29 PM:
> >
> > > Hi,
> > >
> > > In IOTDB-243 [1], we want to allow creating measurements with the same
> > > name but with different types in the same storage group.
> > >
> > > For example,
> > > root.sg1.d1.s1 int32
> > > root.sg1.d1.s2 int32
> > > root.sg1.d2.s1 boolean
> > > root.sg1.d2.s2 int32
> > >
> > > This may cause trouble in group by device queries. How do we organize
> > > the result (table schema)? I thought of three ways:
> > >
> > > (1) Time, Device, s1_int32, s1_boolean, s2_int32
> > >
> > > * advantage:
> > > - No ambiguity
> > > - The number of columns is acceptable.
> > >
> > > * disadvantage:
> > > - In most cases, the data type indicator is redundant and weird.
> > > - Difficult to use parallelization among devices in the query.
> > >
> > > (2) Time, d1, s1, s2 Time, d2, s1, s2
> > >
> > > * advantage:
> > > - No ambiguity
> > > - This could leverage the parallelization among devices in the query.
> > >
> > > * disadvantage:
> > > - The number of columns may be large.
> > >
> > > (3) Time, DeviceId, s1, s2
> > >
> > > This may require much work in the QueryDataSet, and users need to get
> > > values carefully according to the measurement type of each device.
> > > Otherwise, it may cause a RuntimeException in the JDBC client.
> > >
> > > * advantage:
> > > - The number of columns is minimal.
> > >
> > > * disadvantage:
> > > - May cause ambiguity: a column of one table has more than one type,
> > > which also conflicts with the Spark connector or Hive connector.
> > > - Difficult to use parallelization in the query.
> > >
> > > _______________
> > >
> > > From my perspective, I prefer (1) ≈ (2) > (3).
> > >
> > > What's your opinion?
> > >
> > > [1] https://issues.apache.org/jira/browse/IOTDB-243
> > >
> > > Thanks,
> > > -----------------
> > > Jialin Qiao
> > > School of Software, Tsinghua University
> > >
> > > 乔嘉林
> > > 清华大学 软件学院
> > >
> >
