Good idea, I support redesigning the CSV.


---Original---
From: "Xiangdong Huang" <saint...@gmail.com>
Date: Wed, Jun 9, 2021 14:19 PM
To: "dev" <dev@iotdb.apache.org>;
Subject: Re: support csv format


Hi,

Yes. Do you or your teammates have time to do the implementation?

I think:
1. We need to update the CSV Tools documentation and add sample data
to the examples folder.
2. Rewrite the tool to support RFC 4180 (and only RFC 4180).
3. Redesign the CSV schema if needed.
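
For point 2, here is a minimal sketch of what "only support RFC 4180" could mean in practice. It is written in Python rather than the tool's Java purely for brevity. It assumes a header row and the same number of fields in every record; `read_rfc4180` is a hypothetical helper name, not part of IoTDB.

```python
import csv
import io


def read_rfc4180(text):
    """Parse CSV text with RFC 4180 conventions and return (header, rows).

    Assumptions of this sketch: the first record is a header, fields are
    comma-separated, quotes are escaped by doubling them, and every record
    must have as many fields as the header.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),  # keep line breaks inside quoted fields intact
        delimiter=",",
        quotechar='"',
        doublequote=True,
        strict=True,  # raise csv.Error on malformed quoting instead of guessing
    )
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV: a header row is required")
    rows = []
    for record in reader:
        if len(record) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(record)}"
            )
        rows.append(record)
    return header, rows
```

Rejecting a malformed file with a line number, instead of importing it partially, would tell the user which line to fix when an import fails.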

Best,

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

Chao Wang <ccgow...@163.com> wrote on Wed, Jun 9, 2021 at 9:17 AM:
>
> Good idea!
>
> Supporting the `align by device` schema is user-friendly.
>
> I also think the CSV format is important. We could pre-define a standard CSV format so that users know we do not support every CSV variant.
>
> If we don't tell them this, users will assume that all variants are supported, and when an import error occurs it will not be clear how to modify the file so that the data can be imported normally.
>
> Chao Wang
> ccgow...@163.com
> On 6/9/2021 08:58, Xiangdong Huang <saint...@gmail.com> wrote:
> Hi,
>
> At present, I see that the import CSV tool of IoTDB does not clearly state which CSV file format it supports, and users will be very confused about using the tool.
> IMO, a standard CSV format is needed. Apache Commons CSV is fine, RFC
> 4180 is fine.
> BUT, what is the really confusing thing? Whether the CSV obeys RFC
> 4180, or what the schema of the CSV is?
>
> I think the answer is the latter.
>
> For example, as far as I know, the current CSV schema looks like this:
> ```
> Time, root.d1.s1, root.d1.s2, root.d2.s1, root.d2.s5 ...
> ```
> (also called the `align by time` table format)
>
> Is this really good for users? And is it really good for developers
> (IoTDB developers, I mean)?
>
> - We can say YES, because it supports different devices having
> different measurements/sensors.
>
> - We can say NO, because from a relational database user's point of view, the
> following table schema is more comfortable:
>
> ```
> Time, deviceId, s1, s2, s3 ...
> ```
> (also called the `align by device` table format)
>
> Meanwhile, the above schema will make CSV processing much easier for us.
> E.g., we can use insertRecordsOfOneDevice, and we can use Tablets.
> If we require that the data in the CSV be ordered by <deviceID, time>,
> then we can do more.
>
> (More about the CSV tool, unrelated to the above two schemas:
> if we improve our TsFile Load tool, we can even convert the CSV to a
> TsFile directly and then load it into IoTDB.)
>
> So, I think the more important question is: which schema do we need to support in CSV?
> Maybe this question can be extended to another big question: which
> kind of relational table structure do we want to support (or want IoTDB
> to be convertible to)?
>
> Consider that many users want to integrate IoTDB with existing RDB tools/JDBC.
> I think maybe we need to make a decision. That is, if we want to
> convert IoTDB data to a relational table, just use the `align by device` table
> format.
>
> 1. For CSV, we just export data from IoTDB in the `align by device` table
> format. And by default, we accept CSV that obeys this format. (Supporting
> format 1 can also be considered.)
>
> 2. For JDBC (which I want to consider together), add a parameter to
> the JDBC URL: jdbc:iotdb://<IP>:<port>?alignByDevice=true
> If `alignByDevice=true`, then we support all JDBC interfaces, e.g.,
> getTableName, etc., and try to support DBeaver and so on.
> If there is no such parameter, we still throw `not support` for many
> JDBC interfaces.
>
> In the end, THIS IS JUST AN IDEA. It requires more discussion.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
> Chao Wang <ccgow...@163.com> wrote on Tue, Jun 8, 2021 at 4:12 PM:
>
> Hi,
> At present, I see that the import CSV tool of IoTDB does not clearly state which CSV file format it supports, and users will be very confused about using the tool.
> I notice that Apache Commons CSV is a well-known project that supports a variety of CSV formats popular in the industry, for example Excel, MySQL, RFC 4180, Oracle, and PostgreSQL CSV.
>
> Can we pre-define which CSV format we want to support, such as RFC 4180?
>
> Chao Wang
> ccgow...@163.com
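
To make the two layouts discussed in this thread concrete, here is a minimal Python sketch that turns `align by time` rows into `align by device` rows ordered by <deviceId, time>. It is only an illustration under stated assumptions: the device is taken to be everything before the last dot of a column name, timestamps are integers, an empty cell means "no value", and `align_by_time_to_device` is a hypothetical name rather than an IoTDB API.

```python
from collections import defaultdict


def align_by_time_to_device(header, rows):
    """Convert `align by time` rows into `align by device` rows.

    header: ["Time", "root.d1.s1", "root.d1.s2", "root.d2.s1", ...]
    rows:   one list of string values per timestamp, in the same column order.
    Assumptions of this sketch: the device is everything before the last
    dot of a column name, timestamps are integers, and an empty string
    means "no value".
    """
    # Split every series path into (device, ".", measurement).
    columns = [name.strip().rpartition(".") for name in header[1:]]
    measurements = sorted({m for _, _, m in columns})

    out_rows = []
    for row in rows:
        time, values = row[0], row[1:]
        per_device = defaultdict(dict)
        for (device, _, measurement), value in zip(columns, values):
            if value != "":
                per_device[device][measurement] = value
        # Emit one output row per device that has at least one value.
        for device, cells in per_device.items():
            out_rows.append([time, device] + [cells.get(m, "") for m in measurements])

    # Order by <deviceId, time>, as suggested in the thread.
    out_rows.sort(key=lambda r: (r[1], int(r[0])))
    return ["Time", "deviceId"] + measurements, out_rows
```

With rows grouped per device like this, each group could in principle be handed to a per-device batch insert, which is the simplification the thread points at.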
