Re: IoTDB write performance

Jialin Qiao Sat, 10 Apr 2021 23:26:08 -0700

Hi,

1. *Can I use the API on v0.11.2 (I am currently using this) or do I
need v0.11.3?*


You can use 0.11.2. We do not change the RPC API in a minor (bug-fix)
version.

2. *Is there any limit to the number of rows that can
be inserted at once using insertTablet()? Or, is there an optimal number of
rows per insertTablet() to get best performance?*

It depends on the number of columns in the Tablet. The default RPC size limit
is 64 MB, so rows * columns * 8 bytes must stay below 64 MB.
I usually set rows to 1000 when I have 1000 columns.
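
As a rough worked example of the rule of thumb above, the following Python
sketch computes the largest row count that keeps rows * columns * 8 bytes
under 64 MB. The function names and parameters are made up for illustration.
The estimate assumes 8 bytes per value and 64 MB = 64 * 1024 * 1024 bytes, and
it ignores the time column and any RPC framing overhead, so treat the result
as an upper bound rather than a recommended setting.

```python
RPC_LIMIT_BYTES = 64 * 1024 * 1024  # assumption: "64MB" read as 64 MiB
BYTES_PER_VALUE = 8                 # assumption: every value occupies 8 bytes


def max_rows_per_tablet(columns, limit_bytes=RPC_LIMIT_BYTES,
                        bytes_per_value=BYTES_PER_VALUE):
    """Largest row count with rows * columns * bytes_per_value < limit_bytes."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    # Subtract 1 so the inequality stays strict when the division is exact.
    return (limit_bytes - 1) // (columns * bytes_per_value)


def tablet_payload_bytes(rows, columns, bytes_per_value=BYTES_PER_VALUE):
    """Estimated size of the value columns only (timestamps not counted)."""
    return rows * columns * bytes_per_value


if __name__ == "__main__":
    print(max_rows_per_tablet(1000))
    print(tablet_payload_bytes(1000, 1000))
```

Under these assumptions, 1000 columns allow at most 8388 rows, so 1000 rows
(about 8 MB of values) leaves plenty of headroom.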

3. *Since insertTablet() also takes rows for one
device, what exactly is the difference between insertTablet() and
insertRecordsOfOneDevice()? Are they for different use-cases? Which
performs better?*

insertRecordsOfOneDevice() is essentially an improved version of
insertRecords(): all records belong to one device, so we only acquire one
write lock during the write.

insertTablet() requires every row to have a value for every measurement (the
data is stored in primitive-type arrays), while insertRecordsOfOneDevice()
allows each row to have different measurements (the data is stored as Objects).

Performance: InsertTablet > insertRecordsOfOneDevice > insertRecords >
insertRecord

InsertTablet is always the fastest :)
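
The structural difference can be pictured with plain Python data, independent
of the IoTDB client API. The sketch below is not IoTDB code:
`records_to_tablet` is a hypothetical helper that turns per-row records (each
carrying its own measurement set) into a columnar, tablet-like layout. It
fails when a row lacks a measurement, which is the situation where only the
record-style calls would apply.

```python
def records_to_tablet(measurements, records):
    """Convert row-oriented records of one device into columnar form.

    records: list of (timestamp, {measurement: value}) tuples.
    Returns (timestamps, columns), where columns[m] is a list aligned
    with timestamps. Raises ValueError if any row lacks a measurement,
    because a tablet needs a value in every cell.
    """
    timestamps = []
    columns = {m: [] for m in measurements}
    for ts, values in records:
        missing = [m for m in measurements if values.get(m) is None]
        if missing:
            raise ValueError(f"row at time {ts} is missing {missing}")
        timestamps.append(ts)
        for m in measurements:
            columns[m].append(values[m])
    return timestamps, columns


if __name__ == "__main__":
    full_rows = [(1, {"s1": 1, "s2": 2.2}), (2, {"s1": 1, "s2": 2.2})]
    print(records_to_tablet(["s1", "s2"], full_rows))
```

Rows such as `(3, {"s1": 1})` would be rejected here, mirroring the point that
rows with differing measurements fit the record-style interface instead.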

4.  *I suppose IoTDB does not have benchmark numbers for such devices? *

We haven't tested these cases.

5. *How soon can we expect it to come out? Or can I test it out even now
since I eventually plan to start working with it? Is it stable enough?*

The release has already received more than 3 binding votes, so it should come
out in one or two days.
You can get it from https://dist.apache.org/repos/dist/dev/iotdb/0.12.0/rc1

We fixed nearly all important known bugs in 0.12.0. The single-node version
is stable enough.
Data migration is not yet supported in the cluster version, and the cluster
version has not been tested much.
You are welcome to test it and give feedback :)

6. *What benefits does the new Tsfile structure in v0.12 bring? Does it
improve DB data ingestion/query performance?*

It removes some redundant fields that existed in the previous version (which
reduces disk usage) and improves the performance of raw data queries.

Thanks,
—————————————————
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


On Sun, 11 Apr 2021 at 13:38, Dhruv Garg <[email protected]> wrote:

> Hello Jialin,
>
> Thanks for your response.
>
> 1. Alright I will definitely move from insertRecords() to insertTablet()
> then. *Can I use the API on v0.11.2 (I am currently using this) or do I
> need v0.11.3?*
>
> 2. The info you provided on insertTablet() is very helpful. I could start
> with that. I think specifying the data types at the top of the file will
> also help reduce the data-type inference time for the api and possibly
> reduce ingestion time. *Is there any limit to the number of rows that can
> be inserted at once using insertTablet()? Or, is there an optimal number of
> rows per insertTablet() to get best performance?*
>
> 3. Also, I see that there is a new API in v0.12 called
> insertRecordsOfOneDevice(). *Since insertTablet() also takes rows for one
> device, what exactly is the difference between insertTablet() and
> insertRecordsOfOneDevice()? Are they for different use-cases? Which
> performs better?*
>
> 4. So the Raspberry Pi are low-end devices. They run on lower RAM (2GB),
> have ARM processors and use SD cards for persistent storage. *I suppose
> IoTDB does not have benchmark numbers for such devices? *
>
> A couple of additional questions:
> 5. I see that the IoTDB team is nearing release of v0.12. *How soon can we
> expect it to come out? Or can I test it out even now since I eventually
> plan to start working with it? Is it stable enough?*
>
> 6. *What benefits does the new Tsfile structure in v0.12 bring? Does it
> improve DB data ingestion/query performance?*
>
> Thanks in advance!
>
> On Thu, 8 Apr 2021 at 17:26, Jialin Qiao <[email protected]> wrote:
>
> > Hi,
> >
> > 1. InsertTablets can be more than 3 times faster than insertRecords.
> >
> > 2. Yes, Tablet is actually a small table with some columns and rows.
> > It has a time column and many value columns. In each row, all columns
> > must have a value.
> >
> > An example:
> >
> > time, root.sg.d1.s1, root.sg.d1.s2
> > 1, 1, 2.2
> > 2, 1, 2.2
> > 3, 1, 2.2
> >
> > Tablets do not allow null values.
> >
> > However, the tablet uses an array of primitive types to store data in a
> > columnar format, e.g., long[], int[], double[].
> > If you store data in a text file, the data type indicator is needed, like
> > this:
> >
> > time, root.sg.d1.s1, root.sg.d1.s2
> > long, int, double
> > 1, 1, 2.2
> > 2, 1, 2.2
> > 3, 1, 2.2
> >
> > Then you could generate a tablet from the data and pass it to
> > insertTablet.
> >
> > 3. You could check the memory allocated on the Raspberry Pi. Is it the
> > same as on the desktop? This may impact the write throughput.
> >
> > Thanks,
> > —————————————————
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
> >
> >
> > On Thu, 8 Apr 2021 at 13:36, Dhruv Garg <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > In the past month I have been using the JDBC client of IoTDB to write
> > > data from CSV into IoTDB and also query on the data. Looking at the CSV
> > > code in ImportCsv.java
> > > <https://github.com/apache/iotdb/blob/master/cli/src/main/java/org/apache/iotdb/tool/ImportCsv.java>,
> > > it seems that csv itself is again parsed into an IoTDB-friendly
> > > structure and then ingested. I would like to avoid the csv-parsing time
> > > and directly provide the data as needed to IoTDB. This should improve
> > > the ingestion performance.
> > >
> > > What I am talking about is similar to InfluxDB where we can also parse
> > > from CSV to InfluxDB's native line protocol and then ingest the data
> > > into DB. However, for better performance, InfluxDB also provides write
> > > APIs
> > > <https://github.com/influxdata/influxdb-client-java/blob/master/client/src/main/java/com/influxdb/client/WriteApi.java>
> > > to take records in line protocol as input and directly ingest those.
> > >
> > > I have three questions:
> > >
> > >    1. I see that IoTDB is getting newer write APIs like InsertTablets
> > >    and it seems that it is designed to be faster than insertRecords.
> > >    Approximately how much performance improvement have you seen with
> > >    InsertTablets?
> > >    2. Is there a way to create a data file such that it is easy to use
> > >    InsertTablets with it? This is to know if I can create an
> > >    IoTDB-friendly and IoTDB-specific input file and then directly use
> > >    InsertTablets to ingest the data.
> > >    3. As a preliminary check, I am also trying out IoTDB on Raspberry
> > >    Pi 4B devices. However, the ingestion time with CSV on the Raspberry
> > >    Pi is 10 times what it is on the desktop (amd64). This ratio should
> > >    ideally have been closer to 4X, based on other applications that I
> > >    have benchmarked. Have you all run any Raspberry Pi benchmarks for
> > >    IoTDB earlier?
> > >
> > > I would be awaiting your response. Thanks!
> > >
> > > Regards,
> > > dgargcs
> > >
> >
>
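
The typed text layout quoted above (a header row, a type row, then data rows)
could be read into columnar lists roughly as follows. This is only a sketch in
plain Python: `parse_typed_text` and the `CASTS` table are hypothetical names,
only the three types shown in the example are handled, and the result would
still need to be copied into a real Tablet through the client library.

```python
import csv
import io

# Assumption: only the type names that appear in the quoted example.
CASTS = {"long": int, "int": int, "double": float}


def parse_typed_text(text):
    """Parse 'header / types / data' text into {column_name: [values]}."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    header, types, data = rows[0], rows[1], rows[2:]
    columns = {name: [] for name in header}
    for line in data:
        if len(line) != len(header):
            # A tablet needs a value in every column of every row.
            raise ValueError(f"incomplete row: {line}")
        for name, type_name, cell in zip(header, types, line):
            columns[name].append(CASTS[type_name](cell))
    return columns


if __name__ == "__main__":
    sample = (
        "time, root.sg.d1.s1, root.sg.d1.s2\n"
        "long, int, double\n"
        "1, 1, 2.2\n"
        "2, 1, 2.2\n"
        "3, 1, 2.2\n"
    )
    print(parse_typed_text(sample))
```

Because the types are declared once in the second row, no per-value type
inference is needed while reading the data rows.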
