This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git


The following commit(s) were added to refs/heads/master by this push:
     new 6c2a95f94 [doc] Document File Format for write performance
6c2a95f94 is described below

commit 6c2a95f948018c0c39467d8fc13eabc11fa40e59
Author: JingsongLi <[email protected]>
AuthorDate: Wed Jul 5 21:05:05 2023 +0800

    [doc] Document File Format for write performance
---
 docs/content/maintenance/write-performance.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index 140173c6f..502b05f1d 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -135,6 +135,26 @@ One can easily see that too many sorted runs will result 
in poor query performan
 
 Compaction will become less frequent when `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
 
+## File Format
+
+If you want to achieve ultimate compaction performance, you can consider using the row storage file format AVRO.
+- The advantage is that you can achieve high write throughput and compaction performance.
+- The disadvantage is that your analysis queries will be slow, and the biggest problem with row storage is that it
+  does not support query projection. For example, if the table has 100 columns but a query reads only a few of them,
+  row storage still has to read every column, so the IO cost cannot be ignored. Additionally, compression efficiency
+  will decrease and storage costs will increase.
+
+This is a trade-off.
+
+Enable row storage through the following options:
+```shell
+file.format = avro
+metadata.stats-mode = none
+```
+
+Collecting statistics for row storage is somewhat expensive, so it is recommended to turn off statistics
+collection as well.
+
 ## Write Initialize
 
 In the initialization of write, the writer of the bucket needs to read all 
historical files. If there is a bottleneck

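The projection drawback described in the added "File Format" section can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not Paimon code: the function name `bytes_scanned`, the row count, and the fixed 8 bytes per value are all hypothetical, and the model deliberately ignores compression, encoding, and file metadata. It only shows why a row format must read all 100 columns while a columnar format can read just the projected ones.

```python
def bytes_scanned(num_rows, total_columns, projected_columns, bytes_per_value, columnar):
    """Idealized estimate of data bytes read by a scan.

    Assumptions (illustrative only): fixed-width values, no compression,
    no metadata or index overhead.
    """
    if not 0 < projected_columns <= total_columns:
        raise ValueError("projected_columns must be in 1..total_columns")
    # A columnar layout can skip unprojected columns; a row layout cannot,
    # because all values of a row are stored together.
    columns_read = projected_columns if columnar else total_columns
    return num_rows * columns_read * bytes_per_value


# Hypothetical table: 1,000,000 rows, 100 columns, query reads 3 columns.
row_bytes = bytes_scanned(1_000_000, 100, 3, 8, columnar=False)
col_bytes = bytes_scanned(1_000_000, 100, 3, 8, columnar=True)
print(f"row storage:      {row_bytes:,} bytes")
print(f"columnar storage: {col_bytes:,} bytes")
```

Under these assumptions the row layout reads about 33 times more data for the same query, which is the cost to weigh against the faster writes and compaction of `file.format = avro`.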