This is an automated email from the ASF dual-hosted git repository. lzljs3620320 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/paimon-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 440175e69c  release: 1.0.1 version release
440175e69c is described below

commit 440175e69c550b2f02419607fd60c381aab08d19
Author: JingsongLi <jingsongl...@gmail.com>
AuthorDate: Mon Feb 10 11:28:14 2025 +0800

    release: 1.0.1 version release
---
 community/docs/downloads.md            |   6 +-
 community/docs/releases/release-1.0.md | 209 +++++++++++++++++++++++++++++++++
 public/img/v1-catalog.png              | Bin 0 -> 168220 bytes
 public/img/v1-commit.png               | Bin 0 -> 216355 bytes
 public/img/v1-iceberg.png              | Bin 0 -> 1159312 bytes
 public/img/v1-sort.png                 | Bin 0 -> 233164 bytes
 6 files changed, 212 insertions(+), 3 deletions(-)

diff --git a/community/docs/downloads.md b/community/docs/downloads.md
index e4d411c290..52405452f7 100644
--- a/community/docs/downloads.md
+++ b/community/docs/downloads.md
@@ -12,7 +12,7 @@ Paimon is released as a source artifact, and also through Maven.
 | RELEASE | DATE       | DOWNLOAD |
 |---------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| 1.0.0   | 2025-01-03 | [tar](https://www.apache.org/dyn/closer.lua/paimon/paimon-1.0.0/apache-paimon-1.0.0-src.tgz) ([digest](https://downloads.apache.org/paimon/paimon-1.0.0/apache-paimon-1.0.0-src.tgz.sha512), [pgp](https://downloads.apache.org/paimon/paimon-1.0.0/apache-paimon-1.0.0-src.tgz.asc)) | [...]
+| 1.0.1   | 2025-01-03 | [tar](https://www.apache.org/dyn/closer.lua/paimon/paimon-1.0.1/apache-paimon-1.0.1-src.tgz) ([digest](https://downloads.apache.org/paimon/paimon-1.0.1/apache-paimon-1.0.1-src.tgz.sha512), [pgp](https://downloads.apache.org/paimon/paimon-1.0.1/apache-paimon-1.0.1-src.tgz.asc)) | [...]
 | 0.9.0   | 2024-09-13 | [tar](https://www.apache.org/dyn/closer.lua/paimon/paimon-0.9.0/apache-paimon-0.9.0-src.tgz) ([digest](https://downloads.apache.org/paimon/paimon-0.9.0/apache-paimon-0.9.0-src.tgz.sha512), [pgp](https://downloads.apache.org/paimon/paimon-0.9.0/apache-paimon-0.9.0-src.tgz.asc)) | [...]
 | 0.8.2   | 2024-07-12 | [tar](https://www.apache.org/dyn/closer.lua/paimon/paimon-0.8.2/apache-paimon-0.8.2-src.tgz) ([digest](https://downloads.apache.org/paimon/paimon-0.8.2/apache-paimon-0.8.2-src.tgz.sha512), [pgp](https://downloads.apache.org/paimon/paimon-0.8.2/apache-paimon-0.8.2-src.tgz.asc)) | [...]
@@ -78,7 +78,7 @@ Please replace `${flink.version}` in the following xml file to the version of Fl
   <dependency>
     <groupId>org.apache.paimon</groupId>
     <artifactId>paimon-flink-${flink.version}</artifactId>
-    <version>1.0.0</version>
+    <version>1.0.1</version>
   </dependency>
 </dependencies>
 ```
@@ -94,7 +94,7 @@ Please replace `${spark.version}` in the following xml file to the version of Fl
   <dependency>
     <groupId>org.apache.paimon</groupId>
     <artifactId>paimon-spark-${spark.version}</artifactId>
-    <version>1.0.0</version>
+    <version>1.0.1</version>
   </dependency>
 </dependencies>
 ```
diff --git a/community/docs/releases/release-1.0.md b/community/docs/releases/release-1.0.md
new file mode 100644
index 0000000000..f3fe21d04c
--- /dev/null
+++ b/community/docs/releases/release-1.0.md
@@ -0,0 +1,209 @@
---
title: "Release 1.0"
type: release
version: 1.0.1
---

# Apache Paimon 1.0 Available

Feb 10, 2025 - Jingsong Lee (jingsongl...@gmail.com)

The Apache Paimon PMC has officially released the milestone 1.0 stable version (version 1.0.1).

This release went through nearly 5 months of meticulous refinement, gathering the wisdom of over 70 developers from the global open-source community and resulting in more than 520 commits that fully demonstrate the power of community-driven technological evolution. We sincerely thank all the developers who contributed!

Notably, Paimon 1.0 has passed rigorous production verification at top enterprises such as Alibaba Group and ByteDance. During the traffic peak of the Double Eleven Global Shopping Festival, Paimon demonstrated excellent scalability and stability. This milestone not only marks the entry of streaming lakehouse technology into a new stage of mature development but also proves that Paimon's lake storage architecture has industrial-grade capabilities to support data storage exceeding 100 PB.

## Overview

Paimon 1.0 introduces no major kernel changes; the focus has been on stabilizing and optimizing the kernel design. We have introduced several ecosystem features around the Catalog, strengthened snapshot commits, enhanced lookup performance, and optimized the storage of primary key tables.

1. Many exciting ecosystem integrations have been introduced: the Catalog ecosystem now supports Format Tables for compatibility with Hive tables, adds support for views, and introduces Object Tables to manage unstructured data.
2. Iceberg compatibility is officially production-ready in version 1.0, with integration into the AWS ecosystem and verification of DuckDB reading Iceberg tables.
3. The capabilities around snapshot transactions have been significantly optimized to handle commits for extremely large or extremely wide tables.
4. Orphan file cleaning, a core part of daily table management and operations, now supports distributed execution.
5. The algorithm for local lookups (the basis for the lookup changelog-producer, the primary key table Deletion Vectors mode, and Flink Lookup Join) has changed from Hash to Sort, greatly improving local disk compression rates.
6. COUNT(*) SQL acceleration has been implemented: results are returned directly without reading data files; this benefits both non-primary key tables and primary key tables in DV mode.
7. A thin mode for primary key tables has been introduced, so primary key fields are no longer stored redundantly; this feature is not enabled by default for compatibility reasons.
8. The bitmap file index has been significantly enhanced: the index can now be pushed down to the page level of Parquet files, which greatly boosts query performance.

## Ecosystem

### Catalog Ecosystem

<img src="./img/v1-catalog.png" alt="catalog" />

The image above shows the Catalog ecosystem around Paimon 1.0. Before version 1.0, there were only Paimon Primary Key Tables and Append Tables; version 1.0 greatly expands this ecosystem:

1. View: When the metastore (such as Hive) supports views, Paimon supports persistent views. Without a metastore, only temporary views can be used, which exist only in the current session. Persistent views currently store the original SQL; if a view needs to be used across engines, users should write cross-engine SQL statements.
2. Format Table: When the metastore (such as Hive) supports format tables, Format Tables are supported. Hive tables in the metastore map to Paimon format tables for reading and writing by computation engines (Spark, Hive, Flink). This type is mainly used for compatibility with Hive tables.
3. Object Table: Object Tables provide metadata indexing for unstructured data objects in designated object storage directories, allowing users to analyze unstructured data in object storage.
4. Materialized Table: Materialized Tables aim to simplify batch and streaming data pipelines and provide a consistent development experience, as with Flink Materialized Tables. Currently, only Flink SQL integrates with materialized tables.

Upcoming Paimon versions will:

1. Bring support for Views and Format Tables to FileSystemCatalog.
2. Consider supporting dialects for Views.
3. Propose cross-engine materialized view definitions for Materialized Tables, allowing cooperation with engines such as Spark and StarRocks.
4. Introduce cross-engine function definitions.

### Iceberg Compatibility

Paimon supports generating Iceberg-compatible metadata so that Iceberg readers can directly use Paimon tables. This Iceberg compatibility has become production-ready in this version!

<img src="./img/v1-iceberg.png" alt="iceberg" />

With Iceberg compatibility, the large Iceberg ecosystem can be unlocked, including various serverless computing products. Paimon 1.0 mainly addressed:

1. Support for complex types.
2. Verification of integration with Athena and the Glue Catalog.
3. Verification of DuckDB's capability to read the Iceberg data.

Paimon primary key tables organize data files as an LSM tree, meaning data files must be merged before queries, or combined with Deletion Vectors. However, Iceberg readers cannot merge data files and can only query the highest level of the LSM tree, whose data files are produced by full compaction.

Fortunately, the latest Iceberg development also supports Deletion Vectors, which will allow Paimon primary key tables in Deletion Vectors mode to produce Iceberg snapshots without loss. Further integration will follow the next Iceberg release.
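As a rough sketch of what enabling this compatibility can look like (the option value and the table definition below are illustrative assumptions; please check the Iceberg compatibility documentation for your version), a Paimon table can be asked to also maintain Iceberg-compatible metadata:

```sql
-- Minimal sketch (Flink SQL assumed): besides its own metadata, Paimon also
-- maintains Iceberg-compatible metadata for this table, so Iceberg readers
-- can query it without copying data.
CREATE TABLE orders (
    order_id BIGINT,
    user_id  BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'metadata.iceberg.storage' = 'hadoop-catalog'  -- assumed option value; other storage modes exist
);
```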
## Transactions

In Alibaba Group's Taobao business, the largest single partition of the biggest table exceeds 100 TB, presenting many challenges for the Manifest.
Therefore, Paimon 1.0 significantly improves the performance of snapshot commits and of multi-job writes.

This version introduces memory optimization for manifest merging, avoiding keeping the full DataFileMeta in memory and preventing the memory issues (e.g., OOM) that led to instability in previous versions.

<img src="./img/v1-commit.png" alt="commit" />

Simultaneous commits from multiple jobs are a core capability of Paimon; for instance, one job writes data while another handles compaction. However, in previous versions, if the data volume was too large and too many files were involved, commits failed frequently, which could lead to job failover. This version significantly reduces the recurring conflicts during simultaneous writes, preventing prolonged conflicts that could lead to job failover.

This version also introduces a dense mode for statistics that avoids occupying unnecessary space, so super-wide tables with over 1,000 columns are now well supported. Metadata storage is reduced by 100 times when 'metadata.stats-mode' = 'none'.

The dense mode for statistics may introduce compatibility issues; however, it only affects the 'metadata.stats-mode' = 'none' case. If you want to maintain compatibility with old-version readers, you can configure 'metadata.stats-dense-store' = 'false', though this is generally not recommended because the non-dense mode saves little metadata space.

When deleting files, Paimon also supports not saving their statistics, configurable via 'manifest.delete-file-drop-stats' = 'true'; the default is false due to similar compatibility concerns with older-version readers. If you have upgraded all readers, it is advisable to enable this parameter, which further improves the stability of compaction.
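As an illustration of how these options fit together (a sketch only; the table name is made up, and the option keys are the ones discussed in this section):

```sql
-- Minimal sketch: tune statistics storage for a very wide table.
ALTER TABLE wide_events SET (
    'metadata.stats-mode' = 'none',             -- do not store per-file column statistics
    'manifest.delete-file-drop-stats' = 'true'  -- drop statistics for delete entries; only after all readers are upgraded
);

-- Only if old-version readers must keep working (not generally recommended):
ALTER TABLE wide_events SET ('metadata.stats-dense-store' = 'false');
```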
## Management

When Paimon writing jobs undergo failover, some uncommitted temporary files may remain, wasting storage space in the file system; Paimon therefore provides tools for cleaning up these orphan files.

However, this tool ran in single-machine mode in previous versions, leading to significant performance issues when cleaning very large tables. This version introduces distributed execution on both the Flink and Spark computation engines.

## Performance

### Local Lookup

Local lookup is the fundamental point-query capability of Paimon's LSM structure, serving as the foundational infrastructure for:

1. Lookup changelog-producer: generates changelogs by looking up historical files.
2. Primary key table Deletion Vectors mode: generates deletion vectors by looking up historical files.
3. Flink Lookup Join: uses local lookup when the join key is the primary key of the dimension table.

Earlier versions used a HashFile to solve the lookup problem, but it had two drawbacks:

1. Multiple disk copies occur while generating the HashFile.
2. The HashFile had a poor compression rate; even with ZSTD it only achieved 2-3x compression.

In Paimon 1.0, we switched to a new sort-based file format:

<img src="./img/v1-sort.png" alt="sort" />

This format, similar to LevelDB SST files, has better compression rates and lower disk IO amplification. Tests on various datasets show compression rates improved by 3-5 times in many scenarios compared to Hash files.

### Count(*) Optimization

Paimon stores statistics about data files in the Manifest.
For COUNT(*) queries, the result could in principle be answered from the Manifest alone, but prior versions still read every data file, leading to high cost.

In Paimon 1.0, we optimized this SQL in conjunction with Flink and Spark, for example:

```sql
-- Querying an Append table: can be accelerated
SELECT COUNT(*) FROM APPEND_T;

-- Querying an Append table with a specified partition: can be accelerated
SELECT COUNT(*) FROM APPEND_T WHERE dt = '20250101';

-- Querying a primary key table in DV mode: can be accelerated
SELECT COUNT(*) FROM PRIMARY_KEY_T_DV;
```

### Thin Mode for Primary Key Tables

Previously, Paimon saved the following structure in primary key table data files:

1. Key columns
2. _VALUE_KIND
3. _SEQUENCE_NUMBER
4. Value columns

In practice, the value columns already contain all key columns, so the same data was stored redundantly.

In the new version, the 'data-file.thin-mode' option avoids saving the redundant key columns, which saves storage and improves read/write performance.

Note that this feature may cause compatibility issues with old-version readers; therefore, it is not enabled by default. Please make sure to upgrade your readers first.
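A minimal sketch of enabling this option when creating a primary key table (Flink SQL assumed; table and column names are illustrative):

```sql
-- Minimal sketch: thin mode stores the key columns only once in data files.
-- Enable it only after every reader of this table has been upgraded.
CREATE TABLE users (
    user_id BIGINT,
    name    STRING,
    age     INT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'data-file.thin-mode' = 'true'
);
```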
### File Index

The bitmap index in previous versions had some effect on point queries, but it was limited. In this version:

1. The bitmap index has been pushed down into the Parquet reader and operates at the page level, significantly enhancing its filtering effect and improving performance.
2. Bit-Slice Index (BSI) bitmaps are supported; BSI file indexing is a numeric range index used to accelerate range queries.
3. Additionally, primary key tables in Deletion Vectors mode now support file-level indexes.

A new bitmap index format is also under discussion in the community and will significantly improve performance when the indexed columns have high cardinality.

### Catalog Optimization

To avoid bottlenecks caused by metadata access, the Catalog has also undergone some optimizations:

1. HiveCatalog: Eliminated unnecessary getTable calls, greatly accelerating listTables.
2. CachingCatalog: Introduced partition caching, snapshot file caching, and cached statistics.

### Parquet Reader

We completely restructured Paimon's Parquet reader by referencing Spark SQL's Parquet reader, significantly improving the read performance of complex types and reducing bugs related to complex types.

## Other Features

Improvements to nested schema support include nested projection pushdown and nested schema evolution, allowing Paimon to fully support nested types with complete schema evolution.

A binlog system table has also been introduced, allowing changelogs to be queried in a binlog style: the before and after images of an update are packed into a single row, so streaming reads behave like MySQL binlogs and the -U and +U messages can be processed together.

## Flink Integration

In addition to advancing the Materialized Table together with the Flink community, Paimon 1.0 also brings many enhancements for stream processing and batch processing.

For Stream Processing:

1. CDC: Kafka sync now supports complex types and Aliyun JSON types.
2. Local Merge (configured with 'local-merge-buffer-size') now uses a hash algorithm, which greatly benefits the efficiency of certain aggregation tables.
3. Lookup Join adds a blacklist mechanism to avoid the impact of refreshes on stability during critical periods.
4. Memory optimizations for automatic compaction of Append tables allow extremely large tables to automatically compact small files.
5. When the changelog-producer is set to lookup, a short Flink checkpoint interval (e.g., 30 seconds) combined with a large number of buckets can generate many small changelog files per snapshot. Therefore, the 'changelog.precommit-compact' option was introduced to compact small changelog files into larger ones (see the configuration sketch after the lists below).

For Batch Processing:

1. Speculative execution is supported for Flink batch jobs.
2. Flink's integration with Paimon now enables access to table statistics.
3. Paimon statistics can be generated using Flink SQL.
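To illustrate item 5 of the stream-processing list above, here is a sketch of how these options might look on a table (the table name and the exact combination of options are illustrative assumptions):

```sql
-- Minimal sketch: a lookup changelog-producer table that compacts small
-- changelog files before commit.
CREATE TABLE page_views (
    page_id BIGINT,
    cnt     BIGINT,
    PRIMARY KEY (page_id) NOT ENFORCED
) WITH (
    'changelog-producer' = 'lookup',
    'changelog.precommit-compact' = 'true'  -- merge small changelog files into larger ones
);
```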
## Spark Integration

Spark 4 has released its second preview version; one of its important capabilities is the Variant type, including Variant Shredding, which can greatly improve the handling of semi-structured data. The Paimon community will integrate with the Variant and Variant Shredding capabilities in subsequent releases.

Additionally, Spark SQL supports using dynamic parameters to configure tables, for example:

```sql
-- set scan.snapshot-id=1 for the table default.T in any catalogs
SET spark.paimon.*.default.T.scan.snapshot-id=1;
SELECT * FROM default.T;
```

## Future Plans: REST

Going forward, we will focus on building the RESTCatalog and promoting the development of REST servers. We expect REST-based metadata services to significantly enhance the overall experience, availability, and usability of the lakehouse.

diff --git a/public/img/v1-catalog.png b/public/img/v1-catalog.png
new file mode 100644
index 0000000000..b8739d3678
Binary files /dev/null and b/public/img/v1-catalog.png differ
diff --git a/public/img/v1-commit.png b/public/img/v1-commit.png
new file mode 100644
index 0000000000..afa9118417
Binary files /dev/null and b/public/img/v1-commit.png differ
diff --git a/public/img/v1-iceberg.png b/public/img/v1-iceberg.png
new file mode 100644
index 0000000000..8e4692bea6
Binary files /dev/null and b/public/img/v1-iceberg.png differ
diff --git a/public/img/v1-sort.png b/public/img/v1-sort.png
new file mode 100644
index 0000000000..4c6597db8a
Binary files /dev/null and b/public/img/v1-sort.png differ