This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new bc6fe7cc1f4 [opt] add more lakehouse related doc (#3386)
bc6fe7cc1f4 is described below

commit bc6fe7cc1f4dafb347d2d85a334c410f2cad2116
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Wed Feb 11 14:14:48 2026 +0800

    [opt] add more lakehouse related doc (#3386)
    
    ## Versions
    
    - [x] dev
    - [x] 4.x
    - [x] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/lakehouse/best-practices/optimization.md      |  74 +++++
 docs/lakehouse/catalogs/iceberg-catalog.mdx        |  64 +++++
 docs/lakehouse/file-analysis.md                    | 294 +++++++++++---------
 .../lakehouse/best-practices/optimization.md       |  74 +++++
 .../current/lakehouse/catalogs/iceberg-catalog.mdx |  64 +++++
 .../current/lakehouse/file-analysis.md             | 294 +++++++++++---------
 .../lakehouse/best-practices/optimization.md       |  74 +++++
 .../lakehouse/catalogs/iceberg-catalog.mdx         |  64 +++++
 .../version-3.x/lakehouse/file-analysis.md         | 297 ++++++++++----------
 .../lakehouse/best-practices/optimization.md       |  74 +++++
 .../lakehouse/catalogs/iceberg-catalog.mdx         |  64 +++++
 .../version-4.x/lakehouse/file-analysis.md         | 294 +++++++++++---------
 .../lakehouse/best-practices/optimization.md       |  74 +++++
 .../lakehouse/catalogs/iceberg-catalog.mdx         |  64 +++++
 .../version-3.x/lakehouse/file-analysis.md         | 301 +++++++++++----------
 .../lakehouse/best-practices/optimization.md       |  74 +++++
 .../lakehouse/catalogs/iceberg-catalog.mdx         |  64 +++++
 .../version-4.x/lakehouse/file-analysis.md         | 296 +++++++++++---------
 18 files changed, 1795 insertions(+), 809 deletions(-)

diff --git a/docs/lakehouse/best-practices/optimization.md 
b/docs/lakehouse/best-practices/optimization.md
index 635415e990d..9aced42cec1 100644
--- a/docs/lakehouse/best-practices/optimization.md
+++ b/docs/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@ Since version 4.0.2, cache warmup functionality is 
supported, which can further
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS 
Documentation](../storages/hdfs.md).
 
+## Split Count Limit
+
+When querying external tables (Hive, Iceberg, Paimon, etc.), Doris splits 
files into multiple splits for parallel processing. In some scenarios, 
especially when there are a large number of small files, too many splits may be 
generated, leading to:
+
+1. Memory pressure: Too many splits consume a significant amount of FE memory
+2. OOM issues: Excessive split counts may cause OutOfMemoryError
+3. Performance degradation: Managing too many splits increases query planning 
overhead
+
+You can use the `max_file_split_num` session variable to limit the maximum 
number of splits allowed per table scan (supported since 4.0.4):
+
+- Type: `int`
+- Default: `100000`
+- Description: In non-batch mode, the maximum number of splits allowed per table scan; this prevents OOM caused by generating too many splits.
+
+Usage example:
+
+```sql
+-- Set maximum split count to 50000
+SET max_file_split_num = 50000;
+
+-- Disable this limit (set to 0 or a negative number)
+SET max_file_split_num = 0;
+```
+
+When this limit is set, Doris dynamically calculates the minimum split size to 
ensure the split count does not exceed the specified limit.
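+
+As a hedged illustration of this sizing behavior (the exact internal formula is not documented here and may differ): with roughly 10 GB of files to scan and the limit below, splits would not be made smaller than about 10 GB / 50000 ≈ 200 KB.
+
+```sql
+-- Check the current limit, then lower it for a scan over many small files
+SHOW VARIABLES LIKE 'max_file_split_num';
+SET max_file_split_num = 50000;
+```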
+
 ## Merge IO Optimization
 
 For remote storage systems like HDFS and object storage, Doris optimizes IO 
access through Merge IO technology. Merge IO technology essentially merges 
multiple adjacent small IO requests into one large IO request, which can reduce 
IOPS and increase IO throughput.
@@ -71,3 +97,51 @@ If you find that `MergedBytes` is much larger than 
`RequestBytes`, it indicates
 - `merge_io_read_slice_size_bytes`
 
     Session variable, supported since version 3.1.3. Default is 8MB. If you 
find serious read amplification, you can reduce this parameter, such as to 
64KB, and observe whether the modified IO requests and query latency improve.
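+
+    For example, a hedged sketch of lowering it for the current session (the value is in bytes):
+
+    ```sql
+    SET merge_io_read_slice_size_bytes = 65536; -- 64KB
+    ```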
+
+## Parquet Page Cache
+
+:::info
+Supported since version 4.1.0.
+:::
+
+Parquet Page Cache is a page-level caching mechanism for Parquet files. This 
feature integrates with Doris's existing Page Cache framework, significantly 
improving query performance by caching decompressed (or compressed) data pages 
in memory.
+
+### Key Features
+
+1. **Unified Page Cache Integration**
+    - Shares the same underlying `StoragePageCache` framework used by Doris 
internal tables
+    - Shares memory pool and eviction policies
+    - Reuses existing cache statistics and RuntimeProfile for unified 
performance monitoring
+
+2. **Intelligent Caching Strategy**
+    - **Compression Ratio Awareness**: Automatically decides whether to cache 
compressed or decompressed data based on the 
`parquet_page_cache_decompress_threshold` parameter
+    - **Flexible Storage Approach**: Caches decompressed data when 
`decompressed size / compressed size ≤ threshold`; otherwise, decides whether 
to cache compressed data based on `enable_parquet_cache_compressed_pages`
+    - **Cache Key Design**: Uses `file_path::mtime::offset` as the cache key 
to ensure cache consistency after file modifications
+
+### Configuration Parameters
+
+The following are BE configuration parameters:
+
+- `enable_parquet_page_cache`
+
+    Whether to enable the Parquet Page Cache feature. Default is `false`.
+
+- `parquet_page_cache_decompress_threshold`
+
+    Threshold that controls whether to cache compressed or decompressed data. Default is `1.5`. When the ratio of `decompressed size / compressed size` is less than or equal to this threshold, the decompressed data is cached; otherwise, whether to cache the compressed data is determined by the `enable_parquet_cache_compressed_pages` setting.
+
+- `enable_parquet_cache_compressed_pages`
+
+    Whether to cache compressed data pages when the compression ratio exceeds 
the threshold. Default is `true`.
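+
+As a reference, a minimal sketch of the corresponding `be.conf` entries that enable the feature with the parameters above (assuming the standard `key = value` BE configuration format; adjust values for your deployment):
+
+```
+enable_parquet_page_cache = true
+parquet_page_cache_decompress_threshold = 1.5
+enable_parquet_cache_compressed_pages = true
+```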
+
+### Performance Monitoring
+
+You can view Parquet Page Cache usage through Query Profile:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+Where `PageCacheHitCount` indicates the number of cache hits, and 
`PageCacheMissCount` indicates the number of cache misses.
\ No newline at end of file
diff --git a/docs/lakehouse/catalogs/iceberg-catalog.mdx 
b/docs/lakehouse/catalogs/iceberg-catalog.mdx
index 1034b5eaa35..d874038015b 100644
--- a/docs/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2133,6 +2133,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. The operation will fail if the specified snapshot does not exist
 3. The merge operation creates a new snapshot and does not delete the original 
snapshot
 
+### expire_snapshots
+
+The `expire_snapshots` operation removes old snapshots from Iceberg tables to 
free up storage space and improve metadata performance. This operation follows 
the Apache Iceberg Spark procedure specification.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | No | Timestamp threshold for snapshot expiration. 
Snapshots older than this will be removed. Supports ISO datetime format (e.g., 
`2024-01-01T00:00:00`) or milliseconds timestamp |
+| `retain_last` | Integer | No | Number of ancestor snapshots to preserve. When specified alone, `older_than` is automatically set to the current time |
+| `snapshot_ids` | String | No | Comma-separated list of specific snapshot IDs 
to expire |
+| `max_concurrent_deletes` | Integer | No | Size of thread pool for delete 
operations |
+| `clean_expired_metadata` | Boolean | No | When set to `true`, cleans up 
unused partition specs and schemas |
+
+**Return Value:**
+
+Executing the `expire_snapshots` operation returns a result set with the 
following 6 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | Number of deleted data files |
+| `deleted_position_delete_files_count` | BIGINT | Number of deleted position 
delete files |
+| `deleted_equality_delete_files_count` | BIGINT | Number of deleted equality 
delete files |
+| `deleted_manifest_files_count` | BIGINT | Number of deleted manifest files |
+| `deleted_manifest_lists_count` | BIGINT | Number of deleted manifest list 
files |
+| `deleted_statistics_files_count` | BIGINT | Number of deleted statistics 
files |
+
+**Example:**
+
+```sql
+-- Expire snapshots, keeping only the last 2
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- Expire snapshots older than a specific timestamp
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- Expire specific snapshots by ID
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- Combine parameters: expire snapshots older than 2024-06-01 but keep at 
least the last 5
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
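+
+A further hedged sketch that also cleans up expired metadata, using the remaining parameters listed above:
+
+```sql
+-- Expire old snapshots and also clean up unused partition specs and schemas
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00", "clean_expired_metadata" = "true");
+```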
+
+**Notes:**
+
+1. This operation does not support WHERE conditions.
+2. If both `older_than` and `retain_last` are specified, both conditions 
apply: only snapshots older than `older_than` AND not within the most recent 
`retain_last` snapshots will be deleted.
+3. `snapshot_ids` can be used alone to delete specific snapshots.
+4. This operation permanently deletes snapshots and their associated data 
files. Use with caution.
+5. It is recommended to query the `$snapshots` system table before execution 
to understand the table's snapshot information.
+
 ### fast_forward
 
 The `fast_forward` operation quickly advances the current snapshot of one 
branch to the latest snapshot of another branch.
diff --git a/docs/lakehouse/file-analysis.md b/docs/lakehouse/file-analysis.md
index d739a51f71f..e3204c35f34 100644
--- a/docs/lakehouse/file-analysis.md
+++ b/docs/lakehouse/file-analysis.md
@@ -1,144 +1,47 @@
 ---
 {
-    "title": "Analyze Files on S3/HDFS",
+    "title": "Analyzing Files on S3/HDFS",
     "language": "en",
-    "description": "Through the Table Value Function feature, Doris can 
directly query and analyze files on object storage or HDFS as a Table."
+    "description": "Learn how to use Apache Doris Table Value Function (TVF) 
to directly query and analyze Parquet, ORC, CSV, and JSON files on storage 
systems like S3 and HDFS, with support for automatic schema inference, 
multi-file matching, and data import."
 }
 ---
 
-Through the Table Value Function feature, Doris can directly query and analyze 
files on object storage or HDFS as a Table. It also supports automatic column 
type inference.
+Through the Table Value Function (TVF) feature, Doris can directly query and 
analyze files on object storage or HDFS as tables without importing data in 
advance, and supports automatic column type inference.
 
-For more usage methods, refer to the Table Value Function documentation:
+## Supported Storage Systems
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md): Supports 
file analysis on S3-compatible object storage.
+Doris provides the following TVFs for accessing different storage systems:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md): Supports 
file analysis on HDFS.
+| TVF | Supported Storage | Description |
+|-----|-------------------|-------------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | 
S3-compatible object storage | Supports AWS S3, Alibaba Cloud OSS, Tencent 
Cloud COS, etc. |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
Supports Hadoop Distributed File System |
+| [HTTP](../sql-manual/sql-functions/table-valued-functions/http.md) | HTTP | 
Supports accessing files from HTTP addresses (since version 4.0.2) |
+| [FILE](../sql-manual/sql-functions/table-valued-functions/file.md) | 
S3/HDFS/HTTP/Local | Unified table function supporting multiple storage types 
(since version 3.1.0) |
 
-* [FILE](../sql-manual/sql-functions/table-valued-functions/file.md): Unified 
table function, which can support reading S3/HDFS/Local files at the same time. 
(Supported since version 3.1.0.)
+## Use Cases
 
-## Basic Usage
+### Scenario 1: Direct Query and Analysis of Files
 
-Here we illustrate how to analyze files on object storage using the S3 Table 
Value Function as an example.
+TVF is ideal for directly analyzing files on storage systems without importing 
data into Doris first.
 
-### Query
+The following example queries a Parquet file on object storage using the S3 
TVF:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-The `S3(...)` is a TVF (Table Value Function). A Table Value Function is 
essentially a table, so it can appear in any SQL statement where a "table" can 
appear.
-
-The attributes of a TVF include the file path to be analyzed, file format, 
connection information of the object storage, etc.
-
-### Multiple File Import
-
-The file path (URI) supports wildcards and range patterns for matching 
multiple files:
-
-| Pattern | Example | Matches |
-|---------|---------|---------|
-| `*` | `file_*` | All files starting with `file_` |
-| `{n..m}` | `file_{1..3}` | `file_1`, `file_2`, `file_3` |
-| `{a,b,c}` | `file_{a,b}` | `file_a`, `file_b` |
-
-For complete syntax including all supported wildcards, range expansion rules, 
and usage examples, see [File Path 
Pattern](../sql-manual/basic-element/file-path-pattern).
-
-
-### Automatic Inference of File Column Types
+Example query result:
 
-You can view the Schema of a TVF using the `DESC FUNCTION` syntax:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris infers the Schema based on the following rules:
-
-* For Parquet and ORC formats, Doris obtains the Schema from the file metadata.
-
-* In the case of matching multiple files, the Schema of the first file is used 
as the TVF's Schema.
-
-* For CSV and JSON formats, Doris parses the **first line of data** to obtain 
the Schema based on fields, delimiters, etc.
-
-  By default, all column types are `string`. You can specify column names and 
types individually using the `csv_schema` attribute. Doris will use the 
specified column types for file reading. The format is: 
`name1:type1;name2:type2;...`. For example:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  The currently supported column type names are as follows:
-
-  | Column Type Name |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* For columns with mismatched formats (e.g., the file contains a string, but 
the user defines it as `int`; or other files have a different Schema than the 
first file), or missing columns (e.g., the file has 4 columns, but the user 
defines 5 columns), these columns will return `null`.
-
-## Applicable Scenarios
-
-### Query Analysis
-
-TVF is very suitable for directly analyzing independent files on storage 
systems without having to import the data into Doris in advance.
-
-You can use any SQL statement for file analysis, such as:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -150,12 +53,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF can appear in any position in SQL where a Table can appear, such as in the 
`WITH` clause of a `CTE`, in the `FROM` clause, etc. This way, you can treat 
the file as a regular table for any analysis.
+A TVF is essentially a table and can appear anywhere a "table" can appear in 
SQL statements, such as:
 
-You can also create a logical view for a TVF using the `CREATE VIEW` 
statement. After that, you can access this TVF like other views, manage 
permissions, etc., and allow other users to access this View without having to 
repeatedly write connection information and other attributes.
+- In the `FROM` clause
+- In the `WITH` clause of a CTE
+- In `JOIN` statements
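+
+As a sketch, the same TVF used inside a CTE (file, credentials, and column names follow the earlier example):
+
+```sql
+WITH parts AS (
+    SELECT * FROM s3(
+        'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+        'format' = 'parquet',
+        's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+        's3.region' = 'us-east-1',
+        's3.access_key' = 'ak',
+        's3.secret_key' = 'sk'
+    )
+)
+SELECT p_brand, count(*) AS cnt
+FROM parts
+GROUP BY p_brand
+ORDER BY cnt DESC
+LIMIT 3;
+```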
+
+### Scenario 2: Creating Views to Simplify Access
+
+You can create logical views for TVFs using the `CREATE VIEW` statement to 
avoid repeatedly writing connection information and to support permission 
management:
 
 ```sql
--- Create a view based on a TVF
+-- Create a view based on TVF
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -163,25 +72,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- View the structure of the view
 DESC tvf_view;
 
--- Query the view as usual
+-- Query the view
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- Grant access to other users
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### Data Import
+### Scenario 3: Importing Data into Doris
 
-TVF can be used as a method for data import into Doris. With the `INSERT INTO 
SELECT` syntax, we can easily import files into Doris.
+Combined with the `INSERT INTO SELECT` syntax, you can import file data into 
Doris tables:
 
 ```sql
--- Create a Doris table
+-- 1. Create the target table
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -191,21 +100,140 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. Import data via TVF
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## Notes
+## Core Features
+
+### Multi-File Matching
 
-1. If the specified `uri` does not match any files, or all matched files are 
empty, the TVF will return an empty result set. In this case, using `DESC 
FUNCTION` to view the Schema of this TVF will yield a virtual column 
`__dummy_col`, which is meaningless and only serves as a placeholder.
+The file path (URI) supports using wildcards and range patterns to match 
multiple files:
+
+| Pattern | Example | Match Result |
+|---------|---------|--------------|
+| `*` | `file_*` | All files starting with `file_` |
+| `{n..m}` | `file_{1..3}` | `file_1`, `file_2`, `file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`, `file_b` |
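+
+For instance, a sketch of reading every Parquet file under a directory with a wildcard (bucket, path, and credentials follow the placeholders used above):
+
+```sql
+SELECT count(*) FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/*.parquet',
+    'format' = 'parquet',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk'
+);
+```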
+
+### Using Resource to Simplify Configuration
+
+TVF supports referencing pre-created S3 or HDFS Resources through the 
`resource` property, avoiding the need to repeatedly fill in connection 
information for each query.
+
+**1. Create a Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. Use the Resource in TVF**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Properties in the Resource serve as default values; properties specified in 
the TVF will override properties with the same name in the Resource
+- Using Resources enables centralized management of connection information for 
easier maintenance and permission control
+:::
+
+### Automatic Schema Inference
+
+You can view the automatically inferred schema of a TVF using the `DESC 
FUNCTION` syntax:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
+
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema Inference Rules:**
+
+| File Format | Inference Method |
+|-------------|------------------|
+| Parquet, ORC | Automatically obtains schema from file metadata |
+| CSV, JSON | Parses the first row of data to get the schema; default column 
type is `string` |
+| Multi-file matching | Uses the schema of the first file |
+
+### Manually Specifying Column Types (CSV/JSON)
+
+For CSV and JSON formats, you can manually specify column names and types 
using the `csv_schema` property in the format `name1:type1;name2:type2;...`:
+
+```sql
+S3 (
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+)
+```
+
+**Supported Column Types:**
+
+| Integer Types | Floating-Point Types | Other Types |
+|---------------|----------------------|-------------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- If the column type does not match (e.g., the file contains a string but 
`int` is specified), the column returns `null`
+- If the number of columns does not match (e.g., the file has 4 columns but 5 
are specified), missing columns return `null`
+:::
+
+## Notes
 
-2. If the specified file format is `csv`, and the file read is not empty but 
the first line of the file is empty, an error `The first line is empty, can not 
parse column numbers` will be prompted, as the Schema cannot be parsed from the 
first line of the file.
+| Scenario | Behavior |
+|----------|----------|
+| `uri` matches no files or all files are empty | TVF returns an empty result 
set; using `DESC FUNCTION` to view the schema will show a placeholder column 
`__dummy_col` |
+| First line of CSV file is empty (file is not empty) | Error message: `The 
first line is empty, can not parse column numbers` |
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
index cc5b6d6aadc..24a2d7882fb 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
 
+## Split 数量限制
+
+当查询外部表(Hive、Iceberg、Paimon 等)时,Doris 会将文件拆分成多个 split 
进行并行处理。在某些场景下,尤其是存在大量小文件时,可能会生成过多的 split,导致:
+
+1. 内存压力:过多的 split 会消耗 FE 大量内存
+2. OOM 问题:split 数量过多可能导致 OutOfMemoryError
+3. 性能下降:管理过多 split 会增加查询规划开销
+
+可以通过 `max_file_split_num` 会话变量来限制每个 table scan 允许的最大 split 数量(该参数自 4.0.4 版本支持):
+
+- 类型:`int`
+- 默认值:`100000`
+- 说明:在非 batch 模式下,每个 table scan 最大允许的 split 数量,防止产生过多 split 导致 OOM。
+
+使用示例:
+
+```sql
+-- 设置最大 split 数量为 50000
+SET max_file_split_num = 50000;
+
+-- 禁用该限制(设置为 0 或负数)
+SET max_file_split_num = 0;
+```
+
+当设置了该限制后,Doris 会动态计算最小的 split 大小,以确保 split 数量不超过设定的上限。
+
 ## Merge IO 优化
 
 针对 HDFS、对象存储等远端存储系统,Doris 会通过 Merge IO 技术来优化 IO 访问。Merge IO 技术,本质上是将多个相邻的小 IO 
请求,合并成一个大 IO 请求,这样可以减少 IOPS,增加 IO 吞吐。
@@ -71,3 +97,51 @@ Request Range: [0, 50]
 - `merge_io_read_slice_size_bytes`
 
     会话变量,自 3.1.3 版本支持。默认为 8MB。如果发现读放大严重,可以将此参数调小,如 64KB。并观察修改后的 IO 
请求和查询延迟是否有提升。
+
+## Parquet Page Cache
+
+:::info
+自 4.1.0 版本支持。
+:::
+
+Parquet Page Cache 是针对 Parquet 文件的页级缓存机制。该功能与 Doris 现有的 Page Cache 
框架集成,通过在内存中缓存解压后(或压缩的)数据页,显著提升查询性能。
+
+### 主要特性
+
+1. **统一的 Page Cache 集成**
+    - 与 Doris 内表使用的 `StoragePageCache` 共享同一个基础框架
+    - 共享内存池和淘汰策略
+    - 复用现有的缓存统计和 RuntimeProfile 进行统一的性能监控
+
+2. **智能缓存策略**
+    - **压缩比感知**:根据 `parquet_page_cache_decompress_threshold` 
参数自动选择缓存压缩数据还是解压后的数据
+    - **灵活的存储方式**:当 `解压后大小 / 压缩大小 ≤ 阈值` 时缓存解压后的数据,否则根据 
`enable_parquet_cache_compressed_pages` 决定是否缓存压缩数据
+    - **缓存键设计**:使用 `file_path::mtime::offset` 作为缓存键,确保文件修改后缓存的一致性
+
+### 相关配置参数
+
+以下为 BE 配置参数:
+
+- `enable_parquet_page_cache`
+
+    是否启用 Parquet Page Cache 功能。默认为 `false`。
+
+- `parquet_page_cache_decompress_threshold`
+
+    控制缓存压缩数据还是解压数据的阈值。默认为 `1.5`。当 `解压后大小 / 压缩大小` 的比值小于或等于该阈值时,会缓存解压后的数据;否则会根据 
`enable_parquet_cache_compressed_pages` 的设置决定是否缓存压缩数据。
+
+- `enable_parquet_cache_compressed_pages`
+
+    当压缩比超过阈值时,是否缓存压缩的数据页。默认为 `true`。
+
+### 性能监控
+
+通过 Query Profile 可以查看 Parquet Page Cache 的使用情况:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+其中 `PageCacheHitCount` 表示缓存命中次数,`PageCacheMissCount` 表示缓存未命中次数。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
index faaccff715d..d3298103cd4 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2149,6 +2149,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. 如果指定的快照不存在,操作会失败。
 3. 合并操作会创建一个新的快照,不会删除原始快照。
 
+### expire_snapshots
+
+`expire_snapshots` 操作用于删除 Iceberg 表的旧快照,以释放存储空间并提高元数据性能。该操作遵循 Apache Iceberg 
Spark 过程规范。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | 否 | 快照过期的时间阈值,早于该时间的快照将被删除。支持 ISO 日期时间格式(如 
`2024-01-01T00:00:00`)或毫秒时间戳格式 |
+| `retain_last` | Integer | 否 | 保留的祖先快照数量。当单独指定时,自动将 `older_than` 设置为当前时间 |
+| `snapshot_ids` | String | 否 | 要过期的特定快照 ID 列表,以逗号分隔 |
+| `max_concurrent_deletes` | Integer | 否 | 执行删除操作的线程池大小 |
+| `clean_expired_metadata` | Boolean | 否 | 设置为 `true` 时,清理未使用的分区规格和 Schema |
+
+**返回值:**
+
+执行 `expire_snapshots` 操作会返回一个结果集,包含以下 6 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | 已删除的数据文件数量 |
+| `deleted_position_delete_files_count` | BIGINT | 已删除的 Position Delete 文件数量 |
+| `deleted_equality_delete_files_count` | BIGINT | 已删除的 Equality Delete 文件数量 |
+| `deleted_manifest_files_count` | BIGINT | 已删除的 Manifest 文件数量 |
+| `deleted_manifest_lists_count` | BIGINT | 已删除的 Manifest List 文件数量 |
+| `deleted_statistics_files_count` | BIGINT | 已删除的统计文件数量 |
+
+**示例:**
+
+```sql
+-- 过期快照,只保留最近的 2 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- 过期指定时间之前的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- 过期指定 ID 的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- 组合参数:过期 2024-06-01 之前的快照,但至少保留最近的 5 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件。
+2. 如果同时指定 `older_than` 和 `retain_last`,则两个条件都会生效:只有早于 `older_than` 且不在最近 
`retain_last` 个快照中的快照才会被删除。
+3. `snapshot_ids` 可以单独使用,用于删除特定的快照。
+4. 该操作会永久删除快照及其关联的数据文件,请谨慎使用。
+5. 建议在执行前先查询 `$snapshots` 系统表了解表的快照信息。
+
 ### fast_forward
 
 `fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-analysis.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-analysis.md
index 900942cdb14..7ca8d06013c 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-analysis.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-analysis.md
@@ -2,143 +2,46 @@
 {
     "title": "分析 S3/HDFS 上的文件",
     "language": "zh-CN",
-    "description": "通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 
Table 进行查询分析。并且支持自动的列类型推断。"
+    "description": "了解如何使用 Apache Doris Table Value Function (TVF) 直接查询和分析 
S3、HDFS 等存储系统上的 Parquet、ORC、CSV、JSON 文件,支持自动 Schema 推断、多文件匹配和数据导入。"
 }
 ---
 
-通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 Table 
进行查询分析。并且支持自动的列类型推断。
+通过 Table Value Function(TVF)功能,Doris 可以直接将对象存储或 HDFS 
上的文件作为表进行查询分析,无需事先导入数据,并且支持自动的列类型推断。
 
-更多使用方式可参阅 Table Value Function 文档:
+## 支持的存储系统
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md):支持 S3 
兼容的对象存储上的文件分析。
+Doris 提供以下 TVF 用于访问不同的存储系统:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md):支持 HDFS 
上的文件分析。
+| TVF | 支持的存储 | 说明 |
+|-----|-----------|------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | S3 兼容的对象存储 
| 支持 AWS S3、阿里云 OSS、腾讯云 COS 等 |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
支持 Hadoop 分布式文件系统 |
+| [HTTP](../sql-manual/sql-functions/table-valued-functions/http.md) | HTTP | 
支持从 HTTP 地址访问文件(自 4.0.2 版本起) |
+| [FILE](../sql-manual/sql-functions/table-valued-functions/file.md) | 
S3/HDFS/HTTP/Local | 统一表函数,支持多种存储(自 3.1.0 版本起) |
 
-* 
[FILE](../sql-manual/sql-functions/table-valued-functions/file.md):统一表函数,可以同时支持 
S3/HDFS/Local 文件的读取。(自 3.1.0 版本支持。)
+## 使用场景
 
-## 基础使用
+### 场景一:直接查询分析文件
 
-这里我们通过 S3 Table Value Function 举例说明如何对对象存储上的文件进行分析。
+TVF 非常适用于对存储系统上的文件进行直接分析,无需事先将数据导入到 Doris 中。
 
-### 查询
+以下示例通过 S3 TVF 查询对象存储上的 Parquet 文件:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-其中 `S3(...)`是一个 TVF(Table Value Function)。Table Value Function 
本质上是一张表,因此他可以出现在任意 SQL 语句中“表”可以出现的位置上。
-
-TVF 的属性包括要分析的文件路径,文件格式、对象存储的连接信息等。
-
-### 多文件导入
-
-文件路径(URI)支持使用通配符和范围模式匹配多个文件:
-
-| 模式 | 示例 | 匹配 |
-|------|------|------|
-| `*` | `file_*` | 所有以 `file_` 开头的文件 |
-| `{n..m}` | `file_{1..3}` | `file_1`、`file_2`、`file_3` |
-| `{a,b,c}` | `file_{a,b}` | `file_a`、`file_b` |
-
-完整语法包括所有支持的通配符、范围展开规则和使用示例,请参阅[文件路径模式](../sql-manual/basic-element/file-path-pattern)。
-
+查询结果示例:
 
-### 自动推断文件列类型
-
-可以通过 `DESC FUNCTION` 语法可以查看 TVF 的 Schema:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris 根据以下规则推断 Schema:
-
-* 对于 Parquet、ORC 格式,Doris 会根据文件元信息获取 Schema。
-
-* 对于匹配多个文件的情况,会使用第一个文件的 Schema 作为 TVF 的 Schema。
-
-* 对于 CSV、JSON 格式,Doris 会根据字段、分隔符等属性,解析**第一行数据**获取 Schema。
-
-  默认情况下,所有列类型均为 `string`。可以通过 `csv_schema` 属性单独指定列名和列类型。Doris 
会使用指定的列类型进行文件读取。格式如下:`name1:type1;name2:type2;...`。如:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  当前支持的列类型名称如下:
-
-  | 列类型名称        |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* 对于格式不匹配的列(比如文件中为字符串,用户定义为 `int`;或者其他文件和第一个文件的 Schema 不相同),或缺失列(比如文件中有 4 
列,用户定义了 5 列),则这些列将返回 `null`。
-
-## 适用场景
-
-### 查询分析
-
-TVF 非常适用于对存储系统上的独立文件进行直接分析,而无需事先将数据导入到 Doris 中。
-
-可以使用任意的 SQL 语句进行文件分析,如:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -150,12 +53,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF 可以出现在 SQL 中,Table 能出现的任意位置。如 `CTE` 的 `WITH` 子句中,`FROM` 
子句中等等。这样,您可以把文件当做一张普通的表进行任意分析。
+TVF 本质上是一张表,可以出现在任意 SQL 语句中"表"可以出现的位置,如:
 
-您也可以用过 `CREATE VIEW` 语句为 TVF 创建一个逻辑视图。之后,可以像其他视图一样,对这个 TVF 
进行访问、权限管理等操作,也可以让其他用户访问这个 View,而无需重复书写连接信息等属性。
+- `FROM` 子句中
+- `CTE` 的 `WITH` 子句中
+- `JOIN` 语句中
+
+### 场景二:创建视图简化访问
+
+通过 `CREATE VIEW` 语句可以为 TVF 创建逻辑视图,避免重复书写连接信息,并支持权限管理:
 
 ```sql
--- Create a view based on a TVF
+-- 基于 TVF 创建视图
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -163,25 +72,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- 查看视图结构
 DESC tvf_view;
 
--- Query the view as usual
+-- 查询视图
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- 授权其他用户访问
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### 数据导入
+### 场景三:导入数据到 Doris
 
-TVF 可以作为 Doris 数据导入方式的一种。配合 `INSERT INTO SELECT` 语法,我们可以很方便的将文件导入到 Doris 中。
+配合 `INSERT INTO SELECT` 语法,可以将文件数据导入到 Doris 表中:
 
 ```sql
--- Create a Doris table
+-- 1. 创建目标表
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -191,21 +100,142 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. 通过 TVF 导入数据
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## 注意事项
+## 核心功能
+
+### 多文件匹配
+
+文件路径(URI)支持使用通配符和范围模式匹配多个文件:
+
+| 模式 | 示例 | 匹配结果 |
+|------|------|----------|
+| `*` | `file_*` | 所有以 `file_` 开头的文件 |
+| `{n..m}` | `file_{1..3}` | `file_1`、`file_2`、`file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`、`file_b` |
+
+完整语法请参阅[文件路径模式](../sql-manual/basic-element/file-path-pattern)。
+
+### 使用 Resource 简化配置
+
+TVF 支持通过 `resource` 属性引用预先创建的 S3 或 HDFS Resource,从而避免在每次查询时重复填写连接信息。
+
+**1. 创建 Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. 在 TVF 中使用 Resource**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Resource 中的属性会作为默认值,TVF 中指定的属性会覆盖 Resource 中的同名属性
+- 使用 Resource 可以集中管理连接信息,便于维护和权限控制
+:::
+
+### 自动推断 Schema
+
+通过 `DESC FUNCTION` 语法可以查看 TVF 自动推断的 Schema:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
 
-1. 如果指定的 `uri` 匹配不到文件,或者匹配到的所有文件都是空文件,那么 TVF 将会返回空结果集。在这种情况下使用`DESC 
FUNCTION`查看这个 TVF 的 Schema,会得到一列虚拟的列`__dummy_col`,该列无意义,仅作为占位符使用。
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema 推断规则:**
+
+| 文件格式 | 推断方式 |
+|----------|----------|
+| Parquet、ORC | 根据文件元信息自动获取 Schema |
+| CSV、JSON | 解析第一行数据获取 Schema,默认列类型为 `string` |
+| 多文件匹配 | 使用第一个文件的 Schema |
+
+### 手动指定列类型(CSV/JSON)
+
+对于 CSV 和 JSON 格式,可以通过 `csv_schema` 属性手动指定列名和列类型,格式为 
`name1:type1;name2:type2;...`:
+
+```sql
+S3 (
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+)
+```
+
+**支持的列类型:**
+
+| 整数类型 | 浮点类型 | 其他类型 |
+|----------|----------|----------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- 如果列类型不匹配(如文件中为字符串,但指定为 `int`),该列返回 `null`
+- 如果列数量不匹配(如文件有 4 列,但指定了 5 列),缺失的列返回 `null`
+:::
+
+## 注意事项
 
-2. 如果指定的文件格式为 `csv`,所读文件不为空文件但文件第一行为空,则会提示错误`The first line is empty, can not 
parse column numbers`,这是因为无法通过该文件的第一行解析出 Schema。
+| 场景 | 行为 |
+|------|------|
+| `uri` 匹配不到文件或所有文件为空 | TVF 返回空结果集;使用 `DESC FUNCTION` 查看 Schema 会得到占位列 
`__dummy_col` |
+| CSV 文件第一行为空(文件非空) | 提示错误 `The first line is empty, can not parse column 
numbers` |
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
index cc5b6d6aadc..24a2d7882fb 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
 
+## Split 数量限制
+
+当查询外部表(Hive、Iceberg、Paimon 等)时,Doris 会将文件拆分成多个 split 
进行并行处理。在某些场景下,尤其是存在大量小文件时,可能会生成过多的 split,导致:
+
+1. 内存压力:过多的 split 会消耗 FE 大量内存
+2. OOM 问题:split 数量过多可能导致 OutOfMemoryError
+3. 性能下降:管理过多 split 会增加查询规划开销
+
+可以通过 `max_file_split_num` 会话变量来限制每个 table scan 允许的最大 split 数量(该参数自 4.0.4 版本支持):
+
+- 类型:`int`
+- 默认值:`100000`
+- 说明:在非 batch 模式下,每个 table scan 最大允许的 split 数量,防止产生过多 split 导致 OOM。
+
+使用示例:
+
+```sql
+-- 设置最大 split 数量为 50000
+SET max_file_split_num = 50000;
+
+-- 禁用该限制(设置为 0 或负数)
+SET max_file_split_num = 0;
+```
+
+当设置了该限制后,Doris 会动态计算最小的 split 大小,以确保 split 数量不超过设定的上限。
+
 ## Merge IO 优化
 
 针对 HDFS、对象存储等远端存储系统,Doris 会通过 Merge IO 技术来优化 IO 访问。Merge IO 技术,本质上是将多个相邻的小 IO 
请求,合并成一个大 IO 请求,这样可以减少 IOPS,增加 IO 吞吐。
@@ -71,3 +97,51 @@ Request Range: [0, 50]
 - `merge_io_read_slice_size_bytes`
 
     会话变量,自 3.1.3 版本支持。默认为 8MB。如果发现读放大严重,可以将此参数调小,如 64KB。并观察修改后的 IO 
请求和查询延迟是否有提升。
+
+## Parquet Page Cache
+
+:::info
+自 4.1.0 版本支持。
+:::
+
+Parquet Page Cache 是针对 Parquet 文件的页级缓存机制。该功能与 Doris 现有的 Page Cache 
框架集成,通过在内存中缓存解压后(或压缩的)数据页,显著提升查询性能。
+
+### 主要特性
+
+1. **统一的 Page Cache 集成**
+    - 与 Doris 内表使用的 `StoragePageCache` 共享同一个基础框架
+    - 共享内存池和淘汰策略
+    - 复用现有的缓存统计和 RuntimeProfile 进行统一的性能监控
+
+2. **智能缓存策略**
+    - **压缩比感知**:根据 `parquet_page_cache_decompress_threshold` 
参数自动选择缓存压缩数据还是解压后的数据
+    - **灵活的存储方式**:当 `解压后大小 / 压缩大小 ≤ 阈值` 时缓存解压后的数据,否则根据 
`enable_parquet_cache_compressed_pages` 决定是否缓存压缩数据
+    - **缓存键设计**:使用 `file_path::mtime::offset` 作为缓存键,确保文件修改后缓存的一致性
+
+### 相关配置参数
+
+以下为 BE 配置参数:
+
+- `enable_parquet_page_cache`
+
+    是否启用 Parquet Page Cache 功能。默认为 `false`。
+
+- `parquet_page_cache_decompress_threshold`
+
+    控制缓存压缩数据还是解压数据的阈值。默认为 `1.5`。当 `解压后大小 / 压缩大小` 的比值小于或等于该阈值时,会缓存解压后的数据;否则会根据 
`enable_parquet_cache_compressed_pages` 的设置决定是否缓存压缩数据。
+
+- `enable_parquet_cache_compressed_pages`
+
+    当压缩比超过阈值时,是否缓存压缩的数据页。默认为 `true`。
+
+### 性能监控
+
+通过 Query Profile 可以查看 Parquet Page Cache 的使用情况:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+其中 `PageCacheHitCount` 表示缓存命中次数,`PageCacheMissCount` 表示缓存未命中次数。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
index faaccff715d..d3298103cd4 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2149,6 +2149,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. 如果指定的快照不存在,操作会失败。
 3. 合并操作会创建一个新的快照,不会删除原始快照。
 
+### expire_snapshots
+
+`expire_snapshots` 操作用于删除 Iceberg 表的旧快照,以释放存储空间并提高元数据性能。该操作遵循 Apache Iceberg 
Spark 过程规范。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | 否 | 快照过期的时间阈值,早于该时间的快照将被删除。支持 ISO 日期时间格式(如 
`2024-01-01T00:00:00`)或毫秒时间戳格式 |
+| `retain_last` | Integer | 否 | 保留的祖先快照数量。当单独指定时,自动将 `older_than` 设置为当前时间 |
+| `snapshot_ids` | String | 否 | 要过期的特定快照 ID 列表,以逗号分隔 |
+| `max_concurrent_deletes` | Integer | 否 | 执行删除操作的线程池大小 |
+| `clean_expired_metadata` | Boolean | 否 | 设置为 `true` 时,清理未使用的分区规格和 Schema |
+
+**返回值:**
+
+执行 `expire_snapshots` 操作会返回一个结果集,包含以下 6 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | 已删除的数据文件数量 |
+| `deleted_position_delete_files_count` | BIGINT | 已删除的 Position Delete 文件数量 |
+| `deleted_equality_delete_files_count` | BIGINT | 已删除的 Equality Delete 文件数量 |
+| `deleted_manifest_files_count` | BIGINT | 已删除的 Manifest 文件数量 |
+| `deleted_manifest_lists_count` | BIGINT | 已删除的 Manifest List 文件数量 |
+| `deleted_statistics_files_count` | BIGINT | 已删除的统计文件数量 |
+
+**示例:**
+
+```sql
+-- 过期快照,只保留最近的 2 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- 过期指定时间之前的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- 过期指定 ID 的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- 组合参数:过期 2024-06-01 之前的快照,但至少保留最近的 5 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件。
+2. 如果同时指定 `older_than` 和 `retain_last`,则两个条件都会生效:只有早于 `older_than` 且不在最近 
`retain_last` 个快照中的快照才会被删除。
+3. `snapshot_ids` 可以单独使用,用于删除特定的快照。
+4. 该操作会永久删除快照及其关联的数据文件,请谨慎使用。
+5. 建议在执行前先查询 `$snapshots` 系统表了解表的快照信息。
+
 ### fast_forward
 
 `fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/file-analysis.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/file-analysis.md
index 1bee20a8db1..59d417ff105 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/file-analysis.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/file-analysis.md
@@ -2,150 +2,44 @@
 {
     "title": "分析 S3/HDFS 上的文件",
     "language": "zh-CN",
-    "description": "通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 
Table 进行查询分析。并且支持自动的列类型推断。"
+    "description": "了解如何使用 Apache Doris Table Value Function (TVF) 直接查询和分析 
S3、HDFS 等存储系统上的 Parquet、ORC、CSV、JSON 文件,支持自动 Schema 推断、多文件匹配和数据导入。"
 }
 ---
 
-通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 Table 
进行查询分析。并且支持自动的列类型推断。
+通过 Table Value Function(TVF)功能,Doris 可以直接将对象存储或 HDFS 
上的文件作为表进行查询分析,无需事先导入数据,并且支持自动的列类型推断。
 
-更多使用方式可参阅 Table Value Function 文档:
+## 支持的存储系统
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md):支持 S3 
兼容的对象存储上的文件分析。
+Doris 提供以下 TVF 用于访问不同的存储系统:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md):支持 HDFS 
上的文件分析。
+| TVF | 支持的存储 | 说明 |
+|-----|-----------|------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | S3 兼容的对象存储 
| 支持 AWS S3、阿里云 OSS、腾讯云 COS 等 |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
支持 Hadoop 分布式文件系统 |
 
-* [FILE]:请参考 4.x 文档。
+## 使用场景
 
-## 基础使用
+### 场景一:直接查询分析文件
 
-这里我们通过 S3 Table Value Function 举例说明如何对对象存储上的文件进行分析。
+TVF 非常适用于对存储系统上的文件进行直接分析,无需事先将数据导入到 Doris 中。
 
-### 查询
+以下示例通过 S3 TVF 查询对象存储上的 Parquet 文件:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-其中 `S3(...)`是一个 TVF(Table Value Function)。Table Value Function 
本质上是一张表,因此他可以出现在任意 SQL 语句中“表”可以出现的位置上。
-
-TVF 的属性包括要分析的文件路径,文件格式、对象存储的连接信息等。其中文件路径(URI)可以使用通配符匹配多个文件,以下的文件路径都是合法的:
-
-* 匹配指定的文件
-
-  `s3://bucket/path/to/tvf_test/test.parquet`
-
-* 匹配所有 `test_` 开头的文件
-
-  `s3://bucket/path/to/tvf_test/test_*`
-
-* 匹配所有 `.parquet` 后缀的文件
+查询结果示例:
 
-  `s3://bucket/path/to/tvf_test/*.parquet`
-
-* 匹配 `tvf_test`目录下的所有文件
-
-  `s3://bucket/path/to/tvf_test/*`
-
-* 匹配文件名中包含 `test`的文件
-
-  `s3://bucket/path/to/tvf_test/*test*`
-
-### 自动推断文件列类型
-
-可以通过 `DESC FUNCTION` 语法可以查看 TVF 的 Schema:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris 根据以下规则推断 Schema:
-
-* 对于 Parquet、ORC 格式,Doris 会根据文件元信息获取 Schema。
-
-* 对于匹配多个文件的情况,会使用第一个文件的 Schema 作为 TVF 的 Schema。
-
-* 对于 CSV、JSON 格式,Doris 会根据字段、分隔符等属性,解析**第一行数据**获取 Schema。
-
-  默认情况下,所有列类型均为 `string`。可以通过 `csv_schema` 属性单独指定列名和列类型。Doris 
会使用指定的列类型进行文件读取。格式如下:`name1:type1;name2:type2;...`。如:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  当前支持的列类型名称如下:
-
-  | 列类型名称        |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* 对于格式不匹配的列(比如文件中为字符串,用户定义为 `int`;或者其他文件和第一个文件的 Schema 不相同),或缺失列(比如文件中有 4 
列,用户定义了 5 列),则这些列将返回 `null`。
-
-## 适用场景
-
-### 查询分析
-
-TVF 非常适用于对存储系统上的独立文件进行直接分析,而无需事先将数据导入到 Doris 中。
-
-可以使用任意的 SQL 语句进行文件分析,如:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -157,12 +51,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF 可以出现在 SQL 中,Table 能出现的任意位置。如 `CTE` 的 `WITH` 子句中,`FROM` 
子句中等等。这样,您可以把文件当做一张普通的表进行任意分析。
+TVF 本质上是一张表,可以出现在任意 SQL 语句中"表"可以出现的位置,如:
+
+- `FROM` 子句中
+- `CTE` 的 `WITH` 子句中
+- `JOIN` 语句中
 
-您也可以用过 `CREATE VIEW` 语句为 TVF 创建一个逻辑视图。之后,可以像其他视图一样,对这个 TVF 
进行访问、权限管理等操作,也可以让其他用户访问这个 View,而无需重复书写连接信息等属性。
+### 场景二:创建视图简化访问
+
+通过 `CREATE VIEW` 语句可以为 TVF 创建逻辑视图,避免重复书写连接信息,并支持权限管理:
 
 ```sql
--- Create a view based on a TVF
+-- 基于 TVF 创建视图
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -170,25 +70,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- 查看视图结构
 DESC tvf_view;
 
--- Query the view as usual
+-- 查询视图
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- 授权其他用户访问
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### 数据导入
+### 场景三:导入数据到 Doris
 
-TVF 可以作为 Doris 数据导入方式的一种。配合 `INSERT INTO SELECT` 语法,我们可以很方便的将文件导入到 Doris 中。
+配合 `INSERT INTO SELECT` 语法,可以将文件数据导入到 Doris 表中:
 
 ```sql
--- Create a Doris table
+-- 1. 创建目标表
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -198,21 +98,140 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. 通过 TVF 导入数据
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## 注意事项
+## 核心功能
+
+### 多文件匹配
 
-1. 如果指定的 `uri` 匹配不到文件,或者匹配到的所有文件都是空文件,那么 TVF 将会返回空结果集。在这种情况下使用`DESC 
FUNCTION`查看这个 TVF 的 Schema,会得到一列虚拟的列`__dummy_col`,该列无意义,仅作为占位符使用。
+文件路径(URI)支持使用通配符和范围模式匹配多个文件:
+
+| 模式 | 示例 | 匹配结果 |
+|------|------|----------|
+| `*` | `file_*` | 所有以 `file_` 开头的文件 |
+| `{n..m}` | `file_{1..3}` | `file_1`、`file_2`、`file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`、`file_b` |
+
+### 使用 Resource 简化配置
+
+TVF 支持通过 `resource` 属性引用预先创建的 S3 或 HDFS Resource,从而避免在每次查询时重复填写连接信息。
+
+**1. 创建 Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. 在 TVF 中使用 Resource**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Resource 中的属性会作为默认值,TVF 中指定的属性会覆盖 Resource 中的同名属性
+- 使用 Resource 可以集中管理连接信息,便于维护和权限控制
+:::
+
+### 自动推断 Schema
+
+通过 `DESC FUNCTION` 语法可以查看 TVF 自动推断的 Schema:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
+
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema 推断规则:**
+
+| 文件格式 | 推断方式 |
+|----------|----------|
+| Parquet、ORC | 根据文件元信息自动获取 Schema |
+| CSV、JSON | 解析第一行数据获取 Schema,默认列类型为 `string` |
+| 多文件匹配 | 使用第一个文件的 Schema |
+
+### 手动指定列类型(CSV/JSON)
+
+对于 CSV 和 JSON 格式,可以通过 `csv_schema` 属性手动指定列名和列类型,格式为 
`name1:type1;name2:type2;...`:
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+);
+```
+
+**支持的列类型:**
+
+| 整数类型 | 浮点类型 | 其他类型 |
+|----------|----------|----------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- 如果列类型不匹配(如文件中为字符串,但指定为 `int`),该列返回 `null`
+- 如果列数量不匹配(如文件有 4 列,但指定了 5 列),缺失的列返回 `null`
+:::
+
+## 注意事项
 
-2. 如果指定的文件格式为 `csv`,所读文件不为空文件但文件第一行为空,则会提示错误`The first line is empty, can not 
parse column numbers`,这是因为无法通过该文件的第一行解析出 Schema。
+| 场景 | 行为 |
+|------|------|
+| `uri` 匹配不到文件或所有文件为空 | TVF 返回空结果集;使用 `DESC FUNCTION` 查看 Schema 会得到占位列 
`__dummy_col` |
+| CSV 文件第一行为空(文件非空) | 提示错误 `The first line is empty, can not parse column 
numbers` |
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
index cc5b6d6aadc..24a2d7882fb 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@
 
 可参考 [HDFS 文档](../storages/hdfs.md) 中 **HDFS IO 优化** 部分。
 
+## Split 数量限制
+
+当查询外部表(Hive、Iceberg、Paimon 等)时,Doris 会将文件拆分成多个 split 
进行并行处理。在某些场景下,尤其是存在大量小文件时,可能会生成过多的 split,导致:
+
+1. 内存压力:过多的 split 会消耗 FE 大量内存
+2. OOM 问题:split 数量过多可能导致 OutOfMemoryError
+3. 性能下降:管理过多 split 会增加查询规划开销
+
+可以通过 `max_file_split_num` 会话变量来限制每个 table scan 允许的最大 split 数量(该参数自 4.0.4 版本支持):
+
+- 类型:`int`
+- 默认值:`100000`
+- 说明:在非 batch 模式下,每个 table scan 最大允许的 split 数量,防止产生过多 split 导致 OOM。
+
+使用示例:
+
+```sql
+-- 设置最大 split 数量为 50000
+SET max_file_split_num = 50000;
+
+-- 禁用该限制(设置为 0 或负数)
+SET max_file_split_num = 0;
+```
+
+当设置了该限制后,Doris 会动态计算最小的 split 大小,以确保 split 数量不超过设定的上限。
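+
+如需确认当前会话中实际生效的值,可以像查看其他会话变量一样进行确认,例如:
+
+```sql
+-- 查看当前会话中的 split 数量限制
+SHOW VARIABLES LIKE 'max_file_split_num';
+```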
+
 ## Merge IO 优化
 
 针对 HDFS、对象存储等远端存储系统,Doris 会通过 Merge IO 技术来优化 IO 访问。Merge IO 技术,本质上是将多个相邻的小 IO 
请求,合并成一个大 IO 请求,这样可以减少 IOPS,增加 IO 吞吐。
@@ -71,3 +97,51 @@ Request Range: [0, 50]
 - `merge_io_read_slice_size_bytes`
 
     会话变量,自 3.1.3 版本支持。默认为 8MB。如果发现读放大严重,可以将此参数调小,如 64KB。并观察修改后的 IO 
请求和查询延迟是否有提升。
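+
+    下面给出上述调优建议的一个简单示例(64KB 以字节表示):
+
+    ```sql
+    -- 将当前会话的合并切片大小调小为 64KB(65536 字节)
+    SET merge_io_read_slice_size_bytes = 65536;
+    ```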
+
+## Parquet Page Cache
+
+:::info
+自 4.1.0 版本支持。
+:::
+
+Parquet Page Cache 是针对 Parquet 文件的页级缓存机制。该功能与 Doris 现有的 Page Cache 
框架集成,通过在内存中缓存解压后(或压缩的)数据页,显著提升查询性能。
+
+### 主要特性
+
+1. **统一的 Page Cache 集成**
+    - 与 Doris 内表使用的 `StoragePageCache` 共享同一个基础框架
+    - 共享内存池和淘汰策略
+    - 复用现有的缓存统计和 RuntimeProfile 进行统一的性能监控
+
+2. **智能缓存策略**
+    - **压缩比感知**:根据 `parquet_page_cache_decompress_threshold` 
参数自动选择缓存压缩数据还是解压后的数据
+    - **灵活的存储方式**:当 `解压后大小 / 压缩大小 ≤ 阈值` 时缓存解压后的数据,否则根据 
`enable_parquet_cache_compressed_pages` 决定是否缓存压缩数据
+    - **缓存键设计**:使用 `file_path::mtime::offset` 作为缓存键,确保文件修改后缓存的一致性
+
+### 相关配置参数
+
+以下为 BE 配置参数:
+
+- `enable_parquet_page_cache`
+
+    是否启用 Parquet Page Cache 功能。默认为 `false`。
+
+- `parquet_page_cache_decompress_threshold`
+
+    控制缓存压缩数据还是解压数据的阈值。默认为 `1.5`。当 `解压后大小 / 压缩大小` 的比值小于或等于该阈值时,会缓存解压后的数据;否则会根据 
`enable_parquet_cache_compressed_pages` 的设置决定是否缓存压缩数据。
+
+- `enable_parquet_cache_compressed_pages`
+
+    当压缩比超过阈值时,是否缓存压缩的数据页。默认为 `true`。
+
+### 性能监控
+
+通过 Query Profile 可以查看 Parquet Page Cache 的使用情况:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+其中 `PageCacheHitCount` 表示缓存命中次数,`PageCacheMissCount` 表示缓存未命中次数。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
index faaccff715d..d3298103cd4 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2149,6 +2149,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. 如果指定的快照不存在,操作会失败。
 3. 合并操作会创建一个新的快照,不会删除原始快照。
 
+### expire_snapshots
+
+`expire_snapshots` 操作用于删除 Iceberg 表的旧快照,以释放存储空间并提高元数据性能。该操作遵循 Apache Iceberg 
Spark 过程规范。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | 否 | 快照过期的时间阈值,早于该时间的快照将被删除。支持 ISO 日期时间格式(如 
`2024-01-01T00:00:00`)或毫秒时间戳格式 |
+| `retain_last` | Integer | 否 | 保留的祖先快照数量。当单独指定时,自动将 `older_than` 设置为当前时间 |
+| `snapshot_ids` | String | 否 | 要过期的特定快照 ID 列表,以逗号分隔 |
+| `max_concurrent_deletes` | Integer | 否 | 执行删除操作的线程池大小 |
+| `clean_expired_metadata` | Boolean | 否 | 设置为 `true` 时,清理未使用的分区规格和 Schema |
+
+**返回值:**
+
+执行 `expire_snapshots` 操作会返回一个结果集,包含以下 6 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | 已删除的数据文件数量 |
+| `deleted_position_delete_files_count` | BIGINT | 已删除的 Position Delete 文件数量 |
+| `deleted_equality_delete_files_count` | BIGINT | 已删除的 Equality Delete 文件数量 |
+| `deleted_manifest_files_count` | BIGINT | 已删除的 Manifest 文件数量 |
+| `deleted_manifest_lists_count` | BIGINT | 已删除的 Manifest List 文件数量 |
+| `deleted_statistics_files_count` | BIGINT | 已删除的统计文件数量 |
+
+**示例:**
+
+```sql
+-- 过期快照,只保留最近的 2 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- 过期指定时间之前的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- 过期指定 ID 的快照
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- 组合参数:过期 2024-06-01 之前的快照,但至少保留最近的 5 个
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件。
+2. 如果同时指定 `older_than` 和 `retain_last`,则两个条件都会生效:只有早于 `older_than` 且不在最近 
`retain_last` 个快照中的快照才会被删除。
+3. `snapshot_ids` 可以单独使用,用于删除特定的快照。
+4. 该操作会永久删除快照及其关联的数据文件,请谨慎使用。
+5. 建议在执行前先查询 `$snapshots` 系统表了解表的快照信息。
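+
+作为第 5 条的参考,执行前的快照检查大致如下(此处假设可使用上文提到的 `$snapshots` 系统表的 `表名$snapshots` 后缀查询语法):
+
+```sql
+-- 在过期快照前先查看已有快照(假设支持 `$snapshots` 后缀语法)
+SELECT * FROM iceberg_db.iceberg_table$snapshots;
+```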
+
 ### fast_forward
 
 `fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/file-analysis.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/file-analysis.md
index 900942cdb14..7ca8d06013c 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/file-analysis.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/file-analysis.md
@@ -2,143 +2,46 @@
 {
     "title": "分析 S3/HDFS 上的文件",
     "language": "zh-CN",
-    "description": "通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 
Table 进行查询分析。并且支持自动的列类型推断。"
+    "description": "了解如何使用 Apache Doris Table Value Function (TVF) 直接查询和分析 
S3、HDFS 等存储系统上的 Parquet、ORC、CSV、JSON 文件,支持自动 Schema 推断、多文件匹配和数据导入。"
 }
 ---
 
-通过 Table Value Function 功能,Doris 可以直接将对象存储或 HDFS 上的文件作为 Table 
进行查询分析。并且支持自动的列类型推断。
+通过 Table Value Function(TVF)功能,Doris 可以直接将对象存储或 HDFS 
上的文件作为表进行查询分析,无需事先导入数据,并且支持自动的列类型推断。
 
-更多使用方式可参阅 Table Value Function 文档:
+## 支持的存储系统
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md):支持 S3 
兼容的对象存储上的文件分析。
+Doris 提供以下 TVF 用于访问不同的存储系统:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md):支持 HDFS 
上的文件分析。
+| TVF | 支持的存储 | 说明 |
+|-----|-----------|------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | S3 兼容的对象存储 
| 支持 AWS S3、阿里云 OSS、腾讯云 COS 等 |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
支持 Hadoop 分布式文件系统 |
+| [HTTP](../sql-manual/sql-functions/table-valued-functions/http.md) | HTTP | 
支持从 HTTP 地址访问文件(自 4.0.2 版本起) |
+| [FILE](../sql-manual/sql-functions/table-valued-functions/file.md) | 
S3/HDFS/HTTP/Local | 统一表函数,支持多种存储(自 3.1.0 版本起) |
 
-* 
[FILE](../sql-manual/sql-functions/table-valued-functions/file.md):统一表函数,可以同时支持 
S3/HDFS/Local 文件的读取。(自 3.1.0 版本支持。)
+## 使用场景
 
-## 基础使用
+### 场景一:直接查询分析文件
 
-这里我们通过 S3 Table Value Function 举例说明如何对对象存储上的文件进行分析。
+TVF 非常适用于对存储系统上的文件进行直接分析,无需事先将数据导入到 Doris 中。
 
-### 查询
+以下示例通过 S3 TVF 查询对象存储上的 Parquet 文件:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-其中 `S3(...)`是一个 TVF(Table Value Function)。Table Value Function 
本质上是一张表,因此他可以出现在任意 SQL 语句中“表”可以出现的位置上。
-
-TVF 的属性包括要分析的文件路径,文件格式、对象存储的连接信息等。
-
-### 多文件导入
-
-文件路径(URI)支持使用通配符和范围模式匹配多个文件:
-
-| 模式 | 示例 | 匹配 |
-|------|------|------|
-| `*` | `file_*` | 所有以 `file_` 开头的文件 |
-| `{n..m}` | `file_{1..3}` | `file_1`、`file_2`、`file_3` |
-| `{a,b,c}` | `file_{a,b}` | `file_a`、`file_b` |
-
-完整语法包括所有支持的通配符、范围展开规则和使用示例,请参阅[文件路径模式](../sql-manual/basic-element/file-path-pattern)。
-
+查询结果示例:
 
-### 自动推断文件列类型
-
-可以通过 `DESC FUNCTION` 语法可以查看 TVF 的 Schema:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris 根据以下规则推断 Schema:
-
-* 对于 Parquet、ORC 格式,Doris 会根据文件元信息获取 Schema。
-
-* 对于匹配多个文件的情况,会使用第一个文件的 Schema 作为 TVF 的 Schema。
-
-* 对于 CSV、JSON 格式,Doris 会根据字段、分隔符等属性,解析**第一行数据**获取 Schema。
-
-  默认情况下,所有列类型均为 `string`。可以通过 `csv_schema` 属性单独指定列名和列类型。Doris 
会使用指定的列类型进行文件读取。格式如下:`name1:type1;name2:type2;...`。如:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  当前支持的列类型名称如下:
-
-  | 列类型名称        |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* 对于格式不匹配的列(比如文件中为字符串,用户定义为 `int`;或者其他文件和第一个文件的 Schema 不相同),或缺失列(比如文件中有 4 
列,用户定义了 5 列),则这些列将返回 `null`。
-
-## 适用场景
-
-### 查询分析
-
-TVF 非常适用于对存储系统上的独立文件进行直接分析,而无需事先将数据导入到 Doris 中。
-
-可以使用任意的 SQL 语句进行文件分析,如:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -150,12 +53,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF 可以出现在 SQL 中,Table 能出现的任意位置。如 `CTE` 的 `WITH` 子句中,`FROM` 
子句中等等。这样,您可以把文件当做一张普通的表进行任意分析。
+TVF 本质上是一张表,可以出现在任意 SQL 语句中"表"可以出现的位置,如:
 
-您也可以用过 `CREATE VIEW` 语句为 TVF 创建一个逻辑视图。之后,可以像其他视图一样,对这个 TVF 
进行访问、权限管理等操作,也可以让其他用户访问这个 View,而无需重复书写连接信息等属性。
+- `FROM` 子句中
+- `CTE` 的 `WITH` 子句中
+- `JOIN` 语句中
+
+### 场景二:创建视图简化访问
+
+通过 `CREATE VIEW` 语句可以为 TVF 创建逻辑视图,避免重复书写连接信息,并支持权限管理:
 
 ```sql
--- Create a view based on a TVF
+-- 基于 TVF 创建视图
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -163,25 +72,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- 查看视图结构
 DESC tvf_view;
 
--- Query the view as usual
+-- 查询视图
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- 授权其他用户访问
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### 数据导入
+### 场景三:导入数据到 Doris
 
-TVF 可以作为 Doris 数据导入方式的一种。配合 `INSERT INTO SELECT` 语法,我们可以很方便的将文件导入到 Doris 中。
+配合 `INSERT INTO SELECT` 语法,可以将文件数据导入到 Doris 表中:
 
 ```sql
--- Create a Doris table
+-- 1. 创建目标表
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -191,21 +100,142 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. 通过 TVF 导入数据
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## 注意事项
+## 核心功能
+
+### 多文件匹配
+
+文件路径(URI)支持使用通配符和范围模式匹配多个文件:
+
+| 模式 | 示例 | 匹配结果 |
+|------|------|----------|
+| `*` | `file_*` | 所有以 `file_` 开头的文件 |
+| `{n..m}` | `file_{1..3}` | `file_1`、`file_2`、`file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`、`file_b` |
+
+完整语法请参阅[文件路径模式](../sql-manual/basic-element/file-path-pattern)。
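+
+例如,下面的查询通过通配符一次读取目录下所有匹配的 Parquet 文件(bucket、路径与密钥均为示意值,沿用上文示例):
+
+```sql
+-- 通过通配符匹配并统计所有 Parquet 文件(bucket、路径与密钥为示意值)
+SELECT count(*) FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/*.parquet',
+    'format' = 'parquet',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk'
+);
+```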
+
+### 使用 Resource 简化配置
+
+TVF 支持通过 `resource` 属性引用预先创建的 S3 或 HDFS Resource,从而避免在每次查询时重复填写连接信息。
+
+**1. 创建 Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. 在 TVF 中使用 Resource**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Resource 中的属性会作为默认值,TVF 中指定的属性会覆盖 Resource 中的同名属性
+- 使用 Resource 可以集中管理连接信息,便于维护和权限控制
+:::
+
+### 自动推断 Schema
+
+通过 `DESC FUNCTION` 语法可以查看 TVF 自动推断的 Schema:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
 
-1. 如果指定的 `uri` 匹配不到文件,或者匹配到的所有文件都是空文件,那么 TVF 将会返回空结果集。在这种情况下使用`DESC 
FUNCTION`查看这个 TVF 的 Schema,会得到一列虚拟的列`__dummy_col`,该列无意义,仅作为占位符使用。
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema 推断规则:**
+
+| 文件格式 | 推断方式 |
+|----------|----------|
+| Parquet、ORC | 根据文件元信息自动获取 Schema |
+| CSV、JSON | 解析第一行数据获取 Schema,默认列类型为 `string` |
+| 多文件匹配 | 使用第一个文件的 Schema |
+
+### 手动指定列类型(CSV/JSON)
+
+对于 CSV 和 JSON 格式,可以通过 `csv_schema` 属性手动指定列名和列类型,格式为 
`name1:type1;name2:type2;...`:
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+);
+```
+
+**支持的列类型:**
+
+| 整数类型 | 浮点类型 | 其他类型 |
+|----------|----------|----------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- 如果列类型不匹配(如文件中为字符串,但指定为 `int`),该列返回 `null`
+- 如果列数量不匹配(如文件有 4 列,但指定了 5 列),缺失的列返回 `null`
+:::
+
+## 注意事项
 
-2. 如果指定的文件格式为 `csv`,所读文件不为空文件但文件第一行为空,则会提示错误`The first line is empty, can not 
parse column numbers`,这是因为无法通过该文件的第一行解析出 Schema。
+| 场景 | 行为 |
+|------|------|
+| `uri` 匹配不到文件或所有文件为空 | TVF 返回空结果集;使用 `DESC FUNCTION` 查看 Schema 会得到占位列 
`__dummy_col` |
+| CSV 文件第一行为空(文件非空) | 提示错误 `The first line is empty, can not parse column 
numbers` |
diff --git 
a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md 
b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
index 635415e990d..9aced42cec1 100644
--- a/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-3.x/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@ Since version 4.0.2, cache warmup functionality is 
supported, which can further
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS 
Documentation](../storages/hdfs.md).
 
+## Split Count Limit
+
+When querying external tables (Hive, Iceberg, Paimon, etc.), Doris splits 
files into multiple splits for parallel processing. In some scenarios, 
especially when there are a large number of small files, too many splits may be 
generated, leading to:
+
+1. Memory pressure: Too many splits consume a significant amount of FE memory
+2. OOM issues: Excessive split counts may cause OutOfMemoryError
+3. Performance degradation: Managing too many splits increases query planning 
overhead
+
+You can use the `max_file_split_num` session variable to limit the maximum 
number of splits allowed per table scan (supported since 4.0.4):
+
+- Type: `int`
+- Default: `100000`
+- Description: In non-batch mode, the maximum number of splits allowed per 
table scan to prevent OOM caused by too many splits.
+
+Usage example:
+
+```sql
+-- Set maximum split count to 50000
+SET max_file_split_num = 50000;
+
+-- Disable this limit (set to 0 or negative number)
+SET max_file_split_num = 0;
+```
+
+When this limit is set, Doris dynamically calculates the minimum split size to 
ensure the split count does not exceed the specified limit.
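+
+To confirm the value currently in effect, you can inspect the session
+variable like any other Doris variable, for example:
+
+```sql
+-- Check the current split count limit for this session
+SHOW VARIABLES LIKE 'max_file_split_num';
+```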
+
 ## Merge IO Optimization
 
 For remote storage systems like HDFS and object storage, Doris optimizes IO 
access through Merge IO technology. Merge IO technology essentially merges 
multiple adjacent small IO requests into one large IO request, which can reduce 
IOPS and increase IO throughput.
@@ -71,3 +97,51 @@ If you find that `MergedBytes` is much larger than 
`RequestBytes`, it indicates
 - `merge_io_read_slice_size_bytes`
 
     Session variable, supported since version 3.1.3. Default is 8MB. If you 
find serious read amplification, you can reduce this parameter, such as to 
64KB, and observe whether the modified IO requests and query latency improve.
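+
+    As a concrete sketch of the tuning suggested above (64KB expressed in bytes):
+
+    ```sql
+    -- Shrink the merge slice size for the current session (64KB = 65536 bytes)
+    SET merge_io_read_slice_size_bytes = 65536;
+    ```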
+
+## Parquet Page Cache
+
+:::info
+Supported since version 4.1.0.
+:::
+
+Parquet Page Cache is a page-level caching mechanism for Parquet files. This 
feature integrates with Doris's existing Page Cache framework, significantly 
improving query performance by caching decompressed (or compressed) data pages 
in memory.
+
+### Key Features
+
+1. **Unified Page Cache Integration**
+    - Shares the same underlying `StoragePageCache` framework used by Doris 
internal tables
+    - Shares memory pool and eviction policies
+    - Reuses existing cache statistics and RuntimeProfile for unified 
performance monitoring
+
+2. **Intelligent Caching Strategy**
+    - **Compression Ratio Awareness**: Automatically decides whether to cache 
compressed or decompressed data based on the 
`parquet_page_cache_decompress_threshold` parameter
+    - **Flexible Storage Approach**: Caches decompressed data when 
`decompressed size / compressed size ≤ threshold`; otherwise, decides whether 
to cache compressed data based on `enable_parquet_cache_compressed_pages`
+    - **Cache Key Design**: Uses `file_path::mtime::offset` as the cache key 
to ensure cache consistency after file modifications
+
+### Configuration Parameters
+
+The following are BE configuration parameters:
+
+- `enable_parquet_page_cache`
+
+    Whether to enable the Parquet Page Cache feature. Default is `false`.
+
+- `parquet_page_cache_decompress_threshold`
+
+    Threshold that controls whether to cache compressed or decompressed data. 
Default is `1.5`. When the ratio of `decompressed size / compressed size` is 
less than or equal to this threshold, decompressed data will be cached; 
otherwise, it will decide whether to cache compressed data based on the 
`enable_parquet_cache_compressed_pages` setting.
+
+- `enable_parquet_cache_compressed_pages`
+
+    Whether to cache compressed data pages when the compression ratio exceeds 
the threshold. Default is `true`.
+
+### Performance Monitoring
+
+You can view Parquet Page Cache usage through Query Profile:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+Where `PageCacheHitCount` indicates the number of cache hits, and 
`PageCacheMissCount` indicates the number of cache misses.
\ No newline at end of file
diff --git a/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx 
b/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
index 1034b5eaa35..d874038015b 100644
--- a/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2133,6 +2133,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. The operation will fail if the specified snapshot does not exist
 3. The merge operation creates a new snapshot and does not delete the original 
snapshot
 
+### expire_snapshots
+
+The `expire_snapshots` operation removes old snapshots from Iceberg tables to 
free up storage space and improve metadata performance. This operation follows 
the Apache Iceberg Spark procedure specification.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | No | Timestamp threshold for snapshot expiration. 
Snapshots older than this will be removed. Supports ISO datetime format (e.g., 
`2024-01-01T00:00:00`) or milliseconds timestamp |
+| `retain_last` | Integer | No | Number of ancestor snapshots to preserve. 
When specified alone, automatically sets `older_than` to current time |
+| `snapshot_ids` | String | No | Comma-separated list of specific snapshot IDs 
to expire |
+| `max_concurrent_deletes` | Integer | No | Size of thread pool for delete 
operations |
+| `clean_expired_metadata` | Boolean | No | When set to `true`, cleans up 
unused partition specs and schemas |
+
+**Return Value:**
+
+Executing the `expire_snapshots` operation returns a result set with the 
following 6 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | Number of deleted data files |
+| `deleted_position_delete_files_count` | BIGINT | Number of deleted position 
delete files |
+| `deleted_equality_delete_files_count` | BIGINT | Number of deleted equality 
delete files |
+| `deleted_manifest_files_count` | BIGINT | Number of deleted manifest files |
+| `deleted_manifest_lists_count` | BIGINT | Number of deleted manifest list 
files |
+| `deleted_statistics_files_count` | BIGINT | Number of deleted statistics 
files |
+
+**Example:**
+
+```sql
+-- Expire snapshots, keeping only the last 2
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- Expire snapshots older than a specific timestamp
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- Expire specific snapshots by ID
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- Combine parameters: expire snapshots older than 2024-06-01 but keep at 
least the last 5
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
+
+**Notes:**
+
+1. This operation does not support WHERE conditions.
+2. If both `older_than` and `retain_last` are specified, both conditions 
apply: only snapshots older than `older_than` AND not within the most recent 
`retain_last` snapshots will be deleted.
+3. `snapshot_ids` can be used alone to delete specific snapshots.
+4. This operation permanently deletes snapshots and their associated data 
files. Use with caution.
+5. It is recommended to query the `$snapshots` system table before execution 
to understand the table's snapshot information.
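+
+As a reference for note 5, such a pre-check could look like the sketch below; 
+the `table$snapshots` suffix syntax is assumed here based on the system table 
+mentioned above:
+
+```sql
+-- Inspect existing snapshots before expiring them
+-- (assumes the `$snapshots` metadata table suffix syntax)
+SELECT * FROM iceberg_db.iceberg_table$snapshots;
+```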
+
 ### fast_forward
 
 The `fast_forward` operation quickly advances the current snapshot of one 
branch to the latest snapshot of another branch.
diff --git a/versioned_docs/version-3.x/lakehouse/file-analysis.md 
b/versioned_docs/version-3.x/lakehouse/file-analysis.md
index 793269679ce..968c4a31334 100644
--- a/versioned_docs/version-3.x/lakehouse/file-analysis.md
+++ b/versioned_docs/version-3.x/lakehouse/file-analysis.md
@@ -1,151 +1,45 @@
 ---
 {
-    "title": "Analyze Files on S3/HDFS",
+    "title": "Analyzing Files on S3/HDFS",
     "language": "en",
-    "description": "Through the Table Value Function feature, Doris can 
directly query and analyze files on object storage or HDFS as a Table."
+    "description": "Learn how to use Apache Doris Table Value Function (TVF) 
to directly query and analyze Parquet, ORC, CSV, and JSON files on storage 
systems like S3 and HDFS, with support for automatic schema inference, 
multi-file matching, and data import."
 }
 ---
 
-Through the Table Value Function feature, Doris can directly query and analyze 
files on object storage or HDFS as a Table. It also supports automatic column 
type inference.
+Through the Table Value Function (TVF) feature, Doris can directly query and 
analyze files on object storage or HDFS as tables without importing data in 
advance, and supports automatic column type inference.
 
-For more usage methods, refer to the Table Value Function documentation:
+## Supported Storage Systems
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md): Supports 
file analysis on S3-compatible object storage.
+Doris provides the following TVFs for accessing different storage systems:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md): Supports 
file analysis on HDFS.
+| TVF | Supported Storage | Description |
+|-----|-------------------|-------------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | 
S3-compatible object storage | Supports AWS S3, Alibaba Cloud OSS, Tencent 
Cloud COS, etc. |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
Supports Hadoop Distributed File System |
 
-* [FILE]: Please refer to 4.x document.
+## Use Cases
 
-## Basic Usage
+### Scenario 1: Direct Query and Analysis of Files
 
-Here we illustrate how to analyze files on object storage using the S3 Table 
Value Function as an example.
+TVF is ideal for directly analyzing files on storage systems without importing 
data into Doris first.
 
-### Query
+The following example queries a Parquet file on object storage using the S3 
TVF:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-The `S3(...)` is a TVF (Table Value Function). A Table Value Function is 
essentially a table, so it can appear in any SQL statement where a "table" can 
appear.
-
-The attributes of a TVF include the file path to be analyzed, file format, 
connection information of the object storage, etc. The file path (URI) can use 
wildcards to match multiple files. The following file paths are valid:
-
-* Match a specific file
-
-  `s3://bucket/path/to/tvf_test/test.parquet`
-
-* Match all files starting with `test_`
-
-  `s3://bucket/path/to/tvf_test/test_*`
-
-* Match all files with the `.parquet` suffix
+Example query result:
 
-  `s3://bucket/path/to/tvf_test/*.parquet`
-
-* Match all files in the `tvf_test` directory
-
-  `s3://bucket/path/to/tvf_test/*`
-
-* Match files with `test` in the filename
-
-  `s3://bucket/path/to/tvf_test/*test*`
-
-### Automatic Inference of File Column Types
-
-You can view the Schema of a TVF using the `DESC FUNCTION` syntax:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris infers the Schema based on the following rules:
-
-* For Parquet and ORC formats, Doris obtains the Schema from the file metadata.
-
-* In the case of matching multiple files, the Schema of the first file is used 
as the TVF's Schema.
-
-* For CSV and JSON formats, Doris parses the **first line of data** to obtain 
the Schema based on fields, delimiters, etc.
-
-  By default, all column types are `string`. You can specify column names and 
types individually using the `csv_schema` attribute. Doris will use the 
specified column types for file reading. The format is: 
`name1:type1;name2:type2;...`. For example:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  The currently supported column type names are as follows:
-
-  | Column Type Name |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* For columns with mismatched formats (e.g., the file contains a string, but 
the user defines it as `int`; or other files have a different Schema than the 
first file), or missing columns (e.g., the file has 4 columns, but the user 
defines 5 columns), these columns will return `null`.
-
-## Applicable Scenarios
-
-### Query Analysis
-
-TVF is very suitable for directly analyzing independent files on storage 
systems without having to import the data into Doris in advance.
-
-You can use any SQL statement for file analysis, such as:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -157,12 +51,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF can appear in any position in SQL where a Table can appear, such as in the 
`WITH` clause of a `CTE`, in the `FROM` clause, etc. This way, you can treat 
the file as a regular table for any analysis.
+A TVF is essentially a table and can appear anywhere a "table" can appear in 
SQL statements, such as:
+
+- In the `FROM` clause
+- In the `WITH` clause of a CTE
+- In `JOIN` statements
+
+### Scenario 2: Creating Views to Simplify Access
 
-You can also create a logical view for a TVF using the `CREATE VIEW` 
statement. After that, you can access this TVF like other views, manage 
permissions, etc., and allow other users to access this View without having to 
repeatedly write connection information and other attributes.
+You can create logical views for TVFs using the `CREATE VIEW` statement to 
avoid repeatedly writing connection information and to support permission 
management:
 
 ```sql
--- Create a view based on a TVF
+-- Create a view based on a TVF
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -170,25 +70,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- View the structure of the view
 DESC tvf_view;
 
--- Query the view as usual
+-- Query the view
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- Grant access to other users
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### Data Import
+### Scenario 3: Importing Data into Doris
 
-TVF can be used as a method for data import into Doris. With the `INSERT INTO 
SELECT` syntax, we can easily import files into Doris.
+Combined with the `INSERT INTO SELECT` syntax, you can import file data into 
Doris tables:
 
 ```sql
--- Create a Doris table
+-- 1. Create the target table
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -198,21 +98,142 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. Import data via TVF
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## Notes
+## Core Features
+
+### Multi-File Matching
+
+The file path (URI) supports using wildcards and range patterns to match 
multiple files:
 
-1. If the specified `uri` does not match any files, or all matched files are 
empty, the TVF will return an empty result set. In this case, using `DESC 
FUNCTION` to view the Schema of this TVF will yield a virtual column 
`__dummy_col`, which is meaningless and only serves as a placeholder.
+| Pattern | Example | Match Result |
+|---------|---------|--------------|
+| `*` | `file_*` | All files starting with `file_` |
+| `{n..m}` | `file_{1..3}` | `file_1`, `file_2`, `file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`, `file_b` |
+
+For complete syntax, please refer to [File Path 
Pattern](../sql-manual/basic-element/file-path-pattern).
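+
+For instance, a single query can read every matched Parquet file at once; the 
+bucket, path, and credentials below are placeholders reused from the earlier 
+examples:
+
+```sql
+-- Aggregate over all Parquet files matched by the wildcard
+-- (placeholder bucket, path, and credentials)
+SELECT count(*) FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/*.parquet',
+    'format' = 'parquet',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk'
+);
+```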
+
+### Using Resource to Simplify Configuration
+
+TVF supports referencing pre-created S3 or HDFS Resources through the 
`resource` property, avoiding the need to repeatedly fill in connection 
information for each query.
+
+**1. Create a Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. Use the Resource in TVF**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Properties in the Resource serve as default values; properties specified in 
the TVF will override properties with the same name in the Resource
+- Using Resources enables centralized management of connection information for 
easier maintenance and permission control
+:::
+
+### Automatic Schema Inference
+
+You can view the automatically inferred schema of a TVF using the `DESC 
FUNCTION` syntax:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
+
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema Inference Rules:**
+
+| File Format | Inference Method |
+|-------------|------------------|
+| Parquet, ORC | Automatically obtains schema from file metadata |
+| CSV, JSON | Parses the first row of data to get the schema; default column 
type is `string` |
+| Multi-file matching | Uses the schema of the first file |
+
+### Manually Specifying Column Types (CSV/JSON)
+
+For CSV and JSON formats, you can manually specify column names and types 
using the `csv_schema` property in the format `name1:type1;name2:type2;...`:
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+);
+```
+
+**Supported Column Types:**
+
+| Integer Types | Floating-Point Types | Other Types |
+|---------------|----------------------|-------------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- If the column type does not match (e.g., the file contains a string but 
`int` is specified), the column returns `null`
+- If the number of columns does not match (e.g., the file has 4 columns but 5 
are specified), missing columns return `null`
+:::
+
+## Notes
 
-2. If the specified file format is `csv`, and the file read is not empty but 
the first line of the file is empty, an error `The first line is empty, can not 
parse column numbers` will be prompted, as the Schema cannot be parsed from the 
first line of the file.
+| Scenario | Behavior |
+|----------|----------|
+| `uri` matches no files or all files are empty | TVF returns an empty result 
set; using `DESC FUNCTION` to view the schema will show a placeholder column 
`__dummy_col` |
+| First line of CSV file is empty (file is not empty) | Error message: `The 
first line is empty, can not parse column numbers` |
diff --git 
a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md 
b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
index 635415e990d..9aced42cec1 100644
--- a/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
+++ b/versioned_docs/version-4.x/lakehouse/best-practices/optimization.md
@@ -36,6 +36,32 @@ Since version 4.0.2, cache warmup functionality is 
supported, which can further
 
 Please refer to the **HDFS IO Optimization** section in the [HDFS 
Documentation](../storages/hdfs.md).
 
+## Split Count Limit
+
+When querying external tables (Hive, Iceberg, Paimon, etc.), Doris splits 
files into multiple splits for parallel processing. In some scenarios, 
especially when there are a large number of small files, too many splits may be 
generated, leading to:
+
+1. Memory pressure: Too many splits consume a significant amount of FE memory
+2. OOM issues: Excessive split counts may cause OutOfMemoryError
+3. Performance degradation: Managing too many splits increases query planning 
overhead
+
+You can use the `max_file_split_num` session variable to limit the maximum 
number of splits allowed per table scan (supported since 4.0.4):
+
+- Type: `int`
+- Default: `100000`
+- Description: In non-batch mode, the maximum number of splits allowed per 
table scan to prevent OOM caused by too many splits.
+
+Usage example:
+
+```sql
+-- Set maximum split count to 50000
+SET max_file_split_num = 50000;
+
+-- Disable this limit (set to 0 or negative number)
+SET max_file_split_num = 0;
+```
+
+When this limit is set, Doris dynamically calculates the minimum split size to 
ensure the split count does not exceed the specified limit.
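+
+To confirm the limit currently in effect, you can inspect the session
+variable like any other Doris variable, for example:
+
+```sql
+-- Check the current split count limit for this session
+SHOW VARIABLES LIKE 'max_file_split_num';
+```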
+
 ## Merge IO Optimization
 
 For remote storage systems like HDFS and object storage, Doris optimizes IO 
access through Merge IO technology. Merge IO technology essentially merges 
multiple adjacent small IO requests into one large IO request, which can reduce 
IOPS and increase IO throughput.
@@ -71,3 +97,51 @@ If you find that `MergedBytes` is much larger than 
`RequestBytes`, it indicates
 - `merge_io_read_slice_size_bytes`
 
     Session variable, supported since version 3.1.3. Default is 8MB. If you 
find serious read amplification, you can reduce this parameter, such as to 
64KB, and observe whether the modified IO requests and query latency improve.
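+
+    A concrete sketch of the tuning suggested above (64KB expressed in bytes):
+
+    ```sql
+    -- Shrink the merge slice size for the current session (64KB = 65536 bytes)
+    SET merge_io_read_slice_size_bytes = 65536;
+    ```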
+
+## Parquet Page Cache
+
+:::info
+Supported since version 4.1.0.
+:::
+
+Parquet Page Cache is a page-level caching mechanism for Parquet files. This 
feature integrates with Doris's existing Page Cache framework, significantly 
improving query performance by caching decompressed (or compressed) data pages 
in memory.
+
+### Key Features
+
+1. **Unified Page Cache Integration**
+    - Shares the same underlying `StoragePageCache` framework used by Doris 
internal tables
+    - Shares memory pool and eviction policies
+    - Reuses existing cache statistics and RuntimeProfile for unified 
performance monitoring
+
+2. **Intelligent Caching Strategy**
+    - **Compression Ratio Awareness**: Automatically decides whether to cache 
compressed or decompressed data based on the 
`parquet_page_cache_decompress_threshold` parameter
+    - **Flexible Storage Approach**: Caches decompressed data when 
`decompressed size / compressed size ≤ threshold`; otherwise, decides whether 
to cache compressed data based on `enable_parquet_cache_compressed_pages`
+    - **Cache Key Design**: Uses `file_path::mtime::offset` as the cache key 
to ensure cache consistency after file modifications
+
+### Configuration Parameters
+
+The following are BE configuration parameters:
+
+- `enable_parquet_page_cache`
+
+    Whether to enable the Parquet Page Cache feature. Default is `false`.
+
+- `parquet_page_cache_decompress_threshold`
+
+    Threshold that controls whether to cache compressed or decompressed data. 
Default is `1.5`. When the ratio of `decompressed size / compressed size` is 
less than or equal to this threshold, decompressed data will be cached; 
otherwise, it will decide whether to cache compressed data based on the 
`enable_parquet_cache_compressed_pages` setting.
+
+- `enable_parquet_cache_compressed_pages`
+
+    Whether to cache compressed data pages when the compression ratio exceeds 
the threshold. Default is `true`.
+
+### Performance Monitoring
+
+You can view Parquet Page Cache usage through Query Profile:
+
+```
+ParquetPageCache:
+    - PageCacheHitCount: 1024
+    - PageCacheMissCount: 128
+```
+
+Where `PageCacheHitCount` indicates the number of cache hits, and 
`PageCacheMissCount` indicates the number of cache misses.
\ No newline at end of file
diff --git a/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx 
b/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
index 1034b5eaa35..d874038015b 100644
--- a/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2133,6 +2133,70 @@ EXECUTE cherrypick_snapshot ("snapshot_id" = 
"123456789");
 2. The operation will fail if the specified snapshot does not exist
 3. The merge operation creates a new snapshot and does not delete the original 
snapshot
 
+### expire_snapshots
+
+The `expire_snapshots` operation removes old snapshots from Iceberg tables to 
free up storage space and improve metadata performance. This operation follows 
the Apache Iceberg Spark procedure specification.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE expire_snapshots ("key1" = "value1", "key2" = "value2", ...)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `older_than` | String | No | Timestamp threshold for snapshot expiration. 
Snapshots older than this will be removed. Supports ISO datetime format (e.g., 
`2024-01-01T00:00:00`) or milliseconds timestamp |
+| `retain_last` | Integer | No | Number of ancestor snapshots to preserve. 
When specified alone, automatically sets `older_than` to current time |
+| `snapshot_ids` | String | No | Comma-separated list of specific snapshot IDs 
to expire |
+| `max_concurrent_deletes` | Integer | No | Size of thread pool for delete 
operations |
+| `clean_expired_metadata` | Boolean | No | When set to `true`, cleans up 
unused partition specs and schemas |
+
+**Return Value:**
+
+Executing the `expire_snapshots` operation returns a result set with the 
following 6 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `deleted_data_files_count` | BIGINT | Number of deleted data files |
+| `deleted_position_delete_files_count` | BIGINT | Number of deleted position 
delete files |
+| `deleted_equality_delete_files_count` | BIGINT | Number of deleted equality 
delete files |
+| `deleted_manifest_files_count` | BIGINT | Number of deleted manifest files |
+| `deleted_manifest_lists_count` | BIGINT | Number of deleted manifest list 
files |
+| `deleted_statistics_files_count` | BIGINT | Number of deleted statistics 
files |
+
+**Example:**
+
+```sql
+-- Expire snapshots, keeping only the last 2
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("retain_last" = "2");
+
+-- Expire snapshots older than a specific timestamp
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-01-01T00:00:00");
+
+-- Expire specific snapshots by ID
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("snapshot_ids" = "123456789,987654321");
+
+-- Combine parameters: expire snapshots older than 2024-06-01 but keep at 
least the last 5
+ALTER TABLE iceberg_db.iceberg_table 
+EXECUTE expire_snapshots ("older_than" = "2024-06-01T00:00:00", "retain_last" 
= "5");
+```
+
+**Notes:**
+
+1. This operation does not support WHERE conditions.
+2. If both `older_than` and `retain_last` are specified, both conditions 
apply: only snapshots older than `older_than` AND not within the most recent 
`retain_last` snapshots will be deleted.
+3. `snapshot_ids` can be used alone to delete specific snapshots.
+4. This operation permanently deletes snapshots and their associated data 
files. Use with caution.
+5. It is recommended to query the `$snapshots` system table before execution 
to understand the table's snapshot information.
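+
+For note 5, such a pre-check might look like the sketch below; the 
+`table$snapshots` suffix syntax is assumed based on the system table 
+mentioned above:
+
+```sql
+-- Inspect existing snapshots before expiring them
+-- (assumes the `$snapshots` metadata table suffix syntax)
+SELECT * FROM iceberg_db.iceberg_table$snapshots;
+```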
+
 ### fast_forward
 
 The `fast_forward` operation quickly advances the current snapshot of one 
branch to the latest snapshot of another branch.
diff --git a/versioned_docs/version-4.x/lakehouse/file-analysis.md 
b/versioned_docs/version-4.x/lakehouse/file-analysis.md
index d739a51f71f..d7ac1ea8266 100644
--- a/versioned_docs/version-4.x/lakehouse/file-analysis.md
+++ b/versioned_docs/version-4.x/lakehouse/file-analysis.md
@@ -1,144 +1,47 @@
 ---
 {
-    "title": "Analyze Files on S3/HDFS",
+    "title": "Analyzing Files on S3/HDFS",
     "language": "en",
-    "description": "Through the Table Value Function feature, Doris can 
directly query and analyze files on object storage or HDFS as a Table."
+    "description": "Learn how to use Apache Doris Table Value Function (TVF) 
to directly query and analyze Parquet, ORC, CSV, and JSON files on storage 
systems like S3 and HDFS, with support for automatic schema inference, 
multi-file matching, and data import."
 }
 ---
 
-Through the Table Value Function feature, Doris can directly query and analyze 
files on object storage or HDFS as a Table. It also supports automatic column 
type inference.
+Through the Table Value Function (TVF) feature, Doris can directly query and 
analyze files on object storage or HDFS as tables without importing data in 
advance, and supports automatic column type inference.
 
-For more usage methods, refer to the Table Value Function documentation:
+## Supported Storage Systems
 
-* [S3](../sql-manual/sql-functions/table-valued-functions/s3.md): Supports 
file analysis on S3-compatible object storage.
+Doris provides the following TVFs for accessing different storage systems:
 
-* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md): Supports 
file analysis on HDFS.
+| TVF | Supported Storage | Description |
+|-----|-------------------|-------------|
+| [S3](../sql-manual/sql-functions/table-valued-functions/s3.md) | 
S3-compatible object storage | Supports AWS S3, Alibaba Cloud OSS, Tencent 
Cloud COS, etc. |
+| [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md) | HDFS | 
Supports Hadoop Distributed File System |
+| [HTTP](../sql-manual/sql-functions/table-valued-functions/http.md) | HTTP | 
Supports accessing files from HTTP addresses (since version 4.0.2) |
+| [FILE](../sql-manual/sql-functions/table-valued-functions/file.md) | 
S3/HDFS/HTTP/Local | Unified table function supporting multiple storage types 
(since version 3.1.0) |
 
-* [FILE](../sql-manual/sql-functions/table-valued-functions/file.md): Unified 
table function, which can support reading S3/HDFS/Local files at the same time. 
(Supported since version 3.1.0.)
+## Use Cases
 
-## Basic Usage
+### Scenario 1: Direct Query and Analysis of Files
 
-Here we illustrate how to analyze files on object storage using the S3 Table 
Value Function as an example.
+TVF is ideal for directly analyzing files on storage systems without importing 
data into Doris first.
 
-### Query
+The following example queries a Parquet file on object storage using the S3 
TVF:
 
 ```sql
-SELECT * FROM S3 (
+SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 )
+ORDER BY p_partkey LIMIT 5;
 ```
 
-The `S3(...)` is a TVF (Table Value Function). A Table Value Function is 
essentially a table, so it can appear in any SQL statement where a "table" can 
appear.
-
-The attributes of a TVF include the file path to be analyzed, file format, 
connection information of the object storage, etc.
-
-### Multiple File Import
-
-The file path (URI) supports wildcards and range patterns for matching 
multiple files:
-
-| Pattern | Example | Matches |
-|---------|---------|---------|
-| `*` | `file_*` | All files starting with `file_` |
-| `{n..m}` | `file_{1..3}` | `file_1`, `file_2`, `file_3` |
-| `{a,b,c}` | `file_{a,b}` | `file_a`, `file_b` |
-
-For complete syntax including all supported wildcards, range expansion rules, 
and usage examples, see [File Path 
Pattern](../sql-manual/basic-element/file-path-pattern).
-
-
-### Automatic Inference of File Column Types
+Example query result:
 
-You can view the Schema of a TVF using the `DESC FUNCTION` syntax:
-
-```sql
-DESC FUNCTION s3 (
-    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
-    "s3.access_key"= "ak",
-    "s3.secret_key" = "sk",
-    "format" = "parquet",
-    "use_path_style"="true"
-);
-+---------------+--------------+------+-------+---------+-------+
-| Field         | Type         | Null | Key   | Default | Extra |
-+---------------+--------------+------+-------+---------+-------+
-| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
-| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
-| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
-| p_size        | INT          | Yes  | false | NULL    | NONE  |
-| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
-| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
-| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
-+---------------+--------------+------+-------+---------+-------+
 ```
-
-Doris infers the Schema based on the following rules:
-
-* For Parquet and ORC formats, Doris obtains the Schema from the file metadata.
-
-* In the case of matching multiple files, the Schema of the first file is used 
as the TVF's Schema.
-
-* For CSV and JSON formats, Doris parses the **first line of data** to obtain 
the Schema based on fields, delimiters, etc.
-
-  By default, all column types are `string`. You can specify column names and 
types individually using the `csv_schema` attribute. Doris will use the 
specified column types for file reading. The format is: 
`name1:type1;name2:type2;...`. For example:
-
-  ```sql
-  S3 (
-      'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-      's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-      's3.region' = 'us-east-1',
-      's3.access_key' = 'ak'
-      's3.secret_key'='sk',
-      'format' = 'csv',
-      'column_separator' = '|',
-      'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
-  )
-  ```
-
-  The currently supported column type names are as follows:
-
-  | Column Type Name |
-  | ------------ |
-  | tinyint      |
-  | smallint     |
-  | int          |
-  | bigint       |
-  | largeint     |
-  | float        |
-  | double       |
-  | decimal(p,s) |
-  | date         |
-  | datetime     |
-  | char         |
-  | varchar      |
-  | string       |
-  | boolean      |
-
-* For columns with mismatched formats (e.g., the file contains a string, but 
the user defines it as `int`; or other files have a different Schema than the 
first file), or missing columns (e.g., the file has 4 columns, but the user 
defines 5 columns), these columns will return `null`.
-
-## Applicable Scenarios
-
-### Query Analysis
-
-TVF is very suitable for directly analyzing independent files on storage 
systems without having to import the data into Doris in advance.
-
-You can use any SQL statement for file analysis, such as:
-
-```sql
-SELECT * FROM s3(
-    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
-    'format' = 'parquet',
-    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
-    's3.region' = 'us-east-1',
-    's3.access_key' = 'ak',
-    's3.secret_key'='sk'
-)
-ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 | p_partkey | p_name                                   | p_mfgr         | 
p_brand  | p_type                  | p_size | p_container | p_retailprice | 
p_comment           |
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
@@ -150,12 +53,18 @@ ORDER BY p_partkey LIMIT 5;
 
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
 
-TVF can appear in any position in SQL where a Table can appear, such as in the 
`WITH` clause of a `CTE`, in the `FROM` clause, etc. This way, you can treat 
the file as a regular table for any analysis.
+A TVF is essentially a table and can appear anywhere a "table" can appear in 
SQL statements, such as:
 
-You can also create a logical view for a TVF using the `CREATE VIEW` 
statement. After that, you can access this TVF like other views, manage 
permissions, etc., and allow other users to access this View without having to 
repeatedly write connection information and other attributes.
+- In the `FROM` clause
+- In the `WITH` clause of a CTE
+- In `JOIN` statements (see the sketch below)
+
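+For example, the following minimal sketch uses the TVF inside a CTE and joins it with a regular Doris table; the local table `local_orders` and its columns are hypothetical and only serve to illustrate the join:
+
+```sql
+WITH parts AS (
+    SELECT * FROM s3(
+        'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+        'format' = 'parquet',
+        's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+        's3.region' = 'us-east-1',
+        's3.access_key' = 'ak',
+        's3.secret_key' = 'sk'
+    )
+)
+-- "local_orders" is a hypothetical Doris table used only for illustration
+SELECT o.order_id, p.p_name
+FROM local_orders o
+JOIN parts p ON o.part_key = p.p_partkey;
+```
+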
+### Scenario 2: Creating Views to Simplify Access
+
+You can create logical views for TVFs using the `CREATE VIEW` statement to 
avoid repeatedly writing connection information and to support permission 
management:
 
 ```sql
--- Create a view based on a TVF
+-- Create a view based on the TVF
 CREATE VIEW tvf_view AS 
 SELECT * FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
@@ -163,25 +72,25 @@ SELECT * FROM s3(
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 
--- Describe the view as usual
+-- Inspect the structure of the view
 DESC tvf_view;
 
--- Query the view as usual
+-- Query the view
 SELECT * FROM tvf_view;
 
--- Grant SELECT priv to other user on this view
+-- Grant access to other users
 GRANT SELECT_PRIV ON db.tvf_view TO other_user;
 ```
 
-### Data Import
+### Scenario 3: Importing Data into Doris
 
-TVF can be used as a method for data import into Doris. With the `INSERT INTO 
SELECT` syntax, we can easily import files into Doris.
+Combined with the `INSERT INTO SELECT` syntax, you can import file data into 
Doris tables:
 
 ```sql
--- Create a Doris table
+-- 1. Create the target table
 CREATE TABLE IF NOT EXISTS test_table
 (
     id int,
@@ -191,21 +100,142 @@ CREATE TABLE IF NOT EXISTS test_table
 DISTRIBUTED BY HASH(id) BUCKETS 4
 PROPERTIES("replication_num" = "1");
 
--- 2. Load data into table from TVF
-INSERT INTO test_table (id,name,age)
-SELECT cast(id as INT) as id, name, cast (age as INT) as age
+-- 2. Import data via TVF
+INSERT INTO test_table (id, name, age)
+SELECT cast(id as INT) as id, name, cast(age as INT) as age
 FROM s3(
     'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
     'format' = 'parquet',
     's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
     's3.region' = 'us-east-1',
     's3.access_key' = 'ak',
-    's3.secret_key'='sk'
+    's3.secret_key' = 'sk'
 );
 ```
 
-## Notes
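+After the import finishes, you can verify the loaded data with an ordinary query against the Doris table, for example:
+
+```sql
+-- Check the number of imported rows
+SELECT count(*) FROM test_table;
+```
+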
+## Core Features
+
+### Multi-File Matching
+
+The file path (URI) supports using wildcards and range patterns to match 
multiple files:
+
+| Pattern | Example | Match Result |
+|---------|---------|--------------|
+| `*` | `file_*` | All files starting with `file_` |
+| `{n..m}` | `file_{1..3}` | `file_1`, `file_2`, `file_3` |
+| `{a,b,c}` | `file_{a,b}` | `file_a`, `file_b` |
+
+For complete syntax, please refer to [File Path 
Pattern](../sql-manual/basic-element/file-path-pattern).
+
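+For example, the following sketch (file names are illustrative) reuses the connection settings from the examples above and reads all files matching `file_*.parquet` under the same path as a single table:
+
+```sql
+-- Matches file_1.parquet, file_2.parquet, ... under the same path
+SELECT count(*) FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/file_*.parquet',
+    'format' = 'parquet',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk'
+);
+```
+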
+### Using Resource to Simplify Configuration
+
+TVF supports referencing pre-created S3 or HDFS Resources through the 
`resource` property, avoiding the need to repeatedly fill in connection 
information for each query.
+
+**1. Create a Resource**
+
+```sql
+CREATE RESOURCE "s3_resource"
+PROPERTIES
+(
+    "type" = "s3",
+    "s3.endpoint" = "https://s3.us-east-1.amazonaws.com";,
+    "s3.region" = "us-east-1",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "s3.bucket" = "bucket"
+);
+```
+
+**2. Use the Resource in TVF**
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.parquet',
+    'format' = 'parquet',
+    'resource' = 's3_resource'
+);
+```
+
+:::tip
+- Properties in the Resource serve as default values; properties specified in the TVF override properties with the same name in the Resource (see the sketch after this tip)
+- Using Resources enables centralized management of connection information for 
easier maintenance and permission control
+:::
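+
+The following sketch (the CSV file path is illustrative) combines the Resource with per-query properties: connection settings come from `s3_resource`, while `format` and `column_separator` are given in the TVF call and would override any same-named properties in the Resource:
+
+```sql
+-- Connection info comes from the resource; format settings are per-query
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'resource' = 's3_resource'
+);
+```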
+
+### Automatic Schema Inference
+
+You can view the automatically inferred schema of a TVF using the `DESC 
FUNCTION` syntax:
+
+```sql
+DESC FUNCTION s3 (
+    "URI" = "s3://bucket/path/to/tvf_test/test.parquet",
+    "s3.access_key" = "ak",
+    "s3.secret_key" = "sk",
+    "format" = "parquet",
+    "use_path_style" = "true"
+);
+```
 
-1. If the specified `uri` does not match any files, or all matched files are 
empty, the TVF will return an empty result set. In this case, using `DESC 
FUNCTION` to view the Schema of this TVF will yield a virtual column 
`__dummy_col`, which is meaningless and only serves as a placeholder.
+```
++---------------+--------------+------+-------+---------+-------+
+| Field         | Type         | Null | Key   | Default | Extra |
++---------------+--------------+------+-------+---------+-------+
+| p_partkey     | INT          | Yes  | false | NULL    | NONE  |
+| p_name        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_mfgr        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_brand       | TEXT         | Yes  | false | NULL    | NONE  |
+| p_type        | TEXT         | Yes  | false | NULL    | NONE  |
+| p_size        | INT          | Yes  | false | NULL    | NONE  |
+| p_container   | TEXT         | Yes  | false | NULL    | NONE  |
+| p_retailprice | DECIMAL(9,0) | Yes  | false | NULL    | NONE  |
+| p_comment     | TEXT         | Yes  | false | NULL    | NONE  |
++---------------+--------------+------+-------+---------+-------+
+```
+
+**Schema Inference Rules:**
+
+| File Format | Inference Method |
+|-------------|------------------|
+| Parquet, ORC | Automatically obtains schema from file metadata |
+| CSV, JSON | Parses the first row of data to get the schema; default column 
type is `string` |
+| Multi-file matching | Uses the schema of the first file |
+
+### Manually Specifying Column Types (CSV/JSON)
+
+For CSV and JSON formats, you can manually specify column names and types 
using the `csv_schema` property in the format `name1:type1;name2:type2;...`:
+
+```sql
+SELECT * FROM s3(
+    'uri' = 's3://bucket/path/to/tvf_test/test.csv',
+    's3.endpoint' = 'https://s3.us-east-1.amazonaws.com',
+    's3.region' = 'us-east-1',
+    's3.access_key' = 'ak',
+    's3.secret_key' = 'sk',
+    'format' = 'csv',
+    'column_separator' = '|',
+    'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)'
+);
+```
+
+**Supported Column Types:**
+
+| Integer Types | Floating-Point Types | Other Types |
+|---------------|----------------------|-------------|
+| tinyint | float | decimal(p,s) |
+| smallint | double | date |
+| int | | datetime |
+| bigint | | char |
+| largeint | | varchar |
+| | | string |
+| | | boolean |
+
+:::note
+- If the column type does not match (e.g., the file contains a string but 
`int` is specified), the column returns `null`
+- If the number of columns does not match (e.g., the file has 4 columns but 5 
are specified), missing columns return `null`
+:::
+
+## Notes
 
-2. If the specified file format is `csv`, and the file read is not empty but 
the first line of the file is empty, an error `The first line is empty, can not 
parse column numbers` will be prompted, as the Schema cannot be parsed from the 
first line of the file.
+| Scenario | Behavior |
+|----------|----------|
+| `uri` matches no files or all files are empty | TVF returns an empty result 
set; using `DESC FUNCTION` to view the schema will show a placeholder column 
`__dummy_col` |
+| First line of CSV file is empty (file is not empty) | Error message: `The 
first line is empty, can not parse column numbers` |


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

