This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git

commit 89c508db8f53e31f608409fb2723be24dafe074c
Author: Jingsong <[email protected]>
AuthorDate: Wed Aug 9 11:09:11 2023 +0800

    [doc] Document Spark dynamic overwritten
---
 docs/content/how-to/querying-tables.md |  5 ++++
 docs/content/how-to/writing-tables.md  | 54 +++++++++++++++++++++++++++++-----
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/docs/content/how-to/querying-tables.md b/docs/content/how-to/querying-tables.md
index 5bc490e55..e4d3051b5 100644
--- a/docs/content/how-to/querying-tables.md
+++ b/docs/content/how-to/querying-tables.md
@@ -318,6 +318,11 @@ Run the following command:
 
 {{< /tabs >}}
 
+### Read Overwrite
+
+Streaming reading will ignore the commits generated by `INSERT OVERWRITE` by default. If you want to read the
+commits of `OVERWRITE`, you can configure `streaming-read-overwrite`.
+
 ## Query Optimization
 
 {{< label Batch >}}{{< label Streaming >}}
diff --git a/docs/content/how-to/writing-tables.md b/docs/content/how-to/writing-tables.md
index 65897617c..86b27b78f 100644
--- a/docs/content/how-to/writing-tables.md
+++ b/docs/content/how-to/writing-tables.md
@@ -79,13 +79,6 @@ For more information, please check the syntax document:
 
 [Spark INSERT Statement](https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html)
 
-### Overwrite Semantic
-
-1. Streaming reading will ignore the commits generated by `INSERT OVERWRITE` by default. If you want to read the
-commits of `OVERWRITE`, you can configure `streaming-read-overwrite`.
-2. For partitioned table, Paimon's default overwrite mode is dynamic partition overwrite (that means Paimon only
-deletes the partitions appear in the overwrite data). You can configure `dynamic-partition-overwrite` to change it.
-
 ### Write Nullable field to Not-null field
 
 We cannot insert into a non-null column of one table with a nullable column of another table. Assume that,
@@ -95,7 +88,7 @@ which is nullable. If we run a sql like this:
 ``` sql
 INSERT INTO A key1 SELECT key2 FROM B
 ```
-We will catch a exception,
+We will catch an exception,
 - In spark: "Cannot write nullable values to non-null column 'key1'."
 - In flink: "Column 'key1' is NOT NULL, however, a null value is being written into it. "
 
@@ -186,6 +179,51 @@ INSERT OVERWRITE MyTable PARTITION (key1 = value1, key2 = value2, ...) SELECT ..
 
 {{< /tabs >}}
 
+## Dynamic Overwrite
+
+{{< tabs "dynamic-overwrite" >}}
+
+{{< tab "Flink" >}}
+
+Flink's default overwrite mode is dynamic partition overwrite (that means Paimon only deletes the partitions
+appear in the overwritten data). You can configure `dynamic-partition-overwrite` to change it to static overwritten.
+
+```sql
+-- MyTable is a Partitioned Table
+
+-- Dynamic overwrite
+INSERT OVERWRITE MyTable SELECT ...
+
+-- Static overwrite (Overwrite whole table)
+INSERT OVERWRITE MyTable /*+ OPTIONS('dynamic-partition-overwrite' = 'false') */ SELECT ...
+```
+
+{{< /tab >}}
+
+{{< tab "Spark" >}}
+
+Spark's default overwrite mode is static partition overwrite. To enable dynamic overwritten needs these configs below:
+
+```text
+--conf spark.sql.catalog.spark_catalog=org.apache.paimon.spark.SparkGenericCatalog
+--conf spark.sql.extensions=org.apache.paimon.spark.PaimonSparkSessionExtension
+```
+
+```sql
+-- MyTable is a Partitioned Table
+
+-- Static overwrite (Overwrite whole table)
+INSERT OVERWRITE MyTable SELECT ...
+
+-- Dynamic overwrite
+SET spark.sql.sources.partitionOverwriteMode=dynamic;
+INSERT OVERWRITE MyTable SELECT ...
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
 ## Purging tables
 
 You can use `INSERT OVERWRITE` to purge tables by inserting empty value.
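
Note on the documented semantics: the difference between dynamic and static overwrite described in this commit can be illustrated with a toy model. The sketch below is not Paimon, Flink, or Spark code. The `insert_overwrite` function, the `Table` alias, and the sample partition values are hypothetical names chosen for illustration, and a table is modeled simply as a mapping from partition value to rows.

```python
from typing import Dict, List

# Hypothetical model: partition value -> rows stored in that partition.
Table = Dict[str, List[str]]


def insert_overwrite(table: Table, new_data: Table, dynamic: bool) -> Table:
    """Toy model of INSERT OVERWRITE on a partitioned table (no PARTITION clause).

    dynamic=True  : only partitions that appear in new_data are replaced.
    dynamic=False : the whole table is replaced (static overwrite).
    """
    if dynamic:
        # Keep every existing partition that the new data does not touch.
        result = {p: list(rows) for p, rows in table.items() if p not in new_data}
    else:
        # Static overwrite without a partition spec drops all existing partitions.
        result = {}
    for partition, rows in new_data.items():
        result[partition] = list(rows)
    return result


existing = {"2023-08-01": ["a", "b"], "2023-08-02": ["c"]}
incoming = {"2023-08-02": ["d"]}

dynamic_result = insert_overwrite(existing, incoming, dynamic=True)
static_result = insert_overwrite(existing, incoming, dynamic=False)
```

In this model the dynamic call keeps partition `2023-08-01` and replaces only `2023-08-02`, while the static call leaves only the newly written partition, which matches the "Overwrite whole table" comment in the added documentation.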
