tsreaper opened a new pull request, #3816:
URL: https://github.com/apache/paimon/pull/3816

   ### Purpose
   
   In this PR we have supported reading partitions from fallback branch.
   
   You can set the table option `scan.fallback-branch` so that when a batch job 
reads from the current branch, if a partition does not exist, the reader will 
try to read this partition from the fallback branch. For streaming read jobs, 
this feature is currently not supported, and will only produce results from the 
current branch.
   
   What's the use case of this feature? Say you have created a Paimon table 
partitioned by date. You have a long-running streaming job which inserts 
records into Paimon, so that today's data can be queried in time. You also have 
a batch job which runs at every night to insert corrected records of yesterday 
into Paimon, so that the preciseness of the data can be promised.
   
   When you query from this Paimon table, you would like to first read from the 
results of batch job. But if a partition (for example, today's partition) does 
not exist in its result, then you would like to read from the results of 
streaming job. In this case, you can create a branch for streaming job, and set 
`scan.fallback-branch` to this streaming branch.
   
   Let's look at an example.
   
   ```sql
   -- create Paimon table
   CREATE TABLE T (
       dt STRING NOT NULL,
       name STRING NOT NULL,
       amount BIGINT
   ) PARTITIONED BY (dt);
   
   -- create a branch for streaming job
   -- you can even specify primary keys and bucket number, even if the original 
branch has no primary key
   -- in this example, we create a new branch `test`, uses `dt` and `name` as 
primary keys, has a bucket number of 2, and will copy the table options from 
the original branch
   CALL sys.create_branch('default.T', 'test', 'dt, name', 2, true);
   
   -- set fallback branch
   ALTER TABLE T SET (
       'scan.fallback-branch' = 'test'
   );
   
   -- set changelog producer for the streaming branch, in case a streaming job 
would like to read from it in the future
   ALTER TABLE `T$branch_test` SET (
       'changelog-producer' = 'lookup'
   );
   
   -- write records into the streaming branch
   INSERT INTO `T$branch_test` VALUES ('20240725', 'apple', 4), ('20240726', 
'cherry', 3), ('20240726', 'pear', 6);
   
   -- write records into the default branch
   INSERT INTO T VALUES ('20240725', 'apple', 5), ('20240725', 'banana', 7);
   
   SELECT * FROM T;
   /*
   +------------------+------------------+--------+
   |               dt |             name | amount |
   +------------------+------------------+--------+
   |         20240725 |            apple |      5 |
   |         20240725 |           banana |      7 |
   |         20240726 |           cherry |      3 |
   |         20240726 |             pear |      6 |
   +------------------+------------------+--------+
   */
   ```
   
   ### Tests
   
   IT cases.
   
   ### API and Format
   
   No changes in format.
   
   ### Documentation
   
   Yes. Document is also updated.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to