JingsongLi opened a new issue, #2125:
URL: https://github.com/apache/incubator-paimon/issues/2125

   ### Search before asking
   
   - [X] I searched in the 
[issues](https://github.com/apache/incubator-paimon/issues) and found nothing 
similar.
   
   
   ### Motivation
   
   Primary key tables are often left unpartitioned so that they can absorb 
updates and mirror tables synchronized from an upstream database. This lets 
users query the latest data.
   
   However, traditional Hive data warehouses do not work this way. Offline 
data warehouses require an immutable view every day to ensure that 
calculations are idempotent. So we created the Tag mechanism to output these 
views.
   
   However, users of traditional Hive data warehouses are more accustomed to 
using partitions to specify which Tag to query, and to querying with Hive 
compute engines.
   
   So we are considering mapping a non-partitioned primary key table to a 
partitioned table in Hive metastore, with the partition field mapped to the 
name of the Tag, so that the table is fully compatible with Hive.
   
   ### Solution
   
   Subtasks:
   1. Introduce `metastore.tag-to-partition-field`, a string option that gives 
the name of the partition field. For Hive metastore, the table is created 
with this partition field, which represents the tag.
   2. Flink, Spark, and other engines cannot see the partition field, because 
they use the schema in the file system.
   3. The Hive engine can only query a tag once its partition exists, so we 
should create the partition in the metastore when we create a Tag for the 
table.
       a. This requires a Tag callback mechanism.
   4. After the partitions are created, the locations passed to 
`PaimonInputFormat.getSplits` will contain all partitions. We should convert 
these locations into Tags and generate splits for the tags.
   5. In `PaimonInputFormat.getRecordReader`, we should generate the partition 
field for the rows, so we should put the tag information into 
`PaimonInputSplit` too.
   6. This table cannot be written from the Hive compute engine.
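
   As a rough illustration of subtasks 4 and 5, here is a minimal, 
self-contained sketch of the read path. It is not Paimon code. It assumes 
that each partition location ends in a Hive-style `<field>=<tag>` directory 
name, which this issue does not specify. `TagPartitionSketch`, `TagSplit`, 
`tagFromLocation`, `splitsForLocations`, and `withPartitionValue` are 
hypothetical names, and `TagSplit` only stands in for the tag information 
that `PaimonInputSplit` would carry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TagPartitionSketch {

    /** Hypothetical stand-in for the tag information carried by a split. */
    static final class TagSplit {
        final String tagName;

        TagSplit(String tagName) {
            this.tagName = tagName;
        }
    }

    /**
     * Subtask 4: turn a partition location into a tag name.
     * Assumes the last path segment looks like "<partitionField>=<tag>".
     * Hive's escaping of special characters in partition paths is not handled.
     */
    public static String tagFromLocation(String location, String partitionField) {
        String trimmed = location.endsWith("/")
                ? location.substring(0, location.length() - 1)
                : location;
        String lastSegment = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        String prefix = partitionField + "=";
        if (!lastSegment.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    "Not a partition of field '" + partitionField + "': " + location);
        }
        return lastSegment.substring(prefix.length());
    }

    /** Subtask 4: one split per tag here; real splits would also carry data files. */
    public static List<TagSplit> splitsForLocations(
            List<String> locations, String partitionField) {
        List<TagSplit> splits = new ArrayList<>();
        for (String location : locations) {
            splits.add(new TagSplit(tagFromLocation(location, partitionField)));
        }
        return splits;
    }

    /** Subtask 5: append the tag name as the partition column of a row. */
    public static Object[] withPartitionValue(Object[] row, String tagName) {
        Object[] extended = Arrays.copyOf(row, row.length + 1);
        extended[row.length] = tagName;
        return extended;
    }

    public static void main(String[] args) {
        // Example locations and the field name "dt" are made up for illustration.
        List<String> locations = Arrays.asList(
                "hdfs://nn/warehouse/db.db/t/dt=2023-10-16",
                "hdfs://nn/warehouse/db.db/t/dt=2023-10-17/");
        for (TagSplit split : splitsForLocations(locations, "dt")) {
            Object[] row = withPartitionValue(new Object[] {1, "a"}, split.tagName);
            System.out.println(Arrays.toString(row));
        }
    }
}
```

   Carrying the tag name in the split keeps the record reader independent of 
the metastore: it only needs the split to know which tag to read and which 
value to emit for the partition field.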
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

