JingsongLi opened a new issue, #2125: URL: https://github.com/apache/incubator-paimon/issues/2125
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.

### Motivation

Primary key tables are often left unpartitioned so that they can absorb updates and mirror an upstream database table. This lets users query the latest data.

Traditional Hive data warehouses work differently. Offline data warehouses need an immutable view for each day to keep calculations idempotent, so we created the Tag mechanism to produce these views.

However, users of Hive data warehouses are accustomed to selecting such a view through a partition, and to using the Hive compute engine. So we are considering mapping a non-partitioned primary key table to a partitioned table in the Hive metastore, with the partition field holding the name of the Tag, to be fully compatible with Hive.

### Solution

Subtasks:

1. Introduce `metastore.tag-to-partition-field` (option type: string), the name of the partition field. For the Hive metastore, a partition field is created to represent the tag.
2. Flink, Spark, and other engines cannot see the partition field, because they use the schema in the file system.
3. Before the Hive engine can query the table, we should create a partition in the metastore whenever we create a Tag for the table.
   a. This requires a Tag callback mechanism.
4. After the partitions are created, the locations in `PaimonInputFormat.getSplits` will contain all partitions. We should convert these locations into Tags and generate splits for the Tags.
5. In `PaimonInputFormat.getRecordReader`, we should generate the partition field for the rows, so we should put the tag information into `PaimonInputSplit` too.
6. This table cannot be written from the Hive compute engine.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
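
As a rough illustration of subtask 4 in the Solution (converting partition locations into Tags), the sketch below pulls the tag name out of a partition location. It is only a sketch under stated assumptions: `TagPartitionSketch`, `tagFromLocation`, and `tagsFromLocations` are hypothetical names that do not exist in Paimon, the partition field `dt` is an example value for `metastore.tag-to-partition-field`, and locations are assumed to follow the Hive default layout `<table location>/<partition field>=<value>`. Hive's escaping of special characters in partition values is not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Paimon: maps Hive partition locations to tag names. */
class TagPartitionSketch {

    /**
     * Extracts the tag name from a location such as "/warehouse/db.db/t/dt=2023-10-16".
     * Assumes the last path segment has the form "<partitionField>=<tag>".
     */
    static String tagFromLocation(String location, String partitionField) {
        // Drop a trailing slash so the last path segment is the partition directory.
        String trimmed = location.endsWith("/")
                ? location.substring(0, location.length() - 1)
                : location;
        String lastSegment = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        String prefix = partitionField + "=";
        if (!lastSegment.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    "Location does not end with partition '" + prefix + "...': " + location);
        }
        return lastSegment.substring(prefix.length());
    }

    /** Converts every partition location into a tag name, preserving input order. */
    static List<String> tagsFromLocations(List<String> locations, String partitionField) {
        List<String> tags = new ArrayList<>();
        for (String location : locations) {
            tags.add(tagFromLocation(location, partitionField));
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> locations = new ArrayList<>();
        locations.add("/warehouse/db.db/t/dt=2023-10-16");
        locations.add("/warehouse/db.db/t/dt=2023-10-17/");
        // Each resulting tag would then drive split generation for that tag.
        System.out.println(tagsFromLocations(locations, "dt"));
    }
}
```

In this sketch, each extracted tag name would also need to travel with the split (subtask 5), so that the record reader can fill in the partition field for the rows it produces.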
