Hi Paimon devs,
Following the previous discussion, I have written a proof-of-concept implementation [1]. The main changes are as follows:

1. Provide `data-file.external-path` to indicate the location of newly written data. If this option is empty, data is still written to the path specified by the warehouse, as before.
2. Add a `dataRootLocation` attribute to DataFileMeta to indicate the location of the data. If `data-file.external-path` is not empty, this value is `data-file.external-path`; otherwise it is the warehouse path.
3. Provide TablePathProvider, which builds the storage path of the table from either `data-file.external-path` or the warehouse path.
4. Provide HybridFileIO, which creates the corresponding FileIO according to the scheme.

The above is only part of the whole work, but it already shows the complexity of the work and the extent of the code changes. Comments and corrections are welcome.

[1] https://github.com/apache/paimon/pull/4720/files

Best,
Houliang

---- Replied Message ----
| From | Houliang Qi<neuyi...@163.com> |
| Date | 12/13/2024 18:52 |
| To | dev@paimon.apache.org<dev@paimon.apache.org> |
| Subject | Re: [DISCUSS] Introduce Table Multi-Location Management |

Hi Jingsong,

Thanks for your reply.

Initially, the design involved adding `multi.locations` and `default.write.location` as table properties. However, upon further consideration, it seems more efficient to move these filesystem options to the catalog itself and modify the catalog to support multiple warehouses. This way, the table properties would only need to include `data-file.external-path`. If `data-file.external-path` is not empty, newly written data goes by default to the storage it specifies. Additionally, when migrating hot and cold data later, users could select the destination for the migration based on the filesystem options provided in the catalog.
I will implement a POC based on this new design and share it with the team for feedback.

Best,
Houliang

---- Replied Message ----
| From | Jingsong Li<jingsongl...@gmail.com> |
| Date | 12/13/2024 14:56 |
| To | <dev@paimon.apache.org> |
| Subject | Re: [DISCUSS] Introduce Table Multi-Location Management |

Hi Houliang,

Thanks for starting this discussion.

Maybe we can just introduce a single option, `data-file.external-path`? I don't see the usage of multi.locations.

In DataFileMeta, yes, we need to add another field: external_path.

About FileIO, I think you can implement your own hybrid FileIO created from catalog options.

I think the general idea is fine, but we may need POC code to observe its complexity.

Best,
Jingsong

On Wed, Dec 11, 2024 at 7:15 PM Houliang Qi <neuyi...@163.com> wrote:

Hi Paimon devs,

I’d like to initiate a discussion: Introduce Table Multi-Location Management [1]. Currently, a table's data can only be persisted in the catalog's warehouse path, which cannot be modified once the table is created. However, users may wish to store a table's data on different storage devices, or even store different partitions of a table on different storage devices based on how actively they are accessed. So the topic of this proposal is how to enable Paimon to support multi-location management for a single table.

Any opinions are welcome. Looking forward to your feedback, thanks.

[1] https://docs.google.com/document/d/1NhmOyxM16QmY_rVb3KJtCKRrU_nogIJv532U59qW7EI/edit?tab=t.0#heading=h.xlrl29nlxwpo
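
The resolution rules described in the first message of this thread (points 1, 3 and 4) can be sketched in Java roughly as follows. This is not code from the PR: the class `ExternalPathSketch`, its method names, the `<root>/<database>.db/<table>/<file>` layout, and the scheme-to-FileIO table are all assumptions made for illustration. Only the option name `data-file.external-path` and the "external path if set, otherwise warehouse" rule come from the discussion.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not taken from the Paimon PR. */
final class ExternalPathSketch {

    /** Option name proposed in the discussion. */
    static final String EXTERNAL_PATH = "data-file.external-path";

    /** Hypothetical scheme -> FileIO label table, standing in for catalog filesystem options. */
    static final Map<String, String> FILE_IO_BY_SCHEME =
            Map.of("file", "local-io", "hdfs", "hadoop-io", "oss", "oss-io");

    /** Root for newly written data: the external path if set, otherwise the warehouse path. */
    static String dataRootLocation(Map<String, String> tableOptions, String warehouse) {
        String external = tableOptions.get(EXTERNAL_PATH);
        if (external == null || external.trim().isEmpty()) {
            return warehouse;
        }
        return external.trim();
    }

    /** Builds a data file path under the chosen root (the layout is an assumption). */
    static String dataFilePath(String root, String database, String table, String fileName) {
        String base = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        return base + "/" + database + ".db/" + table + "/" + fileName;
    }

    /** Picks a FileIO label by URI scheme; paths without a scheme are treated as local files. */
    static String fileIoFor(String path) {
        String scheme = URI.create(path).getScheme();
        String key = scheme == null ? "file" : scheme;
        String fileIo = FILE_IO_BY_SCHEME.get(key);
        if (fileIo == null) {
            throw new IllegalArgumentException("No FileIO registered for scheme: " + key);
        }
        return fileIo;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        String warehouse = "hdfs://namenode/warehouse";

        // Option not set: data stays under the warehouse path.
        System.out.println(dataRootLocation(options, warehouse));

        // Option set: new data files go to the external location.
        options.put(EXTERNAL_PATH, "oss://cold-bucket/paimon");
        String root = dataRootLocation(options, warehouse);
        System.out.println(dataFilePath(root, "db1", "orders", "data-0.orc"));
        System.out.println(fileIoFor(root));
    }
}
```

Keeping the root selection separate from the scheme lookup mirrors the split between TablePathProvider and HybridFileIO in the proposal, although the actual classes in the PR may divide the responsibilities differently.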