Hi Weijun,

Sorry for the late response.

1. I plan to introduce a new commit type, CommitKind.REPLACE, similar to Iceberg, to support moving data between different storage devices. I will design this part in detail later.
2. This will be left to a procedure: I will create a new one, e.g. a Migrate Action, for moving data from one storage to another.
3. One scenario is hot/cold data tiering. For example, for access performance, hot data is first written to local HDFS; after a period of time, once the data has become cold, it is migrated to cloud storage to reduce storage costs.
4. Of course. If `data-file.external-path` is not empty, new data will be written to `data-file.external-path`. If it is empty, new data will be written to the path specified by the warehouse. So, if your warehouse is on HDFS and you set `data-file.external-path` to OSS, newly written data will be on OSS. If you later remove `data-file.external-path`, newly written data will go back to HDFS.
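To make point 4 concrete, here is a minimal Python sketch of the fallback rule. It is only an illustration, not Paimon's actual implementation: the function name `resolve_write_root` and the example HDFS/OSS URIs are hypothetical, and only the option key `data-file.external-path` comes from the proposal.

```python
from typing import Mapping

# Option key taken from the proposal; everything else below is illustrative.
EXTERNAL_PATH_KEY = "data-file.external-path"


def resolve_write_root(warehouse_path: str, table_options: Mapping[str, str]) -> str:
    """Return the root under which newly written data files would land.

    Hypothetical helper: a non-empty external path wins, otherwise
    writes fall back to the table's warehouse path.
    """
    external_path = table_options.get(EXTERNAL_PATH_KEY, "").strip()
    return external_path if external_path else warehouse_path


# Hypothetical locations, used only for the walk-through.
warehouse = "hdfs://namenode:8020/warehouse/db.db/t"
options = {}

# No external path configured: new data goes to the warehouse (HDFS).
print(resolve_write_root(warehouse, options))

# External path set to OSS: new data goes to OSS.
options[EXTERNAL_PATH_KEY] = "oss://bucket/paimon/db.db/t"
print(resolve_write_root(warehouse, options))

# Option removed again: new data goes back to HDFS.
del options[EXTERNAL_PATH_KEY]
print(resolve_write_root(warehouse, options))
```

The sketch only covers where new files are written. Files already written to the earlier location stay there until something like the migrate procedure from point 2 moves them.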
Thanks.

---- Replied Message ----
From: Jingsong Li <jingsongl...@gmail.com>
Date: 12/23/2024 10:37
To: <dev@paimon.apache.org>
Subject: Re: [DISCUSS] Introduce Table Multi-Location Management

Added to PIP-29
https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management

On Thu, Dec 19, 2024 at 9:28 AM wj wang <hongli....@gmail.com> wrote:

Many thanks to Houliang Qi for preparing this doc and PR. After reading the context, I have some questions.

1. Could you add the design and implementation details of 'Support file migration between different storage locations' to the doc?
2. When and how is file migration between different storage locations executed?
3. Which business scenarios need file migration between different storage locations?
4. Can ‘data-file.external-path’ be removed after it has been configured on a table? For example, at the beginning the data was written to HDFS and then to OSS. After a period of time, if I want to write new data back to HDFS, can I just remove the ‘data-file.external-path’ configuration?

Best,
Weijun Wang

On Fri, Dec 13, 2024 at 2:56 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Hi Houliang,

Thanks for starting this discussion.

Maybe we can just introduce an option: `data-file.external-path`? I don't see the usage of multi.locations.

In DataFileMeta, yes, we need to add another field: external_path.

About FileIO, I think you can implement your own hybrid FileIO created from catalog options.

I think the general idea is fine, but we may need POC code to observe its complexity.
Best,
Jingsong

On Wed, Dec 11, 2024 at 7:15 PM Houliang Qi <neuyi...@163.com> wrote:

Hi Paimon devs,

I’d like to initiate a discussion: Introduce Table Multi-Location Management [1].

Currently, a table's data can only be persisted in the catalog's warehouse path, which cannot be modified once the table is created. However, users may wish to store a table's data on different storage devices, or even store different partitions of a table on different storage devices based on their level of activity.

So, the topic of this proposal is how to enable Paimon to support multi-location management for a single table.

Any opinions are welcome. Looking forward to your feedback, thanks.

[1] https://docs.google.com/document/d/1NhmOyxM16QmY_rVb3KJtCKRrU_nogIJv532U59qW7EI/edit?tab=t.0#heading=h.xlrl29nlxwpo