Hi Weijun,
Sorry for the late response.
1. I plan to introduce a new commit type, CommitKind.REPLACE, similar to 
Iceberg, to support moving data between different storage devices. I 
will design this part in detail later.
2. This will be left to a procedure: a new procedure, e.g. a Migrate 
Action, will be created for moving data from one storage to another.
3. One scenario is hot/cold data tiering. For example, for better access 
performance, hot data is first written to local HDFS; after a period of 
time, once it has become cold, it is migrated to cloud storage to reduce 
storage costs.
4. Of course. If `data-file.external-path` is not empty, new data will be 
written to `data-file.external-path`. If it is empty, new data will be 
written to the path specified by the warehouse. So, if your warehouse is on 
HDFS and you set `data-file.external-path` to an OSS path, newly written 
data will go to OSS. If you later remove `data-file.external-path`, newly 
written data will go to HDFS again.
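
To illustrate point 4, here is a minimal sketch of the fallback rule described above. It is not Paimon code: the function name `resolve_data_file_root` and the example paths are made up for illustration, and it only models where new data files would go, not what happens to files that were already written.

```python
from typing import Optional


def resolve_data_file_root(warehouse_path: str, external_path: Optional[str]) -> str:
    """Pick the root location for newly written data files.

    Hypothetical helper: `external_path` stands for the value of the
    `data-file.external-path` option, or None if the option is not set.
    """
    # A non-empty external path takes precedence over the warehouse path.
    if external_path is not None and external_path.strip():
        return external_path.strip()
    # Otherwise fall back to the path specified by the warehouse.
    return warehouse_path


# Example paths are assumptions, not real locations.
warehouse = "hdfs://namenode/warehouse/db.db/t"
options = {"data-file.external-path": "oss://bucket/paimon/t"}

# Option set: new data goes to OSS.
print(resolve_data_file_root(warehouse, options.get("data-file.external-path")))

# Option removed: new data goes to the HDFS warehouse again.
options.pop("data-file.external-path")
print(resolve_data_file_root(warehouse, options.get("data-file.external-path")))
```

The rule is evaluated at write time, so changing or removing the option only affects data written afterwards.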


Thanks.
---- Replied Message ----
| From | Jingsong Li<jingsongl...@gmail.com> |
| Date | 12/23/2024 10:37 |
| To | <dev@paimon.apache.org> |
| Subject | Re: [DISCUSS] Introduce Table Multi-Location Management |
Added to PIP-29
https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management

On Thu, Dec 19, 2024 at 9:28 AM wj wang <hongli....@gmail.com> wrote:

Many thanks to Houliang Qi for preparing this doc and PR.
After reading the context, I have some questions.
1. Could you add the design and implementation details of 'Support file
migration between different storage locations' to the doc?
2. When and how is file migration between different storage locations
executed?
3. What business scenarios require file migration between different
storage locations?
4. Can `data-file.external-path` be removed after a table has been
configured with it? For example, at the beginning the data was written
to HDFS and then to OSS. After a period of time, if I want to write new
data back to HDFS, can I just remove the `data-file.external-path`
configuration?

Best,
Weijun Wang

On Fri, Dec 13, 2024 at 2:56 PM Jingsong Li <jingsongl...@gmail.com> wrote:

Hi Houliang,

Thanks for starting this discussion.

Maybe we can just introduce an option: `data-file.external-path`? I
don't see the usage of multi.locations.

In DataFileMeta, yes, we need to add another field: external_path.

About FileIO, I think you can implement your own hybrid FileIO, created
from catalog options.

I think the general idea is fine, but we may need some POC code to
assess its complexity.

Best,
Jingsong

On Wed, Dec 11, 2024 at 7:15 PM Houliang Qi <neuyi...@163.com> wrote:

Hi Paimon devs,


I’d like to initiate a discussion: Introduce Table Multi-Location 
Management [1]. Currently, a table's data can only be persisted in the 
catalog's warehouse path, which cannot be modified once the table is created. 
However, users may wish to store a table's data on different storage devices, 
or even store different partitions of a table on different storage devices 
based on their level of activity. So, the topic of this proposal is how to 
enable Paimon to support multi-location management for a single table.


Any opinions are welcome. Looking forward to your feedback, thanks.


[1] 
https://docs.google.com/document/d/1NhmOyxM16QmY_rVb3KJtCKRrU_nogIJv532U59qW7EI/edit?tab=t.0#heading=h.xlrl29nlxwpo


