Hi Houliang,

How about adding a table option, external-paths, which accepts multiple paths?

We can also introduce a second table option, external-paths.strategy.
It is an enum that can be ROUND_ROBIN (use all paths, picking one for
each new file) or NONE (do not use external paths).
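A minimal Java sketch of how the proposed external-paths.strategy could pick a root for each new data file. It assumes external-paths is a comma-separated list of root URIs; ExternalPathStrategy, ExternalPathSelector and nextRoot are hypothetical names, not existing Paimon APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ExternalPathSelectorDemo {
    public static void main(String[] args) {
        ExternalPathSelector selector = new ExternalPathSelector(
                "oss://bucket-a/paimon, s3://bucket-b/paimon", ExternalPathStrategy.ROUND_ROBIN);
        // Prints the two roots alternately: bucket-a, bucket-b, bucket-a.
        for (int i = 0; i < 3; i++) {
            System.out.println(selector.nextRoot().orElse("<warehouse path>"));
        }
    }
}

// Hypothetical enum mirroring the proposed `external-paths.strategy` option.
enum ExternalPathStrategy {
    NONE,        // do not use external paths; new files go under the warehouse
    ROUND_ROBIN  // use all external paths, picking one for each new file
}

// Hypothetical selector, not a Paimon API. Assumes `external-paths` is a
// comma-separated list of root URIs.
final class ExternalPathSelector {
    private final ExternalPathStrategy strategy;
    private final List<String> roots;
    private int next = 0;

    ExternalPathSelector(String externalPaths, ExternalPathStrategy strategy) {
        this.strategy = strategy;
        List<String> parsed = new ArrayList<>();
        if (externalPaths != null) {
            for (String p : externalPaths.split(",")) {
                String trimmed = p.trim();
                if (!trimmed.isEmpty()) {
                    parsed.add(trimmed);
                }
            }
        }
        this.roots = parsed;
    }

    // Root for the next new data file, or empty to fall back to the warehouse path.
    synchronized Optional<String> nextRoot() {
        if (strategy == ExternalPathStrategy.NONE || roots.isEmpty()) {
            return Optional.empty();
        }
        String root = roots.get(next);
        next = (next + 1) % roots.size();
        return Optional.of(root);
    }
}
```

With ROUND_ROBIN and two roots, consecutive calls alternate between them; with NONE, or with no roots configured, nextRoot() returns empty and the writer would fall back to the warehouse path.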

Best,
Jingsong

On Mon, Dec 23, 2024 at 11:00 AM Houliang Qi <[email protected]> wrote:
>
> Hi Weijun,
> Sorry for the late response.
> 1. I plan to introduce a new commit type, CommitKind.REPLACE, similar to
> Iceberg, to support moving data between different storage devices. I
> will design this part in detail later.
> 2. This will be left to a procedure: we can create a new procedure, e.g. a
> Migrate Action, for moving data from one storage to another.
> 3. Hot/cold data tiering. For example, to improve access performance, hot
> data is first written to local HDFS; after a period of time, the now-cold
> data is migrated to cloud storage to reduce storage costs.
> 4. Of course. If `data-file.external-path` is not empty, new data is
> written to `data-file.external-path`; if it is empty, new data is written
> to the path specified by the warehouse. So, if your warehouse is HDFS and
> you set `data-file.external-path` to OSS, newly written data will be on
> OSS. If you later remove `data-file.external-path`, newly written data
> will be written to HDFS again.
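A minimal Java sketch of the behavior in point 4, assuming (as discussed elsewhere in this thread) that each file's metadata records its external location, so files written before the option was removed stay readable. DataFileRef and DataFilePaths are hypothetical stand-ins, not Paimon classes, and partition/bucket subdirectories are omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class DataFilePathDemo {
    public static void main(String[] args) {
        String tableDir = "hdfs://nn/warehouse/db.db/t";
        Map<String, String> options = new HashMap<>();

        // Option set: new files land on OSS and record where they went.
        options.put(DataFilePaths.EXTERNAL_PATH_KEY, "oss://bucket/t");
        DataFileRef onOss = DataFilePaths.newFile(options, "data-1.parquet");

        // Option removed: new files go back under the warehouse (HDFS).
        options.remove(DataFilePaths.EXTERNAL_PATH_KEY);
        DataFileRef onHdfs = DataFilePaths.newFile(options, "data-2.parquet");

        // Both remain readable, because each file carries its own location.
        System.out.println(DataFilePaths.readPath(onOss, tableDir));
        System.out.println(DataFilePaths.readPath(onHdfs, tableDir));
    }
}

// Hypothetical stand-in for a DataFileMeta entry with an optional external path.
final class DataFileRef {
    final String fileName;
    final String externalDir; // null means "under the table directory"

    DataFileRef(String fileName, String externalDir) {
        this.fileName = fileName;
        this.externalDir = externalDir;
    }
}

// Hypothetical helper, not a Paimon API.
final class DataFilePaths {
    static final String EXTERNAL_PATH_KEY = "data-file.external-path";

    private DataFilePaths() {}

    // Decide the location of a newly written file from the current table options.
    static DataFileRef newFile(Map<String, String> options, String fileName) {
        String external = options.get(EXTERNAL_PATH_KEY);
        boolean useExternal = external != null && !external.trim().isEmpty();
        return new DataFileRef(fileName, useExternal ? external.trim() : null);
    }

    // Resolve the full path of an existing file, independent of the current options.
    static String readPath(DataFileRef file, String tableDir) {
        String dir = file.externalDir != null ? file.externalDir : tableDir;
        return dir + "/" + file.fileName;
    }
}
```

The write-side decision depends only on the current options, while the read side depends only on what was recorded per file; that split is what makes removing the option safe for existing data.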
>
>
> Thanks.
> ---- Replied Message ----
> | From | Jingsong Li<[email protected]> |
> | Date | 12/23/2024 10:37 |
> | To | <[email protected]> |
> | Subject | Re: [DISCUSS] Introduce Table Multi-Location Management |
> Added to PIP-29
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management
>
> On Thu, Dec 19, 2024 at 9:28 AM wj wang <[email protected]> wrote:
>
> Many thanks to Houliang Qi for preparing this doc and PR.
> After reading it, I have some questions.
> 1. Could you add the design and implementation details of 'Support file
> migration between different storage locations' to the doc?
> 2. When and how is file migration between different storage locations
> executed?
> 3. Which business scenarios require file migration between different
> storage locations?
> 4. Can `data-file.external-path` be removed after it has been configured
> on a table? For example, data is initially written to HDFS and later
> written to OSS. After a period of time, if I want to write new data
> back to HDFS, can I just remove the `data-file.external-path`
> configuration?
>
> Best,
> Weijun Wang
>
> On Fri, Dec 13, 2024 at 2:56 PM Jingsong Li <[email protected]> wrote:
>
> Hi Houliang,
>
> Thanks for starting this discussion.
>
> Maybe we can just introduce an option: `data-file.external-path`? I
> don't understand the usage of multi.locations.
>
> In DataFileMeta, yes, we need to add another field: external_path.
>
> About FileIO, I think you can implement your own hybrid FileIO created
> from catalog options.
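A minimal Java sketch of a hybrid FileIO that routes each path to a delegate by URI scheme. SimpleFileIO, InMemoryFileIO and HybridFileIO are hypothetical; Paimon's real FileIO interface has many more methods, and in practice the delegates would be built from catalog options rather than registered by hand as done here.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class HybridFileIODemo {
    public static void main(String[] args) {
        InMemoryFileIO hdfs = new InMemoryFileIO();
        InMemoryFileIO oss = new InMemoryFileIO();
        HybridFileIO io = new HybridFileIO();
        io.register("hdfs", hdfs);
        io.register("oss", oss);

        // Hot data on HDFS, cold data on OSS, both through the same FileIO.
        io.write("hdfs://nn/warehouse/t/f1", "hot".getBytes(StandardCharsets.UTF_8));
        io.write("oss://bucket/t/f2", "cold".getBytes(StandardCharsets.UTF_8));

        System.out.println(hdfs.contains("hdfs://nn/warehouse/t/f1"));
        System.out.println(oss.contains("oss://bucket/t/f2"));
        System.out.println(new String(io.read("oss://bucket/t/f2"), StandardCharsets.UTF_8));
    }
}

// Hypothetical, minimal file-system abstraction (not Paimon's FileIO).
interface SimpleFileIO {
    void write(String path, byte[] data);

    byte[] read(String path);
}

// Hypothetical in-memory delegate standing in for an HDFS or OSS client.
final class InMemoryFileIO implements SimpleFileIO {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override
    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return data.clone();
    }

    boolean contains(String path) {
        return files.containsKey(path);
    }
}

// Hypothetical hybrid FileIO: dispatches every call to the delegate registered for the path's scheme.
final class HybridFileIO implements SimpleFileIO {
    private final Map<String, SimpleFileIO> byScheme = new HashMap<>();

    void register(String scheme, SimpleFileIO delegate) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), delegate);
    }

    private SimpleFileIO route(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Path has no scheme: " + path);
        }
        SimpleFileIO delegate = byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (delegate == null) {
            throw new IllegalArgumentException("No FileIO registered for scheme: " + scheme);
        }
        return delegate;
    }

    @Override
    public void write(String path, byte[] data) {
        route(path).write(path, data);
    }

    @Override
    public byte[] read(String path) {
        return route(path).read(path);
    }
}
```

Routing on the scheme keeps table-level logic unaware of where a given file lives, and an unregistered scheme fails fast instead of silently falling back to the warehouse storage.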
>
> I think the general idea is fine, but we may need some POC code to
> gauge its complexity.
>
> Best,
> Jingsong
>
> On Wed, Dec 11, 2024 at 7:15 PM Houliang Qi <[email protected]> wrote:
>
> Hi Paimon devs,
>
>
> I'd like to initiate a discussion: Introduce Table Multi-Location
> Management[1]. Currently, a table's data can only be persisted in the
> catalog's warehouse path, which cannot be modified once it is created.
> However, users may wish to store a table's data on different storage
> devices, or even store different partitions of a table on different
> storage devices based on their level of activity. So, this proposal is
> about how to enable Paimon to support multi-location management for a
> single table.
>
>
> Any opinions are welcome; looking forward to your feedback. Thanks.
>
>
> [1] 
> https://docs.google.com/document/d/1NhmOyxM16QmY_rVb3KJtCKRrU_nogIJv532U59qW7EI/edit?tab=t.0#heading=h.xlrl29nlxwpo
>
>
>
