Hi Houliang,

How about adding a table option: `external-paths`?
And we can introduce another table option, `external-paths.strategy`. It is an enum that can be ROUND_ROBIN (use all paths, picking one per file) or NONE (do not use external paths).

Best,
Jingsong

On Mon, Dec 23, 2024 at 11:00 AM Houliang Qi <[email protected]> wrote:
>
> Hi Weijun,
> Sorry for the late response.
> 1. I plan to introduce a new commit type, CommitKind.REPLACE, similar to
> Iceberg, to support moving data between different storage devices. I
> will design this part in detail later.
> 2. This will be handled by a procedure: a new procedure, e.g. a Migrate
> Action, for moving data from one storage to another.
> 3. One scenario is tiering hot and cold data. For example, for better
> access performance, hot data is first written to local HDFS, and then after a
> period of time, cold data is migrated to cloud storage to reduce storage
> costs.
> 4. Of course. If `data-file.external-path` is not empty, new data
> will be written to `data-file.external-path`. If it is empty, new
> data will be written to the path specified by the warehouse. So, if your
> warehouse is HDFS and you specify `data-file.external-path` as OSS, then
> newly written data will be on OSS. If you later remove
> `data-file.external-path`, newly written data will be written to HDFS
> again.
>
> Thanks.
>
> ---- Replied Message ----
> | From | Jingsong Li<[email protected]> |
> | Date | 12/23/2024 10:37 |
> | To | <[email protected]> |
> | Subject | Re: [DISCUSS] Introduce Table Multi-Location Management |
>
> Added to PIP-29
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management
>
> On Thu, Dec 19, 2024 at 9:28 AM wj wang <[email protected]> wrote:
>
> Many thanks to Houliang Qi for preparing this doc and PR.
> After reading the context, I have some questions.
> 1. Could you add the design and implementation details of 'Support file
> migration between different storage locations' to the doc?
> 2. When and how is the file migration between different storage locations executed?
> 3. What business scenarios need file migration between
> different storage locations?
> 4. Can `data-file.external-path` be removed after configuring a table?
> For example, at the beginning, the data was written to HDFS and then
> written to OSS. After a period of time, if I want to write new data
> back to HDFS, can I just remove the `data-file.external-path`
> configuration?
>
> Best,
> Weijun Wang
>
> On Fri, Dec 13, 2024 at 2:56 PM Jingsong Li <[email protected]> wrote:
>
> Hi Houliang,
>
> Thanks for starting this discussion.
>
> Maybe we can just introduce an option: `data-file.external-path`? I
> don't see the usage of multi.locations.
>
> In DataFileMeta, yes, we need to add another field: external_path.
>
> About FileIO, I think you can implement your own hybrid FileIO created
> by catalog options.
>
> I think the general idea is fine, but we may need POC code to
> observe its complexity.
>
> Best,
> Jingsong
>
> On Wed, Dec 11, 2024 at 7:15 PM Houliang Qi <[email protected]> wrote:
>
> Hi Paimon devs,
>
> I'd like to initiate a discussion: Introduce Table Multi-Location
> Management [1]. Currently, a table's data can only be persisted in the catalog's
> warehouse path, which cannot be modified once the table is created. However, users may
> wish to store data from a table on different storage devices, or even store
> data from different partitions of a table on different storage devices based
> on their level of activity. So, the topic of this proposal is how to enable
> Paimon to support multi-location management for a single table.
>
> Any opinions are welcome. Looking forward to your feedback, thanks.
>
> [1]
> https://docs.google.com/document/d/1NhmOyxM16QmY_rVb3KJtCKRrU_nogIJv532U59qW7EI/edit?tab=t.0#heading=h.xlrl29nlxwpo
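
For readers following the thread, the `external-paths` / `external-paths.strategy` idea from the newest message, combined with the fallback to the warehouse path described in point 4 of Houliang's reply, could look roughly like the sketch below. It is only an illustration in Python, not Paimon code: the class `DataFilePathProvider`, its method names, and the example paths are all hypothetical, and only the strategy names ROUND_ROBIN and NONE come from the thread.

```python
from enum import Enum
from itertools import count
from typing import List


class ExternalPathStrategy(Enum):
    """Strategy names taken from the thread; everything else here is assumed."""
    ROUND_ROBIN = "ROUND_ROBIN"  # use all external paths, pick one per file
    NONE = "NONE"                # do not use external paths


class DataFilePathProvider:
    """Hypothetical helper that decides where a new data file is written."""

    def __init__(self, warehouse_path: str, external_paths: List[str],
                 strategy: ExternalPathStrategy) -> None:
        self.warehouse_path = warehouse_path
        self.external_paths = list(external_paths)
        self.strategy = strategy
        self._counter = count()  # monotonically increasing file counter

    def new_file_path(self, file_name: str) -> str:
        # Fall back to the warehouse path when external paths are disabled
        # or not configured, as described in point 4 of the reply.
        if self.strategy is ExternalPathStrategy.NONE or not self.external_paths:
            base = self.warehouse_path
        else:
            # ROUND_ROBIN: cycle through the configured external paths.
            index = next(self._counter) % len(self.external_paths)
            base = self.external_paths[index]
        return f"{base.rstrip('/')}/{file_name}"


# Example with made-up locations: two OSS paths, HDFS as the warehouse.
provider = DataFilePathProvider(
    warehouse_path="hdfs://namenode/warehouse/db.db/t",
    external_paths=["oss://bucket-a/t", "oss://bucket-b/t"],
    strategy=ExternalPathStrategy.ROUND_ROBIN,
)
paths = [provider.new_file_path(f"data-{i}.orc") for i in range(3)]
```

With these inputs the three files alternate between the two OSS paths. Switching the strategy to NONE, or clearing the list of external paths, sends new files back to the warehouse path, while files already written elsewhere would still need the `external_path` field in DataFileMeta to be located.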
