[ 
https://issues.apache.org/jira/browse/IGNITE-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-9275:
-------------------------------------
    Description: 
As a first step toward estimating how much faster file rebalancing could be, I 
suggest implementing a simple partition fetch procedure as an extension of the 
communication SPI: 
1) Node A sends a partition fetch request to node B 
2) Node B starts a checkpoint and creates a local copy of the partition. Note 
that other checkpoints may run concurrently while the partition is being 
copied; this must be handled properly
3) Node B establishes a new TCP connection on the TCP communication port 
(handshake and verification are assumed)
4) Node B calls transferFile (or a native analogue, investigation needed) to 
send the partition file in the most efficient way
5) Node A writes the file to a specified location on the local file system
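
Steps 4 and 5 could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not Ignite code: the class name 
PartitionFileTransfer and its send/receive methods are hypothetical, the 
standard FileChannel.transferTo/transferFrom calls stand in for the 
"transferFile" call mentioned above, and the receiver is assumed to already 
know the file size (for example, from the handshake).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration only; not part of Ignite. */
final class PartitionFileTransfer {
    private PartitionFileTransfer() {}

    /** Node B side: streams the local partition copy into an open socket. */
    static long send(Path partCopy, SocketChannel out) throws IOException {
        try (FileChannel src = FileChannel.open(partCopy, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size)
                pos += src.transferTo(pos, size - pos, out);
            return pos;
        }
    }

    /** Node A side: writes exactly {@code size} bytes from the socket to a file. */
    static long receive(SocketChannel in, Path target, long size) throws IOException {
        try (FileChannel dst = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            while (pos < size) {
                long n = dst.transferFrom(in, pos, size - pos);
                // On a blocking channel, zero bytes means the peer closed early.
                if (n <= 0)
                    throw new EOFException("Connection closed after " + pos + " of " + size + " bytes");
                pos += n;
            }
            return pos;
        }
    }
}
```

On platforms that support it, transferTo can hand the copy to the operating 
system and avoid passing the data through user-space buffers, which is the 
reason it is a candidate here. Whether it beats a native call is exactly the 
open question noted in step 4.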

Once this mechanism is implemented, we need to hack the rebalance code to use 
the partition fetch logic instead of regular rebalancing and measure:
1) How much faster (or slower) the new approach performs
2) How it affects the concurrent transactions in the grid

> Introduce mechanism to fetch partition file via a p2p protocol
> --------------------------------------------------------------
>
>                 Key: IGNITE-9275
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9275
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Alexey Goncharuk
>            Priority: Major
>
> As a first step toward estimating how much faster file rebalancing could be, 
> I suggest implementing a simple partition fetch procedure as an extension of 
> the communication SPI: 
> 1) Node A sends a partition fetch request to node B 
> 2) Node B starts a checkpoint and creates a local copy of the partition. Note 
> that other checkpoints may run concurrently while the partition is being 
> copied; this must be handled properly
> 3) Node B establishes a new TCP connection on the TCP communication port 
> (handshake and verification are assumed)
> 4) Node B calls transferFile (or a native analogue, investigation needed) to 
> send the partition file in the most efficient way
> 5) Node A writes the file to a specified location on the local file system
> Once this mechanism is implemented, we need to hack the rebalance code to use 
> the partition fetch logic instead of regular rebalancing and measure:
> 1) How much faster (or slower) the new approach performs
> 2) How it affects the concurrent transactions in the grid



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
