Dear All,

My name is Praneeth Chityala, and I am currently pursuing a master’s in Computer 
Science at Indiana University Bloomington. As part of my independent study, I took 
up MFT as my research area and started by understanding its architecture.

As many of you know, MFT uses agents to transfer data from one cloud storage to 
another. These agents can be deployed on any compute machine. The machine on 
which an agent is deployed may itself hold data files that need to be 
uploaded to cloud storage, and that is where my involvement in the project came in. 
I worked on implementing the extensions below:

  *   Implemented the Local transport extension, which allows an agent to transfer 
data from its host machine to a given storage – Local transport 
extension<https://github.com/apache/airavata-mft/tree/master/transport/local-transport/src/main/java/org/apache/airavata/mft/transport/local>
     *   The transport has three variations – streaming, chunked file transfer and 
chunked streaming
  *   Implemented the CLI for configuring the local agent – Local agent 
CLI<https://github.com/apache/airavata-mft/tree/master/python-cli/mft_cli/airavata_mft_cli>

Performance testing results:

After successfully testing transfers from my local machine to AWS S3 storage, I 
deployed the agent on an AWS EC2 machine and ran multiple tests to compare its 
performance with rclone and the AWS CLI.
The chart below shows the average transfer speeds from our analysis.

[Chart: average transfer speeds of MFT, rclone and AWS CLI for each test set]


For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more 
than 150% faster than the AWS CLI.
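
To make these percentages concrete, "60% faster" is read here as a throughput 
ratio of 1.6x over the baseline. The Python sketch below shows that arithmetic 
only; the function names are mine, chosen for illustration, and no measured 
values are included.

```python
def percent_faster(speed, baseline_speed):
    """Return how much faster `speed` is than `baseline_speed`, in percent."""
    if baseline_speed <= 0:
        raise ValueError("baseline_speed must be positive")
    return (speed - baseline_speed) / baseline_speed * 100.0


def relative_transfer_time(percent):
    """Fraction of the baseline's transfer time needed when `percent` faster."""
    return 1.0 / (1.0 + percent / 100.0)
```

With this reading, a tool that is 60% faster finishes the same transfer in 
1/1.6 = 62.5% of the baseline's time, and one that is 150% faster in 
1/2.5 = 40% of it.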

Configurations of the testing:


  *   Local machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 
18 cores, 10 Gbps dedicated network bandwidth and 1 GB/s disk read/write speed.


  *   Cloud Storage: AWS S3 bucket in the same region as above VM.


  *   Test sets: In the x-axis labels of the graph, 10m_1000 means a test set of 
1000 files of 10MB each. All other test sets follow the same naming convention.


  *   Testing trials: Each test was run 5 times with each transfer method.


  *   Testing presets: Before each test, the VM’s page cache is cleared so that 
none of the tests benefits from higher read speeds due to page caching. This is 
done to simulate the worst possible conditions while reading data.


  *   MFT configuration: I used chunked streaming with
     *   20MB as chunk size
     *   32 concurrent transfers
     *   32 concurrent chunked threads


  *   rclone configuration: After exploring many of the optimizations 
available for rclone, I used the following settings:
     *   --s3-chunk-size 128000
     *   --buffer-size 128000
     *   --s3-upload-cutoff 0
     *   --s3-upload-concurrency 32
     *   --multi-thread-streams 32
     *   --multi-thread-cutoff 0
     *   --s3-disable-http2
     *   --no-check-dest
     *   --transfers 32
     *   --fast-list


  *   AWS CLI configuration: I used the native AWS CLI for the transfers, as we 
did not find many dedicated optimizations for it.
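
As a rough illustration of the test-set naming convention and the 20MB chunk 
size listed above, the following Python sketch decodes a label and counts the 
chunks per file. The helper names, the `g` suffix for gigabytes and the decimal 
units (1MB = 1,000,000 bytes) are my assumptions for illustration; only the 
`10m_1000` label comes from the chart, and this is not MFT's actual code.

```python
import re

MB = 1000 * 1000  # assumption: decimal megabytes (the tests may have used MiB)


def parse_test_set(label):
    """Split a label such as '10m_1000' into (file_size_bytes, file_count)."""
    match = re.fullmatch(r"(\d+)([mg])_(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised test set label: {label!r}")
    size, unit, count = match.groups()
    # 'm' = megabytes; 'g' = gigabytes is a hypothetical suffix for larger files
    multiplier = MB if unit == "m" else 1000 * MB
    return int(size) * multiplier, int(count)


def chunks_per_file(file_size_bytes, chunk_size_bytes=20 * MB):
    """Number of chunks needed for one file (ceiling division, at least 1)."""
    return max(1, -(-file_size_bytes // chunk_size_bytes))
```

Under these assumptions, `10m_1000` decodes to 1000 files of 10,000,000 bytes 
(10GB in total), each fitting in a single 20MB chunk, while a 1GB file would be 
split into 50 chunks.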

Observations:

  *   For the local transport I used BufferedStreaming, which helped MFT reach the 
maximum read speed of the local disk without hitting the IOPS limit.
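
The general idea behind buffered reading can be sketched in Python as below. 
This is not the Java code of the MFT local transport; the function name and the 
buffer and piece sizes are hypothetical. It only shows how a large read buffer 
lets a consumer ask for small pieces while the disk is accessed in fewer, 
larger reads, which is one common way to stay under an IOPS limit.

```python
def stream_file(path, piece_size=64 * 1024, buffer_size=8 * 1024 * 1024):
    """Yield the file in small pieces, served from a large in-memory buffer."""
    # buffering=buffer_size makes open() return a buffered reader of that size,
    # so small read() calls are mostly answered from memory, not from disk.
    with open(path, "rb", buffering=buffer_size) as source:
        while True:
            piece = source.read(piece_size)
            if not piece:  # an empty bytes object signals end of file
                return
            yield piece
```

A caller would iterate over `stream_file(path)` and hand each piece to the 
uploader; the pieces concatenate back to the original file contents.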

Future plans for testing:

  *   Jetstream2: We plan to replace AWS EC2 with a Jetstream2 virtual machine 
and perform similar tests.
  *   Emulab: Run the same tests on Emulab VMs with custom configurations, 
with the help of Dimuthu.
  *   Azure: Test local-to-Azure cloud storage transfers with MFT, rclone and 
the Azure CLI.
  *   GCP: Test local-to-GCS transfers with MFT, rclone and the GCP CLI.
  *   I have a different implementation of the MFT local transport for systems 
that support DMA (Direct Memory Access). We also plan to test on such systems; 
the present EC2 system doesn’t support DMA.

Further Improvements of MFT:

  *   As we noticed that MFT is slower than rclone for files of 1MB or 
smaller, we plan to stress test and analyze the whole system and improve speeds 
for smaller files.

Acknowledgement: I thank Dimuthu Wannipurage for clearing up many of my doubts 
about MFT and providing guidance when needed.

Thank you and please let us know your comments or thoughts.

Best,
Praneeth Chityala
