Hi Marcus,

Thank you. The Local plugin for downloading data has been implemented, but I have
not yet tested its download speeds. I will add this to my future performance
testing.

Best,
Praneeth

On 3/20/23, 2:43 PM, "Christie, Marcus Aaron" <machr...@iu.edu 
<mailto:machr...@iu.edu>> wrote:


Hi Praneeth,


This looks like a great contribution to MFT and I appreciate your write up here.


One question: these numbers are for uploading from the local EC2 instance to 
S3, correct? Did you do any analysis on the opposite, downloading from S3 to a 
local EC2 instance?


Thanks,


Marcus


> On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <pkchi...@iu.edu 
> <mailto:pkchi...@iu.edu>> wrote:
> 
> Dear All,
> 
> This is Praneeth Chityala, currently pursuing a master's in Computer Science at
> Indiana University Bloomington. As part of my independent study, I took up MFT
> as my research area and started by understanding its architecture.
> 
> As many of you know, MFT uses agents to transfer data from one cloud storage to
> another. These agents can be deployed on any compute machine. The machine
> hosting an agent may itself hold data files that need to be uploaded to cloud
> storage, and that is where my involvement in the project came in. I worked on
> implementing the extensions below:
> • Implemented the Local transport extension to allow an agent to transfer data
> from its host machine's storage (Local transport extension)
> • The transport has three variations: streaming, chunked file transfer and
> chunked streaming
> • Implemented the CLI for configuring a local agent (Local agent CLI)
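As a rough illustration of the chunked variations (not MFT's actual code, which is not shown in this thread), a file can be split into fixed-size (offset, length) ranges that separate threads read independently. The function names, the thread-pool approach and the in-memory collection of chunks are assumptions made for this sketch:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(file_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split a file of file_size bytes into (offset, length) ranges of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def read_chunk(path: str, offset: int, length: int) -> bytes:
    """Read one chunk with its own file handle, so chunks can be read by separate threads."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_in_chunks(path: str, chunk_size: int, workers: int) -> list[bytes]:
    """Read all chunks of a file concurrently; pool.map keeps the original chunk order."""
    chunks = plan_chunks(os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: read_chunk(path, *c), chunks))
```

In a real transfer, each chunk would presumably be sent to the destination (for example as one part of an S3 multipart upload) as soon as it is read, rather than collected in a list as done here for simplicity.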
> 
> Performance testing results:
> 
> After successfully testing transfers from my local machine to AWS S3 storage, I
> deployed the agent on an AWS EC2 machine and performed multiple tests to
> compare its performance with rclone and the AWS CLI.
> The chart below shows the average transfer speeds from our analysis.
> 
> [Chart (image001.png): average transfer speeds of MFT, rclone and AWS CLI for each test set]
> 
> 
> For files from 100MB to 1GB, MFT is more than 60% faster than rclone and more
> than 150% faster than the AWS CLI.
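Assuming these percentages compare the average speeds shown in the chart, "X% faster" means MFT's throughput is (1 + X/100) times the other tool's: more than 60% faster than rclone is over 1.6 times its throughput, and more than 150% faster than the AWS CLI is over 2.5 times. The helpers below and their input values are hypothetical, not the measured results:

```python
def percent_faster(fast_speed: float, slow_speed: float) -> float:
    """Relative speed-up of fast_speed over slow_speed, in percent.

    Assumes both values are average throughputs in the same unit (e.g. MB/s).
    """
    if slow_speed <= 0:
        raise ValueError("slow_speed must be positive")
    return (fast_speed / slow_speed - 1.0) * 100.0


def speed_ratio(percent: float) -> float:
    """Convert an 'X% faster' figure into a throughput multiple."""
    return 1.0 + percent / 100.0


# Hypothetical averages in MB/s, for illustration only (not the measured results).
print(percent_faster(500.0, 200.0))  # 150.0
print(speed_ratio(150.0))  # 2.5
```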
> 
> Test configuration:
> 
> • Local machine: an Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 18
> cores, 10 Gbps dedicated network bandwidth and 1 GB/s disk read/write speed.
> 
> • Cloud storage: an AWS S3 bucket in the same region as the VM above.
> 
> • Test sets: in the graph's x-axis labels, 10m_1000 denotes a test set of
> 1000 files of 10MB each. All other test sets follow the same naming convention.
> 
> • Testing trials: each test is run 5 times with each transfer method.
> 
> • Testing presets: before each test, the VM's page cache is cleared so that no
> test gains an advantage from faster reads served by the page cache. This
> simulates the worst-case conditions for reading data.
> 
> • MFT configuration: I used chunked streaming with
> • 20MB chunk size
> • 32 concurrent transfers
> • 32 concurrent chunked threads
> 
> • rclone configuration: after exploring many of the optimizations available in
> rclone, I used the following settings:
> • --s3-chunk-size 128000
> • --buffer-size 128000
> • --s3-upload-cutoff 0
> • --s3-upload-concurrency 32
> • --multi-thread-streams 32
> • --multi-thread-cutoff 0
> • --s3-disable-http2
> • --no-check-dest
> • --transfers 32
> • --fast-list
> 
> • AWS CLI configuration: I used the native AWS CLI to transfer, as in our
> findings it does not offer many dedicated optimizations.
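The test methodology above (test-set naming, a cold page cache before each run, 5 trials per method, and the listed rclone flags) could be scripted roughly as follows. This is a sketch under assumptions: the email does not say how the cache was cleared or how trials were averaged, so the standard Linux /proc/sys/vm/drop_caches mechanism, decimal MB units, the mean of per-trial speeds, the `rclone copy` subcommand and all helper names are illustrative choices:

```python
import statistics
import subprocess
import time

# Size suffixes in test-set names such as "10m_1000". Only "m" appears in the
# email; "k"/"g" and decimal (10**6) units are assumptions.
UNITS = {"k": 10**3, "m": 10**6, "g": 10**9}

# rclone flags as listed in the configuration above.
RCLONE_FLAGS = [
    "--s3-chunk-size", "128000",
    "--buffer-size", "128000",
    "--s3-upload-cutoff", "0",
    "--s3-upload-concurrency", "32",
    "--multi-thread-streams", "32",
    "--multi-thread-cutoff", "0",
    "--s3-disable-http2",
    "--no-check-dest",
    "--transfers", "32",
    "--fast-list",
]


def parse_test_set(name: str) -> tuple[int, int]:
    """Parse a label like '10m_1000' into (file_size_bytes, file_count)."""
    size_part, count_part = name.split("_")
    number, unit = size_part[:-1], size_part[-1].lower()
    return int(number) * UNITS[unit], int(count_part)


def rclone_command(src: str, dst: str) -> list[str]:
    """Build the rclone argv; using the 'copy' subcommand is an assumption."""
    return ["rclone", "copy", src, dst, *RCLONE_FLAGS]


def drop_page_cache() -> None:
    """Flush dirty pages, then drop the Linux page cache (requires root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_trials(cmd: list[str], total_bytes: int, trials: int = 5) -> float:
    """Run cmd `trials` times from a cold cache; return mean throughput in MB/s."""
    speeds = []
    for _ in range(trials):
        drop_page_cache()
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        elapsed = time.perf_counter() - start
        speeds.append(total_bytes / elapsed / 10**6)
    return statistics.mean(speeds)
```

For example, parse_test_set("10m_1000") returns (10_000_000, 1000), so run_trials(rclone_command(src, dst), 10_000_000 * 1000) would time that set; src and dst stand for a local directory and an S3 remote path, and the script must run as root because of the cache drop.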
> 
> Observations:
> • For local transport, I used BufferedStreaming, which helped MFT reach the
> maximum read speed of the local disk without hitting the IOPS limit.
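The general idea behind buffered streaming can be sketched with Python's standard buffered file reader: the consumer takes small pieces, while the reader refills a large buffer with a few big sequential reads, so the disk sees fewer, larger requests. This is not MFT's BufferedStreaming class; the 8 MiB buffer, the 64 KiB piece size and the `send` callback are hypothetical:

```python
from typing import Callable, Iterator

BUFFER_SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB read buffer
PIECE_SIZE = 64 * 1024  # hypothetical size of the pieces handed to the sender


def stream_file(path: str, piece_size: int = PIECE_SIZE,
                buffer_size: int = BUFFER_SIZE) -> Iterator[bytes]:
    """Yield a file's contents in piece_size pieces.

    The BufferedReader returned by open() refills its buffer with large reads,
    so many small read() calls here turn into few large reads from disk.
    """
    with open(path, "rb", buffering=buffer_size) as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            yield piece


def upload(path: str, send: Callable[[bytes], None]) -> int:
    """Stream a file to a hypothetical `send` callback; return the bytes sent."""
    total = 0
    for piece in stream_file(path):
        send(piece)
        total += len(piece)
    return total
```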
> 
> Future plans for testing:
> • Jetstream2: replace the AWS EC2 VM with a Jetstream2 virtual machine and
> perform similar tests
> • Emulab: run the same tests on Emulab VMs with custom configurations, with
> help from Dimuthu
> • Azure: test local-to-Azure cloud storage transfers with MFT, rclone and the
> Azure CLI
> • GCP: test local-to-GCS transfers with MFT, rclone and the GCP CLI
> • I have a different implementation of the MFT local transport for systems that
> support DMA (Direct Memory Access), and we plan to test it on such systems;
> the present EC2 system doesn't support DMA.
> 
> Further improvements to MFT:
> • Since MFT lags behind rclone for files of 1MB or smaller, we plan to
> stress-test and analyze the whole system to improve speeds for smaller files
> 
> Acknowledgement: I thank Dimuthu Wannipurage for answering my many questions
> about MFT and for providing guidance when needed.
> 
> Thank you and please let us know your comments or thoughts. 
> 
> Best,
> Praneeth Chityala




