Hi Marcus,

Thank you. The Local plugin for downloading data has been implemented, but I have not yet tested its download speeds. I will add this to my future performance testing.
Best,
Praneeth

On 3/20/23, 2:43 PM, "Christie, Marcus Aaron" <machr...@iu.edu> wrote:

Hi Praneeth,

This looks like a great contribution to MFT, and I appreciate your write-up here. One question: these numbers are for uploading from the local EC2 instance to S3, correct? Did you do any analysis of the opposite direction, downloading from S3 to a local EC2 instance?

Thanks,
Marcus

> On Mar 17, 2023, at 12:22 AM, Chityala, Praneeth <pkchi...@iu.edu> wrote:
>
> Dear All,
>
> This is Praneeth Chityala, currently pursuing a master's in Computer Science at Indiana University Bloomington. As part of my independent study, I took up MFT as my research area and started studying its architecture.
>
> As many of you know, MFT uses agents to transfer data from one cloud storage to another. These agents can be deployed on any compute machine. The machine hosting an agent may itself hold data files that need to be uploaded to cloud storage, and that is where my involvement in the project came in. I implemented the following extensions:
> • Implemented the Local transport extension, which allows an agent to transfer data from storage on its host machine – Local transport extension
> • The transport has three variations: streaming, chunked file transfer, and chunked streaming
> • Implemented a CLI for configuring the local agent – Local agent CLI
>
> Performance testing results:
>
> After successfully testing transfers from my local machine to AWS S3 storage, I deployed the agent on an AWS EC2 machine and ran multiple tests to compare its performance with rclone and the AWS CLI. The chart below shows the average transfer speeds from our analysis.
>
> <image001.png>
>
> For files from 100 MB to 1 GB, MFT is more than 60% faster than rclone and more than 150% faster than the AWS CLI.
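Reading "X% faster" as a comparison of the average transfer speeds plotted in the chart (an interpretation; the email does not define the metric), the two claims for 100 MB to 1 GB files work out as follows, with v denoting each tool's average transfer speed:

```latex
\begin{aligned}
\text{speedup over } Y &= \frac{v_{\mathrm{MFT}} - v_Y}{v_Y} \times 100\% \\
\frac{v_{\mathrm{MFT}} - v_{\mathrm{rclone}}}{v_{\mathrm{rclone}}} > 0.60 &\iff v_{\mathrm{MFT}} > 1.6\, v_{\mathrm{rclone}} \\
\frac{v_{\mathrm{MFT}} - v_{\mathrm{AWS}}}{v_{\mathrm{AWS}}} > 1.50 &\iff v_{\mathrm{MFT}} > 2.5\, v_{\mathrm{AWS}}
\end{aligned}
```

In other words, under this reading MFT moved data at more than 1.6 times rclone's average speed and more than 2.5 times the AWS CLI's.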
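The chunked streaming variation can be pictured with a minimal Python sketch. This is not MFT's actual code: it assumes chunked streaming means splitting a local file into fixed-size byte ranges and reading them concurrently on a thread pool, with defaults mirroring the test configuration (20 MB chunks, 32 chunk threads). `transfer_file`, `read_chunk`, and the `upload_chunk` callback are hypothetical names; `upload_chunk` stands in for the cloud-side write.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Defaults mirror the MFT test configuration described in the email.
CHUNK_SIZE = 20 * 1024 * 1024  # 20 MB per chunk
CHUNK_THREADS = 32             # concurrent chunk threads


def chunk_ranges(file_size, chunk_size):
    """Yield (index, offset, length) for each fixed-size chunk of a file."""
    for index, offset in enumerate(range(0, file_size, chunk_size)):
        yield index, offset, min(chunk_size, file_size - offset)


def read_chunk(path, offset, length, buffer_size=1024 * 1024):
    """Read one chunk with its own file handle, in buffer-sized reads."""
    parts = []
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(buffer_size, remaining))
            if not data:  # file shrank underneath us; stop early
                break
            parts.append(data)
            remaining -= len(data)
    return b"".join(parts)  # the whole chunk is held in memory before upload


def transfer_file(path, upload_chunk, chunk_size=CHUNK_SIZE, threads=CHUNK_THREADS):
    """Read chunks concurrently and pass each to upload_chunk(index, data).

    upload_chunk is a hypothetical callback; returns chunk indices in order.
    """
    file_size = os.path.getsize(path)

    def worker(chunk):
        index, offset, length = chunk
        upload_chunk(index, read_chunk(path, offset, length))
        return index

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, chunk_ranges(file_size, chunk_size)))
```

For example, `transfer_file("data.bin", lambda i, data: print(i, len(data)))` prints one line per chunk, in completion order, which can vary. With S3, `upload_chunk` would typically send each chunk as one part of a multipart upload, using part number `index + 1` because S3 part numbers start at 1. Giving each chunk its own file handle keeps the threads independent, at the cost of holding up to one chunk in memory per thread.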
>
> Configurations of the testing:
>
> • Local machine: An Ubuntu EC2 VM on AWS (instance type c5.9xlarge) with 18 cores, 10 Gbps of dedicated network bandwidth, and 1 GB/s disk read/write speed.
>
> • Cloud storage: An AWS S3 bucket in the same region as the VM above.
>
> • Test sets: In the graph's x-axis labels, 10m_1000 denotes a test set of 1,000 files of 10 MB each. All other test sets follow the same naming convention.
>
> • Testing trials: Each test was run 5 times with each transfer method.
>
> • Testing presets: Before each test, the VM's page cache was cleared so that no test gained an advantage from faster reads served out of the page cache. This simulates the worst-case conditions for reading data.
>
> • MFT configuration: I used chunked streaming with:
>     • 20 MB chunk size
>     • 32 concurrent transfers
>     • 32 concurrent chunk threads
>
> • rclone configuration: After exploring many of the optimizations available for rclone, I used the following settings:
>     • --s3-chunk-size 128000
>     • --buffer-size 128000
>     • --s3-upload-cutoff 0
>     • --s3-upload-concurrency 32
>     • --multi-thread-streams 32
>     • --multi-thread-cutoff 0
>     • --s3-disable-http2
>     • --no-check-dest
>     • --transfers 32
>     • --fast-list
>
> • AWS CLI configuration: I used the native AWS CLI for transfers, as in our findings it does not offer many dedicated optimizations.
>
> Observations:
> • For the local transport I used BufferedStreaming, which helped MFT reach the maximum read speed of the local disk without hitting the IOPS limit.
>
> Future plans for testing:
> • Jetstream2: Replace the AWS EC2 instance with a Jetstream2 virtual machine and perform similar tests.
> • Emulab: Run the same tests using Emulab VMs and custom configurations, with help from Dimuthu.
> • Azure: Test local-to-Azure cloud storage transfers with MFT, rclone, and the Azure CLI.
> • GCP: Test local-to-GCS transfers with MFT, rclone, and the GCP CLI.
> • DMA: I have a different implementation of the MFT local transport for systems that support DMA (Direct Memory Access). We plan to test it on such systems, since the present EC2 system does not support DMA.
>
> Further improvements to MFT:
> • Since we noticed that MFT lags behind rclone for files of 1 MB or smaller, we plan to stress-test and analyze the whole system to improve speeds for small files.
>
> Acknowledgement: I thank Dimuthu Wannipurage for answering my many questions about MFT and providing guidance when needed.
>
> Thank you, and please let us know your comments or thoughts.
>
> Best,
> Praneeth Chityala
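The test methodology in the configuration section (test-set labels such as 10m_1000, 5 trials per transfer method, and a page-cache flush before every run) could be scripted roughly as below. This is a hypothetical Python sketch, not the harness actually used: `parse_test_set`, `drop_page_cache`, `run_trials`, and the `transfer` callable are illustrative names, decimal size units are assumed, and averaging the per-trial speeds is one of several reasonable choices. Writing to /proc/sys/vm/drop_caches is the standard Linux way to clear the page cache and requires root; the email does not say which method was used.

```python
import os
import re
import statistics
import time

# Assumed decimal units; the email does not say whether MB means 10**6 or 2**20 bytes.
UNITS = {"k": 10**3, "m": 10**6, "g": 10**9}
LABEL_RE = re.compile(r"^(\d+)([kmg])_(\d+)$")


def parse_test_set(label):
    """Parse a label like '10m_1000' into (file_size_bytes, file_count)."""
    match = LABEL_RE.match(label.lower())
    if match is None:
        raise ValueError(f"unrecognized test-set label: {label!r}")
    size, unit, count = match.groups()
    return int(size) * UNITS[unit], int(count)


def drop_page_cache():
    """Flush dirty pages, then ask Linux to drop page cache, dentries and inodes (needs root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_trials(label, transfer, trials=5, clear_cache=drop_page_cache):
    """Run transfer(label) `trials` times from a cold cache; return mean speed in MB/s."""
    size, count = parse_test_set(label)
    total_mb = size * count / UNITS["m"]
    speeds = []
    for _ in range(trials):
        clear_cache()  # start every trial without page-cache help
        start = time.perf_counter()
        transfer(label)  # hypothetical: invokes MFT, rclone, or the AWS CLI
        elapsed = time.perf_counter() - start
        speeds.append(total_mb / elapsed)
    return statistics.mean(speeds)
```

In a real run, `transfer` would launch one of the three tools on the named test set; passing `clear_cache` as a parameter lets the harness be exercised without root.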
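The rclone settings listed in the configuration section can be captured as a reproducible command. In the minimal Python sketch below, only the flags come from the email; the `rclone copy` subcommand, the source directory, and the destination remote are assumptions, since the email does not say how rclone was invoked, and `rclone_copy_command` and `run_rclone_copy` are hypothetical helper names.

```python
import subprocess

# rclone flags exactly as listed in the email's rclone configuration.
RCLONE_FLAGS = [
    "--s3-chunk-size", "128000",
    "--buffer-size", "128000",
    "--s3-upload-cutoff", "0",
    "--s3-upload-concurrency", "32",
    "--multi-thread-streams", "32",
    "--multi-thread-cutoff", "0",
    "--s3-disable-http2",
    "--no-check-dest",
    "--transfers", "32",
    "--fast-list",
]


def rclone_copy_command(source_dir, destination):
    """Build the argv for an assumed `rclone copy` run with the benchmark flags."""
    return ["rclone", "copy", source_dir, destination, *RCLONE_FLAGS]


def run_rclone_copy(source_dir, destination):
    """Run the copy; raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(rclone_copy_command(source_dir, destination), check=True)
```

For example, `run_rclone_copy("/data/10m_1000", "s3remote:my-bucket/10m_1000")` would upload one test set, where both paths and the `s3remote` remote name are placeholders. Keeping the flags in a single list makes it easy to apply identical settings to every test set.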