Thanks Etienne, and congrats on finishing up your PhD! PushPull focuses on downloading multiple files using multiple threads, but I don't believe it supports downloading a single file using multiple threads.
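To make the distinction concrete, here is a minimal Python sketch of file-level parallelism: a pool of worker threads where each task is one whole file, so a single file is never split across threads. This is not PushPull's code. `fetch_file`, `fetch_all` and `RECOMMENDED_THREAD_COUNT` are hypothetical names, the constant only stands in for the thread-count property quoted further down the thread, and the "download" is simulated with a short sleep.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# Hypothetical stand-in for the recommended thread count property
# discussed in this thread; not read from any real PushPull config.
RECOMMENDED_THREAD_COUNT = 4


def fetch_file(name):
    """Pretend to download one whole file.

    Returns (file name, name of the worker thread that handled it).
    """
    time.sleep(0.01)  # simulate transfer time; no network access
    return name, threading.current_thread().name


def fetch_all(names, thread_count=RECOMMENDED_THREAD_COUNT):
    """Fetch many files concurrently, one whole file per task."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # map() preserves input order; each file is handled by exactly
        # one worker thread, so parallelism is across files only.
        return list(pool.map(fetch_file, names))


if __name__ == "__main__":
    files = [f"file_{i:03d}.dat" for i in range(10)]
    for name, worker in fetch_all(files):
        print(f"{name} fetched by {worker}")
```

With this pattern, a larger thread count helps when there are many files, but a single large file still moves over one connection in one thread.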
Hope that helps!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Etienne Koen <[email protected]>
Date: Wednesday, October 15, 2014 at 1:53 AM
To: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>
Cc: Shakeh Khudikyan <[email protected]>, Brian Foster <[email protected]>
Subject: RE: PushPull

>Hi Chris,
>
>Sorry for the lack of communication the last couple of weeks. I had my
>last PhD responsibilities which is now finally completed :-)
>
>I am getting back in to things again with OODT... I just want to get some
>clarification on the parallel transfer of pushpull. Sorry if I am
>repeating myself but I just want some clear clarification about the
>parallel file transfer. Is the parallelism of pushpull implemented and
>capable to download a single file using parallel threads or is it only
>applicable to downloading multiple files?
>
>I am referring to the lines:
>
>org.apache.oodt.cas.pushpull.crawler.use.tracker=false
>
>org.apache.oodt.cas.pushpull.file.retrieval.system.recommended.thread.count=30
>
>Thanks
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: [email protected]
>
>________________________________________
>From: Mattmann, Chris A (3980) [[email protected]]
>Sent: Monday, September 15, 2014 8:45 AM
>To: Etienne Koen; [email protected]
>Cc: Khudikyan, Shakeh E (398J); Brian Foster
>Subject: Re: PushPull
>
>Dear Etienne,
>
>Thanks for your questions! Yes, there are ways to manipulate the
>manner in which PushPull achieves parallelism, check out:
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>
>Look at the File Retrieval System related parameters.
>
>Also check out this documentation produced by Brian Foster which
>provides a lot of detail on how to use PushPull.
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/documentation/
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>-----Original Message-----
>From: Etienne Koen <[email protected]>
>Date: Sunday, September 14, 2014 11:42 PM
>To: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>
>Cc: Shakeh Khudikyan <[email protected]>
>Subject: RE: PushPull
>
>>Thanks for the information!
>>
>>Please correct me if I am wrong, PushPull in its default operation
>>downloads files in parallel? Is there a way to specify any of the
>>parallel parameters when downloading files? For example, thread number?
>>Is there any way to have more control over the parallelism?
>>
>>Thanks
>>Etienne
>>
>>Etienne Koen
>>Data Processing Systems Engineer
>>
>>Space Advisory Company
>>
>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: [email protected]
>>
>>________________________________________
>>From: Mattmann, Chris A (3980) [[email protected]]
>>Sent: Friday, September 12, 2014 4:18 PM
>>To: Etienne Koen; [email protected]
>>Cc: Khudikyan, Shakeh E (398J)
>>Subject: Re: PushPull
>>
>>Hi Etienne,
>>
>>Thanks for your question! Yes, PushPull has parallel downloading
>>capability, so in terms of "pulling" data it definitely has similar
>>capability to GridFTP. PushPull can't initiate or "push" a transfer
>>like GridFTP can in that sense, so it's not exactly an apples to
>>apples comparison.
>>
>>For the wiki, you can sign up to create an account here:
>>
>>https://cwiki.apache.org/confluence/signup.action
>>
>>Cheers!
>>
>>Chris
>>
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Chris Mattmann, Ph.D.
>>Chief Architect
>>Instrument Software and Science Data Systems Section (398)
>>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>Office: 168-519, Mailstop: 168-527
>>Email: [email protected]
>>WWW: http://sunset.usc.edu/~mattmann/
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Adjunct Associate Professor, Computer Science Department
>>University of Southern California, Los Angeles, CA 90089 USA
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>-----Original Message-----
>>From: Etienne Koen <[email protected]>
>>Date: Friday, September 12, 2014 12:15 AM
>>To: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>
>>Cc: Shakeh Khudikyan <[email protected]>
>>Subject: RE: PushPull
>>
>>>Hi Chris,
>>>
>>>Thank you for your response and info! I would be happy to document my
>>>results and would appreciate it if the community could respond to some of
>>>my questions I still have.
>>>
>>>At the moment it does not look like I have permissions or the
>>>functionality to create a page... Or I am looking at the wrong place to
>>>do so :-)
>>>
>>>My immediate question is whether pushpull has the parallel capability
>>>such as GridFTP and how to specify it for the next test phase...
>>>
>>>Cheers
>>>
>>>Etienne Koen
>>>Data Processing Systems Engineer
>>>
>>>Space Advisory Company
>>>
>>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: [email protected]
>>>
>>>________________________________________
>>>From: Mattmann, Chris A (3980) [[email protected]]
>>>Sent: Thursday, September 11, 2014 4:47 PM
>>>To: [email protected]
>>>Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
>>>Subject: FW: PushPull
>>>
>>>Etienne,
>>>
>>>Thank you for sending this along!
>>>The crazy part about these types of data transfer studies especially
>>>with TCP/IP based protocols that aren't parallelized (e.g., FTP) is that
>>>you are limited by what's going on in the surrounding network.
>>>For example see the attached studies my team has published on data
>>>movement over the past 5-7 years and notice a similar type of behavior.
>>>Pretty interesting independent of the family of data transfer you're
>>>using.
>>>
>>>Take a look at my Dissertation too:
>>>
>>>http://sunset.usc.edu/~mattmann/Dissertation.pdf
>>>
>>>This concluded that parallel TCP/IP technologies like GridFTP (now
>>>GlobusOnline) and bbFTP performed the best across the public WAN for
>>>performance and efficiency related parameters, whereas if those aren't
>>>the overall properties you are trying to maximize (and instead care
>>>about good enough performance, but with ease of install and use - then
>>>things like WebDAV and so forth are probably good enough).
>>>
>>>I'd be happy to discuss your results more in general. It would be great
>>>if you created a wiki page here:
>>>
>>>https://cwiki.apache.org/confluence/display/OODT/Home
>>>
>>>To document your testing and results. Thank you and let me know!
>>>
>>>Cheers,
>>>Chris
>>>
>>>-----Original Message-----
>>>From: Etienne Koen <[email protected]>
>>>Date: Thursday, September 11, 2014 12:55 AM
>>>To: Chris Mattmann <[email protected]>
>>>Cc: Shakeh Khudikyan <[email protected]>
>>>Subject: PushPull
>>>
>>>>Hi Chris and Shakeh,
>>>>
>>>>Attached are some of the results which were performed according to the
>>>>baseline testing requirements. This was simply to transfer a directory
>>>>of 1GB with varying file sizes. For completeness I have gone so far as
>>>>to transfer files of 1MB each (This scenario might not be very probable
>>>>for SKA though...).
>>>>I have noticed a substantial drop in the transfer rate achieved
>>>>compared to the 100MB files as well as the transfer rate being quite
>>>>variable. What would be the main contributor for this? I see that
>>>>there is a metadata file created for each transfer which might perhaps
>>>>contribute to the overhead and become quite prominent in the 1000 x 1MB
>>>>file case. All these tests used the FTP protocol and were performed on
>>>>the same machine and network link:
>>>>
>>>>
>>>>For testing single file transfer I found the maximum transfer rate only
>>>>being achieved for files > 256 MB:
>>>>
>>>>
>>>>I also monitored the transfer rate of an 8192 MB file which constantly
>>>>revealed an interesting behaviour of achieving a maximum transfer rate
>>>>after which the transfer rate then drops. I am also unsure what the
>>>>cause for this might be as it happened constantly and in both transfer
>>>>directions:
>>>>
>>>>
>>>>I would greatly appreciate your comments on this and will include it in
>>>>my report before I submit it during next week.
>>>>
>>>>All the best!
>>>>
>>>>Cheers
>>>>Etienne
>>>>
>>>>
>>>>Etienne Koen
>>>>Data Processing Systems Engineer
>>>>
>>>>Space Advisory Company
>>>>
>>>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: [email protected]
>>>>
>>>
>>>________________________________
>>>
>>>Disclaimer: This E-mail message, including any attachments, is intended
>>>only for the person or entity to which it is addressed, and may contain
>>>confidential information. Each page attached hereto must also be read in
>>>conjunction with this disclaimer.
>>>If you are not the intended recipient you are hereby notified that any
>>>disclosure, copying, distribution or reliance upon the contents of this
>>>e-mail is strictly prohibited. E.&O.E.
