Thanks Etienne, congrats on finishing up your PhD!

PushPull focuses on downloading multiple files using multiple threads,
but I don't believe it can download a single file using multiple threads.
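
To illustrate the distinction, here is a minimal Python sketch of file-level
parallelism. It is not PushPull code: fetch_file and fetch_all are
hypothetical names, and the default of 30 workers only mirrors the
recommended thread count quoted below. Each file is handled by exactly one
worker thread, so many files can move at once but no single file is split
across threads.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def fetch_file(name):
    # Hypothetical stand-in for one whole-file transfer. A real client would
    # open a single connection here and stream the file from start to end.
    return name, threading.current_thread().name


def fetch_all(names, thread_count=30):
    # Each file is submitted as one task, so parallelism is across files
    # only; a single large file is never divided between threads.
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return dict(pool.map(fetch_file, names))


# Maps each file name to the one worker thread that handled it.
assignments = fetch_all(["a.dat", "b.dat", "c.dat"], thread_count=2)
```

With a single file in the list, only one worker is ever busy, regardless of
the thread count. Speeding up one large file would instead need something
like ranged or striped transfers, as parallel TCP/IP tools do.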

Hope that helps!

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Etienne Koen <[email protected]>
Date: Wednesday, October 15, 2014 at 1:53 AM
To: Chris Mattmann <[email protected]>, "[email protected]"
<[email protected]>
Cc: Shakeh Khudikyan <[email protected]>, Brian Foster
<[email protected]>
Subject: RE: PushPull

>Hi Chris,
>
>Sorry for the lack of communication over the last couple of weeks. I had my
>last PhD responsibilities, which are now finally completed :-)
>
>I am getting back into things again with OODT... I just want some
>clarification on the parallel transfer in PushPull. Sorry if I am
>repeating myself, but is the parallelism of PushPull implemented such
>that it can download a single file using parallel threads, or is it only
>applicable to downloading multiple files?
>
>I am referring to the lines:
>
>org.apache.oodt.cas.pushpull.crawler.use.tracker=false
>
>org.apache.oodt.cas.pushpull.file.retrieval.system.recommended.thread.count=30
>
>Thanks
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: [email protected]
>
>________________________________________
>From: Mattmann, Chris A (3980) [[email protected]]
>Sent: Monday, September 15, 2014 8:45 AM
>To: Etienne Koen; [email protected]
>Cc: Khudikyan, Shakeh E (398J); Brian Foster
>Subject: Re: PushPull
>
>Dear Etienne,
>
>Thanks for your questions! Yes, there are ways to manipulate the
>manner in which PushPull achieves parallelism. Check out:
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>
>
>Look at the File Retrieval System related parameters.
>
>Also check out this documentation produced by Brian Foster which
>provides a lot of detail on how to use PushPull.
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/documentation/
>
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Etienne Koen <[email protected]>
>Date: Sunday, September 14, 2014 11:42 PM
>To: Chris Mattmann <[email protected]>, "[email protected]"
><[email protected]>
>Cc: Shakeh Khudikyan <[email protected]>
>Subject: RE: PushPull
>
>>Thanks for the information!
>>
>>Please correct me if I am wrong, but does PushPull in its default operation
>>download files in parallel? Is there a way to specify any of the
>>parallel parameters when downloading files, for example the number of
>>threads? Is there any way to have more control over the parallelism?
>>
>>Thanks
>>Etienne
>>
>>________________________________________
>>From: Mattmann, Chris A (3980) [[email protected]]
>>Sent: Friday, September 12, 2014 4:18 PM
>>To: Etienne Koen; [email protected]
>>Cc: Khudikyan, Shakeh E (398J)
>>Subject: Re: PushPull
>>
>>Hi Etienne,
>>
>>Thanks for your question! Yes, PushPull has parallel downloading
>>capability, so in terms of "pulling" data it definitely has similar
>>capability to GridFTP. PushPull can't initiate or "push" a transfer
>>like GridFTP can in that sense, so it's not exactly an apples to
>>apples comparison.
>>
>>For the wiki, you can sign up to create an account here:
>>
>>https://cwiki.apache.org/confluence/signup.action
>>
>>Cheers!
>>
>>Chris
>>
>>-----Original Message-----
>>From: Etienne Koen <[email protected]>
>>Date: Friday, September 12, 2014 12:15 AM
>>To: Chris Mattmann <[email protected]>, "[email protected]"
>><[email protected]>
>>Cc: Shakeh Khudikyan <[email protected]>
>>Subject: RE: PushPull
>>
>>>Hi Chris,
>>>
>>>Thank you for your response and info! I would be happy to document my
>>>results and would appreciate it if the community could respond to some
>>>of the questions I still have.
>>>
>>>At the moment it does not look like I have the permissions or the
>>>functionality to create a page... Or I am looking in the wrong place to
>>>do so :-)
>>>
>>>My immediate question is whether PushPull has a parallel capability
>>>like GridFTP's, and how to specify it for the next test phase...
>>>
>>>Cheers
>>>
>>>Etienne Koen
>>>
>>>________________________________________
>>>From: Mattmann, Chris A (3980) [[email protected]]
>>>Sent: Thursday, September 11, 2014 4:47 PM
>>>To: [email protected]
>>>Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
>>>Subject: FW: PushPull
>>>
>>>Etienne,
>>>
>>>
>>>Thank you for sending this along! The crazy part about these types of
>>>data transfer studies, especially with TCP/IP based protocols that aren't
>>>parallelized (e.g., FTP), is that you are limited by what's going on in
>>>the surrounding network. For example, see the attached studies my team
>>>has published on data movement over the past 5-7 years and notice a
>>>similar type of behavior. Pretty interesting, independent of the family
>>>of data transfer you're using.
>>>
>>>Take a look at my Dissertation too:
>>>
>>>http://sunset.usc.edu/~mattmann/Dissertation.pdf
>>>
>>>This concluded that parallel TCP/IP technologies like GridFTP (now
>>>GlobusOnline) and bbFTP performed the best across the public WAN for
>>>performance and efficiency related parameters. If those aren't the
>>>overall properties you are trying to maximize (and you instead care about
>>>good enough performance with ease of install and use), then things like
>>>WebDAV and so forth are probably good enough.
>>>
>>>I'd be happy to discuss your results more in general. It would be great
>>>if you created a wiki page here to document your testing and results:
>>>
>>>https://cwiki.apache.org/confluence/display/OODT/Home
>>>
>>>Thank you and let me know!
>>>
>>>Cheers,
>>>Chris
>>>
>>>-----Original Message-----
>>>From: Etienne Koen <[email protected]>
>>>Date: Thursday, September 11, 2014 12:55 AM
>>>To: Chris Mattmann <[email protected]>
>>>Cc: Shakeh Khudikyan <[email protected]>
>>>Subject: PushPull
>>>
>>>>Hi Chris and Shakeh,
>>>>
>>>>Attached are some of the results of the tests performed according to the
>>>>baseline testing requirements. These simply transferred a 1 GB directory
>>>>with varying file sizes. For completeness I have gone so far as to
>>>>transfer files of 1 MB each (this scenario might not be very probable
>>>>for SKA though...). I have noticed a substantial drop in the transfer
>>>>rate achieved compared to the 100 MB files, as well as the transfer rate
>>>>being quite variable. What would be the main contributor to this? I see
>>>>that a metadata file is created for each transfer, which might
>>>>contribute to the overhead and become quite prominent in the 1000 x 1 MB
>>>>file case. All these tests used the FTP protocol and were performed on
>>>>the same machine and network link:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>For testing single file transfer, I found that the maximum transfer rate
>>>>was only achieved for files > 256 MB:
>>>>
>>>>
>>>>
>>>>
>>>>I also monitored the transfer rate of an 8192 MB file, which consistently
>>>>showed an interesting behaviour: the transfer rate reaches a maximum and
>>>>then drops. I am unsure what the cause of this might be, as it happened
>>>>consistently and in both transfer directions:
>>>>
>>>>
>>>>
>>>>I would greatly appreciate your comments on this and will include them in
>>>>my report before I submit it next week.
>>>>
>>>>All the best!
>>>>
>>>>Cheers
>>>>Etienne
>>>
>>>
>>>________________________________
>>>
>>>Disclaimer: This E-mail message, including any attachments, is intended
>>>only for the person or entity to which it is addressed, and may contain
>>>confidential information. Each page attached hereto must also be read in
>>>conjunction with this disclaimer.
>>>If you are not the intended recipient you are hereby notified that any
>>>disclosure, copying, distribution or reliance upon the contents of this
>>>e-mail is strictly prohibited. E.&O.E.
