Actually, PhedEx uses GridFTP for its data transfers.

On Thu, Jan 13, 2011 at 5:34 AM, Steve Loughran <ste...@apache.org> wrote:
> On 13/01/11 08:34, li ping wrote:
>
>> Those are also my concerns. Is it efficient for data transmission?
>
> It's long-lived TCP connections, reasonably efficient for bulk data
> transfer, has all the throttling of TCP built in, and comes with some
> excellently debugged client and server code in the form of Jetty and
> HttpClient. In maintenance costs alone, those libraries justify HTTP
> unless you have a vastly superior option *and are willing to maintain
> it forever*.
>
> FTP's limits are well known (security), NFS's limits are well known
> (security; the UDP version doesn't throttle), and self-developed
> protocols will have whatever problems you want.
>
> There are better protocols for long-haul data transfer over fat pipes,
> such as GridFTP and PhedEX ( http://www.gridpp.ac.uk/papers/ah05_phedex.pdf ),
> which use multiple TCP channels in parallel to reduce the impact of a
> single lost packet, but within a datacentre you shouldn't have to worry
> about this. If you do find that lots of packets get lost, raise the
> issue with the networking team.
>
> -Steve
>
>> On Thu, Jan 13, 2011 at 4:27 PM, Nan Zhu <zhunans...@gmail.com> wrote:
>>
>>> Hi, all
>>>
>>> I have a question about the file transmission between the Map and
>>> Reduce stages. In the current implementation, the Reducers get the
>>> results generated by the Mappers through HTTP GET. I don't understand
>>> why HTTP was selected. Why not FTP, or a self-developed protocol?
>>>
>>> Is it just because HTTP is simple?
>>>
>>> thanks
>>>
>>> Nan
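For illustration, here is a minimal sketch of the pattern the thread describes: a "reducer" pulling a map-output partition from a "mapper" with a plain HTTP GET. This is not Hadoop's actual shuffle code; the URL path, the partition payload, and the handler names are all made up for the example, and it uses Python's standard library rather than Jetty/HttpClient just to keep it self-contained.

```python
# Sketch of shuffle-over-HTTP: a tiny "mapper-side" HTTP server holds a
# map-output partition; the "reducer" fetches it with an ordinary GET.
# All names/paths here are hypothetical, not Hadoop's real servlet paths.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Pretend map output: one partition keyed by a hypothetical request path.
MAP_OUTPUTS = {"/mapOutput?map=0&reduce=3": b"key1\tval1\nkey2\tval2\n"}

class MapOutputHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = MAP_OUTPUTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example's output quiet
        pass

# Start the "mapper" server on an ephemeral port.
server = HTTPServer(("127.0.0.1", 0), MapOutputHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The "reducer" side: one HTTP GET over a TCP connection.
url = f"http://127.0.0.1:{port}/mapOutput?map=0&reduce=3"
with urllib.request.urlopen(url) as resp:
    data = resp.read()

print(data.decode())
server.shutdown()
```

TCP's congestion control gives the transfer its throttling for free, which is one of the points made above; the parallel-TCP-channel approach of GridFTP-style tools would amount to issuing several such GETs for byte ranges concurrently.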