Hi,

> The -update behavior is by design.

If I am right, -update is meant to overwrite the file at the destination if it is already there. But in this case it is overwriting the folder with a file at the destination, which seems to be a bug.

> Could you provide the command line, and the directory structure before
> and after issuing the copy? -C

The command is:

hadoop distcp -update 'hftp://<srchost>:50070/user/<user>/distcpsrc' distcp_dest

hadoop dfs -lsr distcpsrc
/user/<user>/distcpsrc/1        <dir>           2008-07-24 05:53
/user/<user>/distcpsrc/1/t      <r 3>   4       2008-07-22 06:12

hadoop dfs -lsr distcp_dest
/user/<user>/distcp_dest/1      <r 3>   4       2008-07-24 06:03
<< expected /user/<user>/distcp_dest/1/t; the file is copied as '1' instead of '1/t'

If I run without '-update', the destination dir is:

hadoop dfs -lsr distcp_dest_noupdate
/user/<user>/distcp_dest_noupdate/1     <dir>           2008-07-24 06:08
<< file 't' is not copied and '1' is a directory

Thanks,
Murali

> On Jul 22, 2008, at 9:46 PM, Murali Krishna wrote:
>
> > Hi,
> > I am using 0.15.3 and the destination is empty. One more
> > behavior that I am seeing is that if I pass the '-update' option, it is
> > writing the content of file '2' in folder 1. (Makes the folder '1' a
> > file in the destination.) So, it looks like it is treating the destination
> > for file distcpsrc/1/2 as distcpdest/1.
> >
> > Thanks,
> > Murali
> >
> >> -----Original Message-----
> >> From: Chris Douglas [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, July 23, 2008 1:13 AM
> >> To: [email protected]
> >> Subject: Re: distcp skipping the file
> >>
> >> There were many fixes and improvements to distcp in 0.16, but most of
> >> the critical fixes made it into 0.15.2 and 0.15.3. Is the destination
> >> empty? Anything already existing at the destination is skipped. -C
> >>
> >> On Jul 22, 2008, at 4:39 AM, Murali Krishna wrote:
> >>
> >>> Hi,
> >>>
> >>> My source folder has a single folder and a single file inside that.
> >>>
> >>> /user/<user>/distcpsrc/1/2 <r 3> 4 2008-07-22 04:22
> >>>
> >>> In the destination, it is creating the folder '1' but not the file
> >>> '2'.
> >>>
> >>> The counters show 1 file has been skipped.
> >>>
> >>> 08/07/22 04:22:36 INFO mapred.JobClient: Files skipped=1
> >>>
> >>> If I create one more file in any directory under the distcpsrc
> >>> folder, it copies both the files properly. Is this a bug?
> >>>
> >>> [I am using 0.15.3]
> >>>
> >>> Thanks,
> >>>
> >>> Murali
