Append is used to continue writing to files that were previously closed and left in a consistent state. When append is not available, would we need to disable the optimization that closes and reopens files?

On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath <[email protected]> wrote:

> Shouldn't "append" be a user-configurable property which, if false, causes
> the file to be overwritten?
>
> Ram
>
> On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale <[email protected]> wrote:
>
> > Hi,
> >
> > AbstractFileOutputOperator is used to write output files. The operator has
> > a method "getFSInstance". This initializes file system. One can override
> > the method to initialize desired file system which extends hadoop
> > FileSystem. In our implementation we have overridden "getFSInstance" to
> > initialize FTPFileSystem.
> >
> > The file loader code in setup method of AbstractFileOutputOperator opens
> > the file in append mode when file is already present. The issue is
> > FTPFileSystem doesn't support append function.
> >
> > The solution to problem could be:
> > 1. Override append method in FTPFileSystem.
> >    - This would be tricky as file system doesn't support the operation. And
> >      there are other file systems as well like S3 which also don't support
> >      append.
> > 2. Avoid using functions like "append" which are not supported by some of
> >    the implementations of Hadoop FileSystem.
> > 3. Write file loading logic (which is in setup method) in functions which
> >    can be extended by subclass to override the logic to load files (by
> >    avoiding using calls like append which are not supported by user's chosen
> >    file system).
> >
> > -Priyanka
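
For illustration, option 3 from the quoted message (moving the file-opening logic into a method that a subclass can override) might look roughly like the sketch below. Everything in it is hypothetical: SimpleFs, NoAppendFs, FileWriterSketch, OverwritingWriter and openStream are made-up names standing in for Hadoop's FileSystem and for AbstractFileOutputOperator, and none of them is the real API. The in-memory file system only imitates a store such as FTP or S3 that rejects append.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a file system; NOT the Hadoop FileSystem API. */
interface SimpleFs {
    boolean exists(String path);
    OutputStream create(String path, boolean overwrite);
    OutputStream append(String path); // may throw UnsupportedOperationException
}

/** In-memory file system that, like FTP or S3, does not support append. */
class NoAppendFs implements SimpleFs {
    final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public boolean exists(String path) {
        return files.containsKey(path);
    }

    public OutputStream create(String path, boolean overwrite) {
        if (files.containsKey(path) && !overwrite) {
            throw new IllegalStateException("File exists: " + path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(path, out);
        return out;
    }

    public OutputStream append(String path) {
        throw new UnsupportedOperationException("append not supported");
    }
}

/** Hypothetical writer whose file-opening logic lives in an overridable hook. */
class FileWriterSketch {
    protected final SimpleFs fs;

    FileWriterSketch(SimpleFs fs) {
        this.fs = fs;
    }

    /** Default behaviour: append when the file already exists, else create it. */
    protected OutputStream openStream(String path) {
        return fs.exists(path) ? fs.append(path) : fs.create(path, false);
    }

    void write(String path, String text) {
        try (OutputStream out = openStream(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Subclass for file systems without append: always overwrite instead. */
class OverwritingWriter extends FileWriterSketch {
    OverwritingWriter(SimpleFs fs) {
        super(fs);
    }

    @Override
    protected OutputStream openStream(String path) {
        return fs.create(path, true);
    }
}

class AppendFallbackDemo {
    public static void main(String[] args) {
        NoAppendFs fs = new NoAppendFs();
        FileWriterSketch writer = new OverwritingWriter(fs);
        writer.write("out.txt", "first");
        writer.write("out.txt", "second"); // replaces the file rather than appending
        System.out.println(fs.files.get("out.txt"));
    }
}
```

Note that overwriting discards whatever was written before the file was closed, so a subclass like this is only safe if the operator never needs to reopen a file it has already finalized, which is the concern raised at the top of this reply.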
