Append is used to continue writing to files that were previously closed and
left in a consistent state. If append is not available, wouldn't we need to
disable the optimization that closes and reopens files?
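Below is a minimal sketch of why the close-and-reopen optimization depends on append. It uses a hypothetical in-memory store and writer (`ReopenSketch`, `Store`, `ReopeningWriter`, and `maxOpenFiles` are illustrative names, not the Apex `AbstractFileOutputOperator` or Hadoop `FileSystem` APIs). When more than `maxOpenFiles` streams are open, the least recently used one is closed, and a later write to that file reopens it by appending. If the store cannot append, the sketch turns the optimization off and keeps every stream open.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the Apex or Hadoop API): a writer that closes idle
 * files to bound the number of open streams and reopens them later by
 * appending. Without append, closed files could not be continued, so the
 * close/reopen optimization is switched off and streams stay open.
 */
final class ReopenSketch {

  /** In-memory store; appendSupported = false mimics FTP- or S3-like stores. */
  static final class Store {
    final Map<String, StringBuilder> files = new HashMap<>();
    final boolean appendSupported;

    Store(boolean appendSupported) { this.appendSupported = appendSupported; }
  }

  static final class ReopeningWriter {
    final Store store;
    final int maxOpenFiles;
    final boolean closeIdleFiles; // the optimization under discussion
    // Access-ordered map: iteration starts at the least recently used stream.
    final LinkedHashMap<String, StringBuilder> open =
        new LinkedHashMap<>(16, 0.75f, true);
    int closes = 0;

    ReopeningWriter(Store store, int maxOpenFiles) {
      this.store = store;
      this.maxOpenFiles = maxOpenFiles;
      // Close idle files only if they can be reopened in append mode later.
      this.closeIdleFiles = store.appendSupported;
    }

    void write(String path, String data) {
      StringBuilder out = open.get(path);
      if (out == null) {
        if (store.files.containsKey(path)) {
          // Continuing a previously closed file requires append.
          if (!store.appendSupported) {
            throw new UnsupportedOperationException("append is not supported");
          }
          out = store.files.get(path);
        } else {
          out = new StringBuilder();
          store.files.put(path, out);
        }
        open.put(path, out);
        closeLeastRecentlyUsed();
      }
      out.append(data);
    }

    private void closeLeastRecentlyUsed() {
      if (!closeIdleFiles) {
        return; // optimization disabled: keep every stream open
      }
      while (open.size() > maxOpenFiles) {
        String eldest = open.keySet().iterator().next();
        open.remove(eldest); // "close": data written so far stays in the store
        closes++;
      }
    }
  }
}
```

Keeping every stream open removes the need for append, but it gives up the bound on open files, which is the cost of disabling the optimization. A file that already exists when the writer starts still cannot be continued on such a store.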


On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath <[email protected]>
wrote:

> Shouldn't "append" be a user-configurable property which, if false, causes
> the file to be overwritten?
>
> Ram
>
> On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale
> <[email protected]> wrote:
> > Hi,
> >
> > AbstractFileOutputOperator is used to write output files. The operator has
> > a method "getFSInstance" that initializes the file system. One can override
> > this method to initialize any desired file system that extends Hadoop
> > FileSystem. In our implementation we have overridden "getFSInstance" to
> > initialize FTPFileSystem.
> >
> > The file-loading code in the setup method of AbstractFileOutputOperator
> > opens the file in append mode when the file is already present. The issue
> > is that FTPFileSystem doesn't support the append function.
> >
> > Possible solutions to the problem:
> > 1. Override the append method in FTPFileSystem.
> >     - This would be tricky, as the file system doesn't support the
> >       operation. There are also other file systems, such as S3, that don't
> >       support append.
> > 2. Avoid using functions like "append" that are not supported by some
> >    implementations of Hadoop FileSystem.
> > 3. Move the file-loading logic (currently in the setup method) into
> >    methods that a subclass can override, so the subclass can load files
> >    without calls like append that the user's chosen file system does not
> >    support.
> >
> > -Priyanka
>
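Below is a minimal sketch of option 3 combined with Ram's configurable "append" flag. It assumes a hypothetical in-memory store and writer (`LoadHookSketch`, `Store`, `SketchWriter`, `NoAppendWriter`, `openFile`, and the overwrite policy are illustrative only, not the Apex or Hadoop APIs). The setup-time file loading lives in a protected hook. The base class appends only when `append` is true and the file already exists. A subclass for a store without append overrides the hook so that it never calls append.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadHookSketch {

  /** Hypothetical in-memory store; NOT Hadoop's FileSystem API. */
  static final class Store {
    final Map<String, StringBuilder> files = new HashMap<>();
    final boolean appendSupported;

    Store(boolean appendSupported) { this.appendSupported = appendSupported; }

    boolean exists(String path) { return files.containsKey(path); }

    /** Creates the file, discarding any existing content (overwrite). */
    StringBuilder create(String path) {
      StringBuilder sb = new StringBuilder();
      files.put(path, sb);
      return sb;
    }

    /** Continues an existing file; fails on stores without append support. */
    StringBuilder append(String path) {
      if (!appendSupported) {
        throw new UnsupportedOperationException("append is not supported");
      }
      return files.get(path);
    }
  }

  /** Base writer: the setup-time file loading lives in an overridable hook. */
  static class SketchWriter {
    final Store store;
    boolean append = true; // user-configurable; false overwrites existing files

    SketchWriter(Store store) { this.store = store; }

    /** Opens a file during setup; subclasses may override to avoid append. */
    protected StringBuilder openFile(String path) {
      if (append && store.exists(path)) {
        return store.append(path);
      }
      return store.create(path);
    }
  }

  /** Subclass for stores such as FTP that cannot append: never calls append. */
  static class NoAppendWriter extends SketchWriter {
    NoAppendWriter(Store store) { super(store); }

    @Override
    protected StringBuilder openFile(String path) {
      return store.create(path); // one possible policy: overwrite
    }
  }
}
```

Overwriting is only one possible policy for a store without append. It discards the earlier contents, and that trade-off is the reason to make `append` an explicit user setting rather than a silent fallback.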
