Will do.

On Tue, Nov 3, 2015 at 9:41 AM, Thomas Weise <[email protected]> wrote:

> Agreed, there will be applications that write to many files that cannot
> all remain open forever.
>
> Can you provide an example of how to modify the append behavior depending
> on the HFS implementation?
>
> https://malhar.atlassian.net/browse/MLHR-1888
>
>
> On Tue, Nov 3, 2015 at 9:35 AM, Chandni Singh <[email protected]>
> wrote:
>
> > Hi,
> >
> > Please look at the latest changes to this operator.
> > These changes make stream opening and closing overridable, so an
> > implementation can control how it achieves append(), if at all.
> >
> > From its conception, this operator has been based on a cache of open
> > streams with a maximum size: whenever that limit is approached, the cache
> > evicts entries (closing their streams). A second setting is an expiry
> > time, which evicts and closes a stream when it hasn't been accessed for a
> > while.
> >
> > If the user never wants a stream to be closed, they can set both of these
> > values to their respective maximums. In a realistic scenario, however, the
> > user should know when a file will eventually stop being written to, and
> > use that information to configure these settings, or else set them to
> > their maximums and close the streams explicitly.
> >
> > If we didn't have this cache and were writing to multiple files, then
> > multiple streams would hang around in memory all the time, even when they
> > weren't being accessed. In my opinion that is a problematic design which
> > would routinely cause bigger issues, such as running out of memory.
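
The eviction behaviour described above can be sketched with a plain size-bounded, access-ordered map from the Java standard library. This is an illustration of the idea only, not the operator's actual implementation (which the thread does not show); an expiry-time policy would additionally close entries that have not been accessed for a configured interval.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a size-bounded, access-ordered map that closes the
// least-recently-used stream when the open-stream limit is exceeded.
class StreamCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
  private final int maxOpenStreams;

  StreamCache(int maxOpenStreams) {
    super(16, 0.75f, true); // true => access order, so reads refresh recency
    this.maxOpenStreams = maxOpenStreams;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    if (size() > maxOpenStreams) {
      try {
        eldest.getValue().close(); // evicting an entry closes its stream
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
      return true;
    }
    return false;
  }
}
```

With a limit of 2, putting a third stream into the cache closes and evicts the least-recently-used one.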
> >
> > Chandni
> >
> >
> > On Tue, Nov 3, 2015 at 7:58 AM, Thomas Weise <[email protected]>
> > wrote:
> >
> > > Append is used to continue writing to files that were previously closed
> > > and left in a consistent state. When append is not available, would we
> > > need to disable the optimization of closing and reopening files?
> > >
> > >
> > > On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath <[email protected]
> >
> > > wrote:
> > >
> > > > Shouldn't "append" be a user-configurable property which, if false,
> > > > causes the file to be overwritten?
> > > >
> > > > Ram
> > > >
> > > > On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale
> > > > <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > AbstractFileOutputOperator is used to write output files. The
> > > > > operator has a method "getFSInstance" which initializes the file
> > > > > system. One can override this method to initialize a desired file
> > > > > system that extends Hadoop's FileSystem. In our implementation we
> > > > > have overridden "getFSInstance" to initialize FTPFileSystem.
> > > > >
> > > > > The file-loading code in the setup method of
> > > > > AbstractFileOutputOperator opens a file in append mode when the file
> > > > > is already present. The issue is that FTPFileSystem doesn't support
> > > > > the append operation.
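
A rough sketch of the override described here, assuming Hadoop's FileSystem API; the URI and configuration details below are assumptions, and the actual implementation may differ:

```java
// Sketch only: override getFSInstance() in a subclass of
// AbstractFileOutputOperator so output goes through Hadoop's FTPFileSystem.
// The URI ("ftp://user@host") and configuration handling are hypothetical.
@Override
protected FileSystem getFSInstance() throws IOException
{
  FileSystem fs = new FTPFileSystem();
  fs.initialize(URI.create("ftp://user@host"), new Configuration());
  return fs;
}
```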
> > > > >
> > > > > Possible solutions to the problem:
> > > > > 1. Override the append method in FTPFileSystem.
> > > > >     - This would be tricky, as the file system doesn't support the
> > > > > operation. And there are other file systems, such as S3, which also
> > > > > don't support append.
> > > > > 2. Avoid using functions like "append" that are not supported by
> > > > > some implementations of Hadoop FileSystem.
> > > > > 3. Move the file-loading logic (currently in the setup method) into
> > > > > functions that a subclass can override, so that the loading logic
> > > > > can avoid calls like append that the user's chosen file system does
> > > > > not support.
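
Option 3 could look roughly like the following; the method and field names are hypothetical, introduced only to illustrate the extension point, and are not the actual operator API:

```java
// Hypothetical hook extracted from setup(): the base class opens an existing
// file with append, and a subclass overrides this when the underlying
// FileSystem (e.g. FTP, S3) does not support append.
protected FSDataOutputStream openStreamForExistingFile(Path path) throws IOException
{
  return fs.append(path); // default: HDFS-style append
}

// An FTP-backed subclass could instead re-create the file:
@Override
protected FSDataOutputStream openStreamForExistingFile(Path path) throws IOException
{
  return fs.create(path, true); // overwrite; caller rewrites the contents
}
```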
> > > > >
> > > > > -Priyanka
> > > >
> > >
> >
>
