Here is an abstract implementation that can work with file systems that don't support append:
https://github.com/chandnisingh/Malhar/blob/examples/library/src/main/java/com/datatorrent/lib/io/fs/AbstractNonAppendFileOutputOperator.java

On Tue, Nov 3, 2015 at 9:45 AM, Chandni Singh <[email protected]> wrote:

> Will do.
>
> On Tue, Nov 3, 2015 at 9:41 AM, Thomas Weise <[email protected]>
> wrote:
>
>> Agreed, there will be applications that write to many files that cannot
>> all remain open forever.
>>
>> Can you provide an example on how to modify the append behavior depending
>> on HFS implementation?
>>
>> https://malhar.atlassian.net/browse/MLHR-1888
>>
>>
>> On Tue, Nov 3, 2015 at 9:35 AM, Chandni Singh <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Please look at the latest changes to this operator.
>> > These changes enable overriding stream opening and closing.
>> > Implementations can control how they want to achieve append(), if at
>> > all.
>> >
>> > This operator, from its conception, is based on a cache of open
>> > streams. The cache has a maximum size: if at any point that limit is
>> > near, the cache will evict entries (close streams). Another setting is
>> > the expiry time, which evicts and closes a stream when it hasn't been
>> > accessed in the cache for a while.
>> >
>> > If the user wants to never actually close the streams, they can
>> > initialize both of these values to their respective max values. But in
>> > a real-world scenario the user needs to know when a file will
>> > eventually be closed (never written to again). Using that information
>> > they can configure these settings, or again initialize them to their
>> > max and close the streams explicitly.
>> >
>> > Let's say we don't have this cache and we are writing to multiple
>> > files. That implies that multiple streams will always hang around in
>> > memory (even if they weren't accessed). This in my opinion is a
>> > problematic design which will cause bigger issues, like running out of
>> > memory all the time.
>> >
>> > Chandni
>> >
>> >
>> > On Tue, Nov 3, 2015 at 7:58 AM, Thomas Weise <[email protected]>
>> > wrote:
>> >
>> > > Append is used to continue writing to files that were closed and
>> > > left in a consistent state before. When append is not available,
>> > > then we would need to disable the optimization to close and reopen
>> > > files?
>> > >
>> > >
>> > > On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath <[email protected]>
>> > > wrote:
>> > >
>> > > > Shouldn't "append" be a user-configurable property which, if
>> > > > false, causes the file to be overwritten?
>> > > >
>> > > > Ram
>> > > >
>> > > > On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale
>> > > > <[email protected]> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > AbstractFileOutputOperator is used to write output files. The
>> > > > > operator has a method "getFSInstance". This initializes the file
>> > > > > system. One can override the method to initialize the desired
>> > > > > file system, which extends Hadoop FileSystem. In our
>> > > > > implementation we have overridden "getFSInstance" to initialize
>> > > > > FTPFileSystem.
>> > > > >
>> > > > > The file loader code in the setup method of
>> > > > > AbstractFileOutputOperator opens the file in append mode when
>> > > > > the file is already present. The issue is that FTPFileSystem
>> > > > > doesn't support the append function.
>> > > > >
>> > > > > The solution to the problem could be:
>> > > > > 1. Override the append method in FTPFileSystem.
>> > > > >    - This would be tricky as the file system doesn't support
>> > > > >      the operation. And there are other file systems as well,
>> > > > >      like S3, which also don't support append.
>> > > > > 2. Avoid using functions like "append" which are not supported
>> > > > >    by some of the implementations of Hadoop FileSystem.
>> > > > > 3. Write the file loading logic (which is in the setup method)
>> > > > >    in functions which can be extended by a subclass to override
>> > > > >    the logic to load files (avoiding calls like append which
>> > > > >    are not supported by the user's chosen file system).
>> > > > >
>> > > > > -Priyanka
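
For readers following the thread, here is a minimal sketch of how the two ideas discussed above could fit together: a bounded cache of open streams, plus an overridable stream-opening hook that avoids append when the file system does not support it. This is not the code of the linked operator. The class name PartFileStreamCache, the "name.N" part-file naming, and the use of java.nio.file in place of Hadoop's FileSystem are all assumptions made so that the example runs on its own.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a bounded cache of open streams that can avoid append.
 * java.nio.file stands in for Hadoop's FileSystem (an assumption of this example).
 */
class PartFileStreamCache implements AutoCloseable {
    private final Path dir;
    private final boolean appendSupported;
    private final int maxOpenStreams;
    // Access-ordered map: iteration starts at the least recently used stream.
    private final LinkedHashMap<String, OutputStream> open =
        new LinkedHashMap<>(16, 0.75f, true);

    PartFileStreamCache(Path dir, boolean appendSupported, int maxOpenStreams) {
        this.dir = dir;
        this.appendSupported = appendSupported;
        this.maxOpenStreams = maxOpenStreams;
    }

    /** Writes one line to a logical file, opening or reopening a stream if needed. */
    void write(String fileName, String line) throws IOException {
        OutputStream out = open.get(fileName);
        if (out == null) {
            evictIfFull();
            out = openStream(fileName);
            open.put(fileName, out);
        }
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Overridable hook: subclasses decide how a stream is (re)opened. */
    protected OutputStream openStream(String fileName) throws IOException {
        if (appendSupported) {
            return Files.newOutputStream(dir.resolve(fileName),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        // No append available: never reopen a closed file.
        // Roll over to the next unused part file instead (name.0, name.1, ...).
        int part = 0;
        while (Files.exists(dir.resolve(fileName + "." + part))) {
            part++;
        }
        return Files.newOutputStream(dir.resolve(fileName + "." + part),
            StandardOpenOption.CREATE_NEW);
    }

    /** Closes least recently used streams until there is room for one more. */
    private void evictIfFull() throws IOException {
        Iterator<Map.Entry<String, OutputStream>> it = open.entrySet().iterator();
        while (open.size() >= maxOpenStreams && it.hasNext()) {
            OutputStream eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }

    @Override
    public void close() throws IOException {
        for (OutputStream out : open.values()) {
            out.close();
        }
        open.clear();
    }
}
```

With appendSupported set to false and maxOpenStreams set to 1, writing to a.txt, then b.txt, then a.txt again leaves three part files (a.txt.0, b.txt.0, a.txt.1) rather than reopening a.txt. A downstream step would have to merge or list the parts. The sketch leaves out expiry-time eviction, which would need a timestamp per entry.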
