Will do.

On Tue, Nov 3, 2015 at 9:41 AM, Thomas Weise <[email protected]> wrote:
> Agreed, there will be applications that write to many files, which
> cannot all remain open forever.
>
> Can you provide an example of how to modify the append behavior
> depending on the Hadoop FileSystem implementation?
>
> https://malhar.atlassian.net/browse/MLHR-1888
>
>
> On Tue, Nov 3, 2015 at 9:35 AM, Chandni Singh <[email protected]>
> wrote:
>
> > Hi,
> >
> > Please look at the latest changes to this operator. These changes
> > enable overriding stream opening and closing, so an implementation
> > can control how it achieves append(), if at all.
> >
> > From its conception, this operator has been based on a cache of open
> > streams with a maximum size; whenever that limit is approached, the
> > cache will evict entries (close streams). Another setting is the
> > expiry time, which evicts and closes a stream when it hasn't been
> > accessed in the cache for a while.
> >
> > If the user never wants the streams to be closed, they can
> > initialize both of these settings to their maximum values. But in a
> > realistic scenario the user needs to know when a file will
> > eventually be closed (never written to again), and using that
> > information they can configure these settings, or again initialize
> > them to their maximums and close the streams explicitly.
> >
> > Say we didn't have this cache and were writing to multiple files.
> > That would imply multiple streams hanging around in memory all the
> > time, even when they aren't being accessed. In my opinion that is a
> > problematic design which will cause bigger issues, like running out
> > of memory.
> >
> > Chandni
> >
> > On Tue, Nov 3, 2015 at 7:58 AM, Thomas Weise <[email protected]>
> > wrote:
> >
> > > Append is used to continue writing to files that were closed and
> > > left in a consistent state before. When append is not available,
> > > would we then need to disable the optimization of closing and
> > > reopening files?
> > >
> > > On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath
> > > <[email protected]> wrote:
> > >
> > > > Shouldn't "append" be a user-configurable property which, if
> > > > false, causes the file to be overwritten?
> > > >
> > > > Ram
> > > >
> > > > On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale
> > > > <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > AbstractFileOutputOperator is used to write output files. The
> > > > > operator has a method "getFSInstance" which initializes the
> > > > > file system. One can override this method to initialize any
> > > > > desired file system that extends Hadoop's FileSystem. In our
> > > > > implementation we have overridden "getFSInstance" to
> > > > > initialize FTPFileSystem.
> > > > >
> > > > > The file loading code in the setup method of
> > > > > AbstractFileOutputOperator opens a file in append mode when
> > > > > the file is already present. The issue is that FTPFileSystem
> > > > > doesn't support the append function.
> > > > >
> > > > > The solution to the problem could be:
> > > > > 1. Override the append method in FTPFileSystem.
> > > > >    - This would be tricky, as the file system doesn't support
> > > > >      the operation. And there are other file systems, like
> > > > >      S3, which also don't support append.
> > > > > 2. Avoid using functions like "append" which are not
> > > > >    supported by some of the implementations of Hadoop
> > > > >    FileSystem.
> > > > > 3. Write the file loading logic (which is in the setup
> > > > >    method) in functions which can be extended by a subclass
> > > > >    to override how files are loaded (avoiding calls like
> > > > >    append which are not supported by the user's chosen file
> > > > >    system).
> > > > >
> > > > > -Priyanka
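As a concrete illustration of the subclass Priyanka describes, here is a
minimal sketch that overrides getFSInstance() to return an FTPFileSystem.
The class name, tuple type, and file-name logic are illustrative
assumptions, and the exact getFSInstance() signature should be checked
against the operator version in use:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.ftp.FTPFileSystem;

    // Hypothetical subclass that writes raw byte[] tuples over FTP.
    public class FtpFileOutputOperator extends AbstractFileOutputOperator<byte[]>
    {
      @Override
      protected FileSystem getFSInstance() throws IOException
      {
        // Assumes the operator's filePath property holds an FTP URI such
        // as ftp://user:password@host:port/dir; FTPFileSystem picks up
        // credentials from the URI or from fs.ftp.* settings in the conf.
        FTPFileSystem fs = new FTPFileSystem();
        fs.initialize(URI.create(getFilePath()), new Configuration());
        return fs;
      }

      @Override
      protected String getFileName(byte[] tuple)
      {
        return "output.dat";  // illustrative: route every tuple to one file
      }

      @Override
      protected byte[] getBytesForTuple(byte[] tuple)
      {
        return tuple;  // tuples are already raw bytes
      }
    }

It is the setup of exactly such a subclass that runs into the problem:
the base operator calls append() on an existing file, which FTPFileSystem
does not support.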
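One shape the overridable stream opening and closing (Chandni's latest
changes, and Priyanka's option 3) could take: funnel every open through a
protected hook, and let a subclass for an append-less file system emulate
append by rewriting the existing bytes into a fresh stream. The method
and class names below are assumptions for illustration, not necessarily
the committed API:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical base-class hook: all opens go through openStream(),
    // so a subclass decides how (or whether) append is achieved.
    public abstract class StreamOpenerSketch
    {
      protected transient FileSystem fs;

      protected FSDataOutputStream openStream(Path filepath, boolean append)
          throws IOException
      {
        return append ? fs.append(filepath) : fs.create(filepath, true);
      }
    }

    // Override for file systems without append(), e.g. FTPFileSystem or
    // S3: copy the old contents into a new stream before continuing.
    class RewritingStreamOpener extends StreamOpenerSketch
    {
      @Override
      protected FSDataOutputStream openStream(Path filepath, boolean append)
          throws IOException
      {
        if (!append || !fs.exists(filepath)) {
          return fs.create(filepath, true);
        }
        // Read the existing file fully before overwriting it.
        byte[] existing = new byte[(int)fs.getFileStatus(filepath).getLen()];
        try (FSDataInputStream in = fs.open(filepath)) {
          in.readFully(existing);
        }
        FSDataOutputStream out = fs.create(filepath, true);
        out.write(existing);  // stream now ends where the old file ended
        return out;
      }
    }

Rewriting the whole file is only viable for small files; beyond that, the
subclass would have to do what Thomas suggests, i.e. disable the
close-and-reopen optimization, or roll over to a new part file instead of
appending.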
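To make the cache Chandni describes concrete: the design is a bounded,
expiring cache whose eviction hook closes the stream. Below is a minimal
sketch using Guava's CacheBuilder, with class, field, and parameter names
that are illustrative rather than the operator's actual code:

    import java.io.IOException;
    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import com.google.common.cache.RemovalListener;
    import com.google.common.cache.RemovalNotification;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamCacheSketch
    {
      final LoadingCache<String, FSDataOutputStream> streams;

      public StreamCacheSketch(final FileSystem fs, final Path dir,
          int maxOpenFiles, long expiryMillis)
      {
        streams = CacheBuilder.newBuilder()
            // "maximum size": past this limit, least-recently-used
            // streams are evicted (closed)
            .maximumSize(maxOpenFiles)
            // "expiry time": close streams not accessed for a while
            .expireAfterAccess(expiryMillis, TimeUnit.MILLISECONDS)
            .removalListener(new RemovalListener<String, FSDataOutputStream>()
            {
              @Override
              public void onRemoval(
                  RemovalNotification<String, FSDataOutputStream> entry)
              {
                try {
                  entry.getValue().close();  // eviction closes the stream
                } catch (IOException e) {
                  throw new RuntimeException(e);
                }
              }
            })
            .build(new CacheLoader<String, FSDataOutputStream>()
            {
              @Override
              public FSDataOutputStream load(String fileName) throws IOException
              {
                Path path = new Path(dir, fileName);
                // Reopening an evicted file is exactly where append() (or
                // a workaround for file systems without it) is needed.
                return fs.exists(path) ? fs.append(path) : fs.create(path);
              }
            });
      }
    }

Initializing maxOpenFiles and expiryMillis to Integer.MAX_VALUE and
Long.MAX_VALUE reproduces the "never close" configuration Chandni
mentions, at the cost of keeping every open stream in memory.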
