Can we include a script in our GitHub repo (under util?) that simply deletes these files after an application is killed, given an app id? The admin would need to run this script explicitly. Auto-deleting would be bad, as a lot of users, including some in production today, need to restart using those files. The knowledge of, and desire for, a restart after failure lives outside the app, so technically the script should be explicitly user invoked.
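A minimal sketch of what such a cleanup script could look like (the script name, the default base directory, and the `checkpoints/<app-id>` layout are all assumptions here; verify the actual checkpoint location on your cluster before using it):

```shell
#!/bin/sh
# delete-checkpoints.sh -- explicitly remove leftover checkpoint files for a
# killed application, given its app id. Deletion is deliberately NOT
# automatic: an admin must invoke this, since some users restart apps from
# the last checkpoints.
#
# ASSUMPTION: checkpoints live under <base-dir>/checkpoints/<app-id>; the
# default base directory below is a guess -- adjust both for your install.
set -eu

usage() {
    echo "usage: $0 <app-id> [base-dir]" >&2
}

# Build the HDFS path that holds checkpoints for one application.
checkpoint_path() {
    printf '%s/checkpoints/%s' "$2" "$1"
}

delete_checkpoints() {
    app_id=$1
    base_dir=${2:-/user/${USER:-hadoop}/datatorrent}

    # Refuse ids that do not look like YARN application ids, so a typo
    # cannot expand to a parent directory and wipe unrelated data.
    case $app_id in
        application_*) ;;
        *) echo "error: '$app_id' is not a YARN application id" >&2; return 1 ;;
    esac

    # -skipTrash frees the NameNode metadata for these small files
    # immediately instead of moving them into the user's trash.
    hdfs dfs -rm -r -skipTrash "$(checkpoint_path "$app_id" "$base_dir")"
}

# Act only when invoked with arguments; sourcing the file merely defines
# the functions without deleting anything.
if [ $# -ge 1 ]; then
    delete_checkpoints "$@"
else
    usage
fi
```

Keeping the path construction and the id sanity check in small functions makes the destructive `hdfs dfs -rm` step easy to review, and the explicit-invocation requirement falls out naturally: nothing runs unless the admin passes an app id.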
Thks,
Amol

On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <[email protected]> wrote:

> Hi Venkat,
>
> There are typically a small number of outstanding checkpoint files per
> operator; as newer checkpoints are created, old ones are automatically
> deleted by the application when it determines that state is no longer
> needed. When an application stops or is killed, the last checkpoints
> remain. There is also a benefit to that, since a new application can be
> restarted to continue from those checkpoints instead of starting all the
> way from the beginning, and this is useful in some cases. But if you are
> always starting your application from scratch, then yes, you can delete
> the checkpoints of older applications that are no longer running.
>
> Thanks
>
> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> [email protected]> wrote:
>
> > Hi,
> >
> > Now that this has been discussed, will the checkpointed data be purged
> > when we kill the application forcefully? In our current usage, we
> > forcefully kill the app after it processes a certain batch of data. I
> > see these small files created under the (user/datatorrent) directory
> > and not removed.
> >
> > In another scenario, when some of the containers keep failing, we have
> > observed this state where the data is continuously checkpointed into
> > small files. When we kill the app, the data will still be there.
> >
> > We have received concerns that this is impacting namenode performance,
> > since these small files are stored in HDFS. So we manually remove this
> > checkpointed data at regular intervals.
> >
> > -Venkatesh
> >
> > -----Original Message-----
> > From: Amol Kekre [mailto:[email protected]]
> > Sent: Monday, February 01, 2016 7:49 AM
> > To: [email protected]; [email protected]
> > Subject: Re: Possibility of saving checkpoints on other distributed
> > filesystems
> >
> > Aniruddha,
> > We have not heard this request from users yet. It may be because our
> > checkpointing has a purge, i.e. the small files are not left over. The
> > small-files problem has been there in Hadoop and relates to storing
> > small files in Hadoop for a long time (more likely forever).
> >
> > Thks,
> > Amol
> >
> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > [email protected]> wrote:
> >
> > > Hi Community,
> > >
> > > Or let me say BigFoots, do you think this feature should be
> > > available?
> > >
> > > The reason to bring this up was discussed at the start of this
> > > thread as:
> > >
> > > > This is with the intention to recover the applications faster and
> > > > do away with HDFS's small-files problem as described here:
> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > >
> > > > If we could save checkpoints in some other distributed file system
> > > > (or even a HA NAS box) geared for small files, we could achieve:
> > > >
> > > > - Better performance of NN & HDFS for the production usage (read:
> > > >   production data I/O & not temp files)
> > > > - Faster application recovery in case of planned shutdown /
> > > >   unplanned restarts
> > >
> > > If you feel the need for this feature, please cast your opinions and
> > > ideas so that it can be converted into a JIRA.
> > >
> > > Thanks,
> > > Aniruddha
> > >
> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <
> > > [email protected]> wrote:
> > >
> > > > Aniruddha,
> > > >
> > > > Currently we don't have any support for that.
> > > >
> > > > Thanks
> > > > -Gaurav
> > > >
> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <
> > > > [email protected]> wrote:
> > > >
> > > > > The default FSStorageAgent can be used, as it can work with the
> > > > > local filesystem, but as far as I know there is no support for
> > > > > specifying the directory through an xml file; by default it uses
> > > > > the application directory on HDFS.
> > > > >
> > > > > Not sure if we could specify a storage agent with its properties
> > > > > through the configuration at dag level.
> > > > >
> > > > > - Tushar.
> > > > >
> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Do we have any storage agent which I can use readily,
> > > > > > configurable through dt-site.xml?
> > > > > >
> > > > > > I am looking for something which would save checkpoints in a
> > > > > > mounted file system [e.g. HA-NAS], which is basically just
> > > > > > another directory for Apex.
> > > > > >
> > > > > > Thanks,
> > > > > > Aniruddha
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > It is already supported; refer to the following jira for
> > > > > > > more information:
> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > > > >
> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM, Aniruddha Thombare <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it possible to save checkpoints in any other highly
> > > > > > > > available distributed file system (which may be mounted
> > > > > > > > directories across the cluster) other than HDFS?
> > > > > > > > If yes, is it configurable?
> > > > > > > >
> > > > > > > > AFAIK, there is no configurable option available to
> > > > > > > > achieve that. If that's the case, can we have that
> > > > > > > > feature?
> > > > > > > >
> > > > > > > > This is with the intention to recover the applications
> > > > > > > > faster and do away with HDFS's small-files problem as
> > > > > > > > described here:
> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > > >
> > > > > > > > If we could save checkpoints in some other distributed
> > > > > > > > file system (or even a HA NAS box) geared for small files,
> > > > > > > > we could achieve:
> > > > > > > >
> > > > > > > > - Better performance of NN & HDFS for the production usage
> > > > > > > >   (read: production data I/O & not temp files)
> > > > > > > > - Faster application recovery in case of planned shutdown
> > > > > > > >   / unplanned restarts
> > > > > > > >
> > > > > > > > Please send your comments, suggestions or ideas.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Aniruddha
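For reference, the dag-level configuration the thread is asking about might look something like the dt-site.xml fragment below. This is only an illustrative sketch: the application name, attribute name, and mount path are all assumptions, not confirmed API; see APEXCORE-283 for what was actually implemented in your Apex version.

```xml
<!-- Hypothetical sketch: direct checkpoint storage at a mounted NAS path
     instead of HDFS. Attribute name and value syntax are assumptions;
     consult APEXCORE-283 and your version's documentation. -->
<configuration>
  <property>
    <name>dt.application.MyApp.attr.STORAGE_AGENT_PATH</name>
    <value>file:///mnt/ha-nas/apex-checkpoints</value>
  </property>
</configuration>
```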
