Yogi, kill is not an orderly shutdown; who will clean the state?
On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <[email protected]> wrote:

> I would prefer to have an additional argument during application launch
> on dtcli.
>
> Say, --preserve-kill-state true.
>
> Basically, the platform should be able to do the clean-up activity if
> the application is invoked with a certain flag.
>
> Test apps can set this flag to clear the data on kill. Production apps
> can set this flag to keep the data on kill.
>
> Shutdown should always preserve the state. But for kill / forced
> shutdown, the user might prefer to clear the state.
>
> ~ Yogi
>
> On 2 February 2016 at 21:53, Amol Kekre <[email protected]> wrote:
>
>> Can we include a script in our github (util?) that simply deletes
>> these files upon the application being killed, given an app-id? The
>> admin will need to run this script. Auto-deleting would be bad, as a
>> lot of users, including those in production today, need to restart
>> using those files. The knowledge/desire to restart post failure is
>> outside the app, and hence technically the script should be
>> explicitly user invoked.
>>
>> Thks,
>> Amol
>>
>> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <[email protected]> wrote:
>>
>>> Hi Venkat,
>>>
>>> There are typically a small number of outstanding checkpoint files
>>> per operator; as newer checkpoints are created, old ones are
>>> automatically deleted by the application when it determines that
>>> state is no longer needed. When an application stops or is killed,
>>> the last checkpoints remain. There is also a benefit to that, since
>>> a new application can be restarted to continue from those
>>> checkpoints instead of starting all the way from the beginning,
>>> which is useful in some cases. But if you always start your
>>> application from scratch, yes, you can delete the checkpoints of
>>> older applications that are no longer running.
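[Editor's note: the explicitly user-invoked cleanup script Amol suggests above might look roughly like the sketch below. The checkpoint path layout (`/user/<user>/datatorrent/apps/<app-id>/checkpoints`) is an assumption about the default application directory; verify it against your cluster's configuration before using this for real.]

```shell
#!/usr/bin/env bash
set -eu

# Sketch of an admin cleanup script: delete the leftover checkpoint
# directory of a killed application, given its YARN application id.
# HDFS is overridable so the script can be dry-run; word splitting on
# it is intentional.
HDFS=${HDFS:-hdfs}

cleanup_app() {
  app_id="$1"
  # Refuse anything that does not look like a YARN application id.
  case "$app_id" in
    application_*) ;;
    *) echo "error: '$app_id' does not look like a YARN app id" >&2; return 1 ;;
  esac
  # Assumed default application directory layout; adjust as needed.
  app_dir="/user/${USER:-$(whoami)}/datatorrent/apps/${app_id}"
  # -skipTrash: checkpoints are transient state, no point keeping them in trash.
  $HDFS dfs -rm -r -skipTrash "${app_dir}/checkpoints"
}

# Dry-run demo: prints the hdfs command instead of executing it.
HDFS="echo hdfs" cleanup_app application_1454000000000_0001
```

Run for real as `./cleanup_app.sh application_<cluster-ts>_<seq>` on a node with an HDFS client; the guard on the app-id format keeps a typo from deleting an unrelated directory.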
>>> Thanks
>>>
>>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Now that this has been discussed, will the checkpointed data be
>>>> purged when we kill the application forcefully? In our current
>>>> usage, we forcefully kill the app after it processes a certain
>>>> batch of data. I see these small files are created under the
>>>> (user/datatorrent) directory and not removed.
>>>>
>>>> Another scenario: when some of the containers keep failing, we have
>>>> observed this state where the data is continuously checkpointed
>>>> into small files. When we kill the app, the data will still be
>>>> there.
>>>>
>>>> We have received concerns saying this is impacting namenode
>>>> performance, since these small files are stored in HDFS. So we
>>>> manually remove this checkpointed data at regular intervals.
>>>>
>>>> -Venkatesh
>>>>
>>>> -----Original Message-----
>>>> From: Amol Kekre [mailto:[email protected]]
>>>> Sent: Monday, February 01, 2016 7:49 AM
>>>> To: [email protected]; [email protected]
>>>> Subject: Re: Possibility of saving checkpoints on other distributed filesystems
>>>>
>>>> Aniruddha,
>>>> We have not heard this request from users yet. It may be because
>>>> our checkpointing has a purge, i.e. the small files are not left
>>>> over. The small file problem has been there in Hadoop and relates
>>>> to storing small files in Hadoop for a long time (more likely
>>>> forever).
>>>>
>>>> Thks,
>>>> Amol
>>>>
>>>> On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <[email protected]> wrote:
>>>>
>>>>> Hi Community,
>>>>>
>>>>> Or let me say BigFoots, do you think this feature should be available?
>>>>> The reason to bring this up was discussed at the start of this thread as:
>>>>>
>>>>>> This is with the intention to recover the applications faster and
>>>>>> do away with HDFS's small files problem as described here:
>>>>>>
>>>>>> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>>>>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>>>>>
>>>>>> If we could save checkpoints in some other distributed file
>>>>>> system (or even a HA NAS box) geared for small files, we could
>>>>>> achieve:
>>>>>>
>>>>>> - Better performance of NN & HDFS for production usage (read:
>>>>>>   production data I/O & not temp files)
>>>>>> - Faster application recovery in case of planned shutdown /
>>>>>>   unplanned restarts
>>>>>
>>>>> If you feel the need for this feature, please cast your opinions
>>>>> and ideas so that it can be converted into a jira.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Aniruddha
>>>>>
>>>>> On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <[email protected]> wrote:
>>>>>
>>>>>> Aniruddha,
>>>>>>
>>>>>> Currently we don't have any support for that.
>>>>>>
>>>>>> Thanks
>>>>>> -Gaurav
>>>>>>
>>>>>> On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <[email protected]> wrote:
>>>>>>
>>>>>>> The default FSStorageAgent can be used, as it can work with the
>>>>>>> local filesystem, but as far as I know there is no support for
>>>>>>> specifying the directory through the xml file; by default it
>>>>>>> uses the application directory on HDFS.
>>>>>>> Not sure if we could specify the storage agent with its
>>>>>>> properties through the configuration at the dag level.
>>>>>>>
>>>>>>> - Tushar.
>>>>>>>
>>>>>>> On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Do we have any storage agent which I can use readily,
>>>>>>>> configurable through dt-site.xml?
>>>>>>>>
>>>>>>>> I am looking for something which would save checkpoints in a
>>>>>>>> mounted file system [eg. HA-NAS], which is basically just
>>>>>>>> another directory for Apex.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Aniruddha
>>>>>>>>
>>>>>>>> On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It is already supported; refer to the following jira for more
>>>>>>>>> information:
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/APEXCORE-283
>>>>>>>>>
>>>>>>>>> On Tue, Jan 19, 2016 at 10:43 PM, Aniruddha Thombare <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Is it possible to save checkpoints in any other highly
>>>>>>>>>> available distributed file system (which may be mounted
>>>>>>>>>> directories across the cluster) other than HDFS?
>>>>>>>>>> If yes, is it configurable?
>>>>>>>>>>
>>>>>>>>>> AFAIK, there is no configurable option available to achieve
>>>>>>>>>> that. If that's the case, can we have that feature?
>>>>>>>>>> This is with the intention to recover the applications faster
>>>>>>>>>> and do away with HDFS's small files problem as described here:
>>>>>>>>>>
>>>>>>>>>> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>>>> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>>>>>>>>>> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>>>>>>>>>>
>>>>>>>>>> If we could save checkpoints in some other distributed file
>>>>>>>>>> system (or even a HA NAS box) geared for small files, we
>>>>>>>>>> could achieve:
>>>>>>>>>>
>>>>>>>>>> - Better performance of NN & HDFS for production usage (read:
>>>>>>>>>>   production data I/O & not temp files)
>>>>>>>>>> - Faster application recovery in case of planned shutdown /
>>>>>>>>>>   unplanned restarts
>>>>>>>>>>
>>>>>>>>>> Please send your comments, suggestions or ideas.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Aniruddha
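[Editor's note: for readers landing on this thread later, Sandesh's APEXCORE-283 pointer is the operative answer to Tushar's and Aniruddha's question about configuring the checkpoint location. As a hedged sketch only, under the assumption that the STORAGE_AGENT attribute and FSStorageAgent constructor behave as in the Apex API of that era, and with a hypothetical NAS mount path, a DAG-level setup might look like:]

```java
// Sketch: point operator checkpointing at a NAS-mounted path instead of
// the default application directory on HDFS. The path is hypothetical;
// consult APEXCORE-283 and the docs for your Apex version.
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.FSStorageAgent;
import org.apache.hadoop.conf.Configuration;

public class NasCheckpointApp implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // file:// scheme so the agent writes to the locally mounted
    // (NAS-backed) path on every node rather than to HDFS.
    dag.setAttribute(OperatorContext.STORAGE_AGENT,
        new FSStorageAgent("file:///mnt/ha-nas/apex-checkpoints", conf));
    // ... add operators and streams as usual ...
  }
}
```

The path must be mounted identically on all cluster nodes, or containers rescheduled elsewhere will not find their checkpoints; that is the "HA NAS" assumption in Aniruddha's proposal.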
