Good idea to handle it in GW.

On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <[email protected]> wrote:

> Exactly, this doesn't make sense. I filed an enhancement to have this in GW
> a while ago.
>
> On Tue, Feb 2, 2016 at 8:48 AM, Pramod Immaneni <[email protected]>
> wrote:
>
> > Yogi,
> >
> > kill is not an orderly shutdown; who will clean up the state?
> >
> > On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <[email protected]>
> > wrote:
> >
> > > I would prefer to have an additional argument during application launch
> > > on dtcli.
> > >
> > > Say, --preserve-kill-state true.
> > >
> > > Basically, the platform should be able to do the clean-up activity if
> > > the application is invoked with a certain flag.
> > >
> > > Test apps can set this flag to clear the data on kill. Production apps
> > > can set this flag to keep the data on kill.
> > >
> > > Shutdown should always preserve the state. But, for kill /
> > > forced-shutdown, the user might prefer to clear the state.
> > >
> > > ~ Yogi
> > >
> > > On 2 February 2016 at 21:53, Amol Kekre <[email protected]> wrote:
> > >
> > >>
> > >> Can we include a script in our github (util?) that simply deletes these
> > >> files, given an app-id, once the application has been killed? The admin
> > >> will need to run this script. Auto-deleting would be bad, as a lot of
> > >> users, including those in production today, need to restart using those
> > >> files. The knowledge/desire to restart post-failure is outside the app,
> > >> and hence technically the script should be explicitly user-invoked.
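[A script along the lines proposed above could be a minimal shell sketch like the one below. The checkpoint location (/user/&lt;user&gt;/datatorrent/checkpoints/&lt;app-id&gt;) is an assumed layout for illustration; adjust it to match your cluster's actual directory.]

```shell
# Sketch of the proposed cleanup script: delete left-over checkpoint files
# for a killed application, given its app-id. CHECKPOINT_ROOT below is an
# ASSUMED layout; point it at the directory your cluster actually uses.
rm_checkpoints() {
    app_id="$1"
    root="${CHECKPOINT_ROOT:-/user/$(whoami)/datatorrent/checkpoints}"
    cmd="hdfs dfs -rm -r -skipTrash $root/$app_id"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default to a dry run so the admin can inspect what would be deleted.
        echo "$cmd"
    else
        $cmd
    fi
}

# Dry-run demo against a sample app-id:
export CHECKPOINT_ROOT=/user/alice/datatorrent/checkpoints
rm_checkpoints application_1454428200000_0001
# prints: hdfs dfs -rm -r -skipTrash /user/alice/datatorrent/checkpoints/application_1454428200000_0001
```

[Run with DRY_RUN=0 to actually execute the delete; -skipTrash avoids moving the many small files into the HDFS trash, which would defeat the purpose.]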
> > >>
> > >> Thks,
> > >> Amol
> > >>
> > >>
> > >> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <
> [email protected]
> > >
> > >> wrote:
> > >>
> > >>> Hi Venkat,
> > >>>
> > >>> There are typically a small number of outstanding checkpoint files per
> > >>> operator; as newer checkpoints are created, old ones are automatically
> > >>> deleted by the application when it determines that state is no longer
> > >>> needed. When an application stops or is killed, the last checkpoints
> > >>> remain. There is also a benefit to that, since a new application can
> > >>> be restarted to continue from those checkpoints instead of starting
> > >>> all the way from the beginning, which is useful in some cases. But if
> > >>> you are always starting your application from scratch, then yes, you
> > >>> can delete the checkpoints of older applications that are no longer
> > >>> running.
> > >>>
> > >>> Thanks
> > >>>
> > >>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> > >>> [email protected]> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> >         Now that this has been discussed, will the checkpointed data
> > >>> > be purged when we kill the application forcefully? In our current
> > >>> > usage, we forcefully kill the app after it processes a certain batch
> > >>> > of data. I see these small files are created under the
> > >>> > (user/datatorrent) directory and not removed.
> > >>> >
> > >>> >         Another scenario: when some of the containers keep failing,
> > >>> > we have observed a state where the data is continuously checkpointed
> > >>> > into small files. When we kill the app, the data is still there.
> > >>> >
> > >>> >         We have received concerns that this is impacting namenode
> > >>> > performance, since these small files are stored in HDFS. So we
> > >>> > manually remove this checkpointed data at regular intervals.
> > >>> >
> > >>> > -Venkatesh
> > >>> >
> > >>> > -----Original Message-----
> > >>> > From: Amol Kekre [mailto:[email protected]]
> > >>> > Sent: Monday, February 01, 2016 7:49 AM
> > >>> > To: [email protected]; [email protected]
> > >>> > Subject: Re: Possibility of saving checkpoints on other distributed
> > >>> > filesystems
> > >>> >
> > >>> > Aniruddha,
> > >>> > We have not heard this request from users yet. It may be because our
> > >>> > checkpointing has a purge, i.e. the small files are not left over.
> > >>> > The small-files problem has been around in Hadoop for a while and
> > >>> > relates to storing small files in HDFS for a long time (more likely
> > >>> > forever).
> > >>> >
> > >>> > Thks,
> > >>> > Amol
> > >>> >
> > >>> >
> > >>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > >>> > [email protected]> wrote:
> > >>> >
> > >>> > > Hi Community,
> > >>> > >
> > >>> > > Or, let me say, BigFoots: do you think this feature should be
> > >>> > > available?
> > >>> > >
> > >>> > > The reason to bring this up was discussed at the start of this
> > >>> > > thread:
> > >>> > >
> > >>> > > > This is with the intention to recover the applications faster
> > >>> > > > and do away with HDFS's small files problem as described here:
> > >>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > >>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > >>> > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > >>> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > >>> > > > If we could save checkpoints in some other distributed file
> > >>> > > > system (or even a HA NAS box) geared for small files, we could
> > >>> > > > achieve:
> > >>> > > >
> > >>> > > >    - Better performance of NN & HDFS for production usage (read:
> > >>> > > >      production data I/O & not temp files)
> > >>> > > >
> > >>> > > >
> > >>> > > >    - Faster application recovery in case of planned shutdown /
> > >>> > > >      unplanned restarts
> > >>> > > >
> > >>> > > If you feel the need for this feature, please cast your opinions
> > >>> > > and ideas so that it can be converted into a jira.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > Thanks,
> > >>> > >
> > >>> > >
> > >>> > > Aniruddha
> > >>> > >
> > >>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
> > >>> > > <[email protected]>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Aniruddha,
> > >>> > > >
> > >>> > > > Currently we don't have any support for that.
> > >>> > > >
> > >>> > > > Thanks
> > >>> > > > Gaurav
> > >>> > > >
> > >>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
> > >>> > > > <[email protected]>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > Default FSStorageAgent can be used, as it can work with the
> > >>> > > > > local filesystem, but as far as I know there is no support for
> > >>> > > > > specifying the directory through the xml file; by default it
> > >>> > > > > uses the application directory on HDFS.
> > >>> > > > >
> > >>> > > > > Not sure if we could specify a storage agent with its
> > >>> > > > > properties through the configuration at the dag level.
> > >>> > > > >
> > >>> > > > > - Tushar.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > >>> > > > > [email protected]> wrote:
> > >>> > > > >
> > >>> > > > > > Hi,
> > >>> > > > > >
> > >>> > > > > > Do we have any storage agent which I can use readily,
> > >>> > > > > > configurable
> > >>> > > > > through
> > >>> > > > > > dt-site.xml?
> > >>> > > > > >
> > >>> > > > > > I am looking for something which would save checkpoints in
> > >>> > > > > > mounted
> > >>> > > file
> > >>> > > > > > system [eg. HA-NAS] which is basically just another
> directory
> > >>> > > > > > for
> > >>> > > Apex.
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > Thanks,
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > Aniruddha
> > >>> > > > > >
> > >>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > >>> > > > [email protected]>
> > >>> > > > > > wrote:
> > >>> > > > > >
> > >>> > > > > > > It is already supported; refer to the following jira for
> > >>> > > > > > > more information:
> > >>> > > > > > >
> > >>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
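[For readers landing here later: a dt-site.xml fragment along these lines illustrates the kind of DAG-level configuration being discussed. The storage-agent class shown is from the Apex codebase, but the app name and the basePath property are assumptions for the sketch; check APEXCORE-283 and the release notes for the actual supported syntax.]

```xml
<!-- Illustrative only: configure the checkpoint storage agent for an
     application in dt-site.xml. "MyApp" is a placeholder app name, and
     the basePath property is HYPOTHETICAL; verify against APEXCORE-283. -->
<property>
  <name>dt.application.MyApp.attr.STORAGE_AGENT</name>
  <value>com.datatorrent.common.util.AsyncFSStorageAgent</value>
</property>
<property>
  <!-- Hypothetical: point the agent at a mounted (e.g. HA-NAS) path -->
  <name>dt.application.MyApp.attr.STORAGE_AGENT.basePath</name>
  <value>file:///mnt/ha-nas/apex-checkpoints</value>
</property>
```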
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > >>> > > > > > > [email protected]> wrote:
> > >>> > > > > > >
> > >>> > > > > > > > Hi,
> > >>> > > > > > > >
> > >>> > > > > > > > Is it possible to save checkpoints in any other highly
> > >>> > > > > > > > available distributed file systems (which maybe mounted
> > >>> > > > > > > > directories across
> > >>> > > > the
> > >>> > > > > > > > cluster) other than HDFS?
> > >>> > > > > > > > If yes, is it configurable?
> > >>> > > > > > > >
> > >>> > > > > > > > AFAIK, there is no configurable option available to
> > >>> > > > > > > > achieve that.
> > >>> > > > > > > > If that's the case, can we have that feature?
> > >>> > > > > > > >
> > >>> > > > > > > > This is with the intention to recover the applications
> > >>> > > > > > > > faster and do away with HDFS's small files problem as
> > >>> > > > > > > > described here:
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > >>> > > > > > > >
> > >>> > > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > >>> > > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > >>> > > > > > > >
> > >>> > > > > > > > If we could save checkpoints in some other distributed
> > >>> > > > > > > > file system (or even a HA NAS box) geared for small
> > >>> > > > > > > > files, we could achieve:
> > >>> > > > > > > >
> > >>> > > > > > > >    - Better performance of NN & HDFS for production
> > >>> > > > > > > >      usage (read: production data I/O & not temp files)
> > >>> > > > > > > >    - Faster application recovery in case of planned
> > >>> > > > > > > >      shutdown / unplanned restarts
> > >>> > > > > > > >
> > >>> > > > > > > > Please, send your comments, suggestions or ideas.
> > >>> > > > > > > >
> > >>> > > > > > > > Thanks,
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Aniruddha
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
