Can we include a script in our GitHub repo (under util?) that simply deletes these files after an application is killed, given an app id? The admin would need to run this script explicitly. Auto-deleting would be bad, as a lot of users, including some in production today, need to restart using those files. The knowledge of, and desire for, a restart after failure lives outside the app, so technically the script should be explicitly user invoked.
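A minimal sketch of what such a cleanup script could look like (the script name, the default base directory, and the `checkpoints/<app-id>` layout are all assumptions here; verify the actual checkpoint location on your cluster before using it):

```shell
#!/bin/sh
# delete-checkpoints.sh -- explicitly remove leftover checkpoint files for a
# killed application, given its app id. Deletion is deliberately NOT
# automatic: an admin must invoke this, since some users restart apps from
# the last checkpoints.
#
# ASSUMPTION: checkpoints live under <base-dir>/checkpoints/<app-id>; the
# default base directory below is a guess -- adjust both for your install.
set -eu

usage() {
    echo "usage: $0 <app-id> [base-dir]" >&2
}

# Build the HDFS path that holds checkpoints for one application.
checkpoint_path() {
    printf '%s/checkpoints/%s' "$2" "$1"
}

delete_checkpoints() {
    app_id=$1
    base_dir=${2:-/user/${USER:-hadoop}/datatorrent}

    # Refuse ids that do not look like YARN application ids, so a typo
    # cannot expand to a parent directory and wipe unrelated data.
    case $app_id in
        application_*) ;;
        *) echo "error: '$app_id' is not a YARN application id" >&2; return 1 ;;
    esac

    # -skipTrash frees the NameNode metadata for these small files
    # immediately instead of moving them into the user's trash.
    hdfs dfs -rm -r -skipTrash "$(checkpoint_path "$app_id" "$base_dir")"
}

# Act only when invoked with arguments; sourcing the file merely defines
# the functions without deleting anything.
if [ $# -ge 1 ]; then
    delete_checkpoints "$@"
else
    usage
fi
```

Keeping the path construction and the id sanity check in small functions makes the destructive `hdfs dfs -rm` step easy to review, and the explicit-invocation requirement falls out naturally: nothing runs unless the admin passes an app id.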
Thks,
Amol

On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <[email protected]> wrote:

> Hi Venkat,
>
> There are typically a small number of outstanding checkpoint files per
> operator; as newer checkpoints are created, old ones are automatically
> deleted by the application when it determines that state is no longer
> needed. When an application stops or is killed, the last checkpoints
> remain. There is also a benefit to that, since a new application can be
> restarted to continue from those checkpoints instead of starting all the
> way from the beginning, and this is useful in some cases. But if you are
> always starting your application from scratch, then yes, you can delete
> the checkpoints of older applications that are no longer running.
>
> Thanks
>
> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
> [email protected]> wrote:
>
> > Hi,
> >
> > Now that this has been discussed, will the checkpointed data be purged
> > when we kill the application forcefully? In our current usage, we
> > forcefully kill the app after it processes a certain batch of data. I
> > see these small files created under the (user/datatorrent) directory
> > and not removed.
> >
> > In another scenario, when some of the containers keep failing, we have
> > observed this state where the data is continuously checkpointed into
> > small files. When we kill the app, the data will still be there.
> >
> > We have received concerns that this is impacting namenode performance,
> > since these small files are stored in HDFS. So we manually remove this
> > checkpointed data at regular intervals.
> >
> > -Venkatesh
> >
> > -----Original Message-----
> > From: Amol Kekre [mailto:[email protected]]
> > Sent: Monday, February 01, 2016 7:49 AM
> > To: [email protected]; [email protected]
> > Subject: Re: Possibility of saving checkpoints on other distributed
> > filesystems
> >
> > Aniruddha,
> > We have not heard this request from users yet. It may be because our
> > checkpointing has a purge, i.e. the small files are not left over. The
> > small-files problem has been there in Hadoop and relates to storing
> > small files in Hadoop for a long time (more likely forever).
> >
> > Thks,
> > Amol
> >
> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
> > [email protected]> wrote:
> >
> > > Hi Community,
> > >
> > > Or let me say BigFoots, do you think this feature should be
> > > available?
> > >
> > > The reason to bring this up was discussed at the start of this
> > > thread as:
> > >
> > > > This is with the intention to recover the applications faster and
> > > > do away with HDFS's small-files problem as described here:
> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > >
> > > > If we could save checkpoints in some other distributed file system
> > > > (or even a HA NAS box) geared for small files, we could achieve:
> > > >
> > > > - Better performance of NN & HDFS for the production usage (read:
> > > >   production data I/O & not temp files)
> > > > - Faster application recovery in case of planned shutdown /
> > > >   unplanned restarts
> > >
> > > If you feel the need for this feature, please cast your opinions and
> > > ideas so that it can be converted into a JIRA.
> > >
> > > Thanks,
> > > Aniruddha
> > >
> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <
> > > [email protected]> wrote:
> > >
> > > > Aniruddha,
> > > >
> > > > Currently we don't have any support for that.
> > > >
> > > > Thanks
> > > > -Gaurav
> > > >
> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <
> > > > [email protected]> wrote:
> > > >
> > > > > The default FSStorageAgent can be used, as it can work with the
> > > > > local filesystem, but as far as I know there is no support for
> > > > > specifying the directory through an xml file; by default it uses
> > > > > the application directory on HDFS.
> > > > >
> > > > > Not sure if we could specify a storage agent with its properties
> > > > > through the configuration at dag level.
> > > > >
> > > > > - Tushar.
> > > > >
> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Do we have any storage agent which I can use readily,
> > > > > > configurable through dt-site.xml?
> > > > > >
> > > > > > I am looking for something which would save checkpoints in a
> > > > > > mounted file system [e.g. HA-NAS], which is basically just
> > > > > > another directory for Apex.
> > > > > >
> > > > > > Thanks,
> > > > > > Aniruddha
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > It is already supported; refer to the following jira for
> > > > > > > more information:
> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > > > >
> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM, Aniruddha Thombare <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it possible to save checkpoints in any other highly
> > > > > > > > available distributed file system (which may be mounted
> > > > > > > > directories across the cluster) other than HDFS?
> > > > > > > > If yes, is it configurable?
> > > > > > > >
> > > > > > > > AFAIK, there is no configurable option available to
> > > > > > > > achieve that. If that's the case, can we have that
> > > > > > > > feature?
> > > > > > > >
> > > > > > > > This is with the intention to recover the applications
> > > > > > > > faster and do away with HDFS's small-files problem as
> > > > > > > > described here:
> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > > >
> > > > > > > > If we could save checkpoints in some other distributed
> > > > > > > > file system (or even a HA NAS box) geared for small files,
> > > > > > > > we could achieve:
> > > > > > > >
> > > > > > > > - Better performance of NN & HDFS for the production usage
> > > > > > > >   (read: production data I/O & not temp files)
> > > > > > > > - Faster application recovery in case of planned shutdown
> > > > > > > >   / unplanned restarts
> > > > > > > >
> > > > > > > > Please send your comments, suggestions or ideas.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Aniruddha
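For reference, the dag-level configuration the thread is asking about might look something like the dt-site.xml fragment below. This is only an illustrative sketch: the application name, attribute name, and mount path are all assumptions, not confirmed API; see APEXCORE-283 for what was actually implemented in your Apex version.

```xml
<!-- Hypothetical sketch: direct checkpoint storage at a mounted NAS path
     instead of HDFS. Attribute name and value syntax are assumptions;
     consult APEXCORE-283 and your version's documentation. -->
<configuration>
  <property>
    <name>dt.application.MyApp.attr.STORAGE_AGENT_PATH</name>
    <value>file:///mnt/ha-nas/apex-checkpoints</value>
  </property>
</configuration>
```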
