Re: Possibility of saving checkpoints on other distributed filesystems

Sandesh Hegde Tue, 02 Feb 2016 10:39:19 -0800

What is GW?

On Tue, Feb 2, 2016 at 9:16 AM Pramod Immaneni <[email protected]>
wrote:


> Good idea to handle it in GW.
>
> On Tue, Feb 2, 2016 at 8:50 AM, Thomas Weise <[email protected]>
> wrote:
>
>> Exactly, this doesn't make sense. I filed an enhancement to have this in
>> GW a while ago.
>>
>> On Tue, Feb 2, 2016 at 8:48 AM, Pramod Immaneni <[email protected]>
>> wrote:
>>
>> > Yogi,
>> >
>> > kill is not an orderly shutdown, who will clean the state?
>> >
>> > On Tue, Feb 2, 2016 at 8:38 AM, Yogi Devendra <[email protected]>
>> > wrote:
>> >
>> > > I would prefer to have an additional argument during application
>> > > launch on dtcli.
>> > >
>> > > Say, --preserve-kill-state true .
>> > >
>> > > Basically, the platform should be able to do the clean-up activity if
>> > > the application is invoked with a certain flag.
>> > >
>> > > Test apps can set this flag to clear the data on kill. Production apps
>> > > can set this flag to keep the data on kill.
>> > >
>> > > Shutdown should always preserve the state. But, for kill /
>> > > forced-shutdown, the user might prefer to clear the state.
>> > >
>> > > ~ Yogi
>> > >
>> > > On 2 February 2016 at 21:53, Amol Kekre <[email protected]> wrote:
>> > >
>> > >>
>> > >> Can we include a script in our github (util?) that simply deletes
>> > >> these files upon an application being killed, given an app-id? The
>> > >> admin will need to run this script. Auto-deleting will be bad, as a
>> > >> lot of users, including those in production today, need to restart
>> > >> using those files. The knowledge/desire to restart post failure is
>> > >> outside the app, and hence technically the script should be explicitly
>> > >> user invoked.
>> > >>
>> > >> Thks,
>> > >> Amol
>> > >>
>> > >>
>> > >> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <[email protected]>
>> > >> wrote:
>> > >>
>> > >>> Hi Venkat,
>> > >>>
>> > >>> There are typically a small number of outstanding checkpoint files
>> > >>> per operator; as newer checkpoints are created, old ones are
>> > >>> automatically deleted by the application when it determines that state
>> > >>> is no longer needed. When an application is stopped or killed, the last
>> > >>> checkpoints remain. There is also a benefit to that, since a new
>> > >>> application can be restarted to continue from those checkpoints instead
>> > >>> of starting all the way from the beginning, and this is useful in some
>> > >>> cases. But if you are always starting your application from scratch,
>> > >>> yes, you can delete the checkpoints of older applications that are no
>> > >>> longer running.
>> > >>>
>> > >>> Thanks
>> > >>>
>> > >>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>> > >>> [email protected]> wrote:
>> > >>>
>> > >>> > Hi,
>> > >>> >
>> > >>> >         Now that this has been discussed, will the checkpointed data
>> > >>> > be purged when we kill the application forcefully? In our current
>> > >>> > usage, we forcefully kill the app after it processes a certain batch
>> > >>> > of data. I see these small files are created under (user/datatorrent)
>> > >>> > directory and not removed.
>> > >>> >
>> > >>> >         Another scenario: when some of the containers keep failing, we
>> > >>> > have observed this state where the data is continuously checkpointed
>> > >>> > into small files. When we kill the app, the data will be there.
>> > >>> >
>> > >>> >         We have received concerns saying this is impacting namenode
>> > >>> > performance since these small files are stored in HDFS. So we manually
>> > >>> > remove this checkpointed data at regular intervals.
>> > >>> >
>> > >>> > -Venkatesh
>> > >>> >
>> > >>> > -----Original Message-----
>> > >>> > From: Amol Kekre [mailto:[email protected]]
>> > >>> > Sent: Monday, February 01, 2016 7:49 AM
>> > >>> > To: [email protected]; [email protected]
>> > >>> > Subject: Re: Possibility of saving checkpoints on other distributed
>> > >>> > filesystems
>> > >>> >
>> > >>> > Aniruddha,
>> > >>> > We have not heard this request from users yet. It may be because our
>> > >>> > checkpointing has a purge, i.e. the small files are not left over.
>> > >>> > The small files problem has been there in Hadoop and relates to
>> > >>> > storing small files in Hadoop for a longer time (more likely forever).
>> > >>> >
>> > >>> > Thks,
>> > >>> > Amol
>> > >>> >
>> > >>> >
>> > >>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>> > >>> > [email protected]> wrote:
>> > >>> >
>> > >>> > > Hi Community,
>> > >>> > >
>> > >>> > > Or let me say BigFoots, do you think this feature should be
>> > >>> > > available?
>> > >>> > >
>> > >>> > > The reason to bring this up was discussed in the start of this
>> > >>> > > thread as:
>> > >>> > >
>> > >>> > > > This is with the intention to recover the applications faster and
>> > >>> > > > do away with HDFS's small files problem as described here:
>> > >>> > > >
>> > >>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>> > >>> > > >
>> > >>> > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>> > >>> > > >
>> > >>> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > >>> > > >
>> > >>> > > > If we could save checkpoints in some other distributed file system
>> > >>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>> > >>> > > >
>> > >>> > > >    - Better performance of NN & HDFS for the production usage
>> > >>> > > >    (read: production data I/O & not temp files)
>> > >>> > > >
>> > >>> > > >    - Faster application recovery in case of planned shutdown /
>> > >>> > > >    unplanned restarts
>> > >>> > > >
>> > >>> > > If you feel the need of this feature, please cast your opinions and
>> > >>> > > ideas so that it can be converted into a jira.
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > Thanks,
>> > >>> > >
>> > >>> > >
>> > >>> > > Aniruddha
>> > >>> > >
>> > >>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>> > >>> > > <[email protected]>
>> > >>> > > wrote:
>> > >>> > >
>> > >>> > > > Aniruddha,
>> > >>> > > >
>> > >>> > > > Currently we don't have any support for that.
>> > >>> > > >
>> > >>> > > > Thanks
>> > >>> > > > Gaurav
>> > >>> > > >
>> > >>> > > > Thanks
>> > >>> > > > -Gaurav
>> > >>> > > >
>> > >>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>> > >>> > > > <[email protected]>
>> > >>> > > > wrote:
>> > >>> > > >
>> > >>> > > > > Default FSStorageAgent can be used as it can work with local
>> > >>> > > > > filesystem, but as far as I know there is no support for
>> > >>> > > > > specifying the directory through xml file. By default it uses
>> > >>> > > > > the application directory on HDFS.
>> > >>> > > > >
>> > >>> > > > > Not sure if we could specify storage agent with its properties
>> > >>> > > > > through the configuration at dag level.
>> > >>> > > > >
>> > >>> > > > > - Tushar.
>> > >>> > > > >
>> > >>> > > > >
>> > >>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>> > >>> > > > > [email protected]> wrote:
>> > >>> > > > >
>> > >>> > > > > > Hi,
>> > >>> > > > > >
>> > >>> > > > > > Do we have any storage agent which I can use readily,
>> > >>> > > > > > configurable through dt-site.xml?
>> > >>> > > > > >
>> > >>> > > > > > I am looking for something which would save checkpoints in
>> > >>> > > > > > mounted file system [eg. HA-NAS] which is basically just
>> > >>> > > > > > another directory for Apex.
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > > Thanks,
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > > Aniruddha
>> > >>> > > > > >
>> > >>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <[email protected]>
>> > >>> > > > > > wrote:
>> > >>> > > > > >
>> > >>> > > > > > > It is already supported; refer to the following jira for more
>> > >>> > > > > > > information:
>> > >>> > > > > > >
>> > >>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>> > >>> > > > > > > [email protected]> wrote:
>> > >>> > > > > > >
>> > >>> > > > > > > > Hi,
>> > >>> > > > > > > >
>> > >>> > > > > > > > Is it possible to save checkpoints in any other highly
>> > >>> > > > > > > > available distributed file systems (which may be mounted
>> > >>> > > > > > > > directories across the cluster) other than HDFS?
>> > >>> > > > > > > > If yes, is it configurable?
>> > >>> > > > > > > >
>> > >>> > > > > > > > AFAIK, there is no configurable option available to achieve
>> > >>> > > > > > > > that.
>> > >>> > > > > > > > If that's the case, can we have that feature?
>> > >>> > > > > > > >
>> > >>> > > > > > > > This is with the intention to recover the applications
>> > >>> > > > > > > > faster and do away with HDFS's small files problem as
>> > >>> > > > > > > > described here:
>> > >>> > > > > > > >
>> > >>> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>> > >>> > > > > > > >
>> > >>> > > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>> > >>> > > > > > > >
>> > >>> > > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > >>> > > > > > > >
>> > >>> > > > > > > > If we could save checkpoints in some other distributed file
>> > >>> > > > > > > > system (or even a HA NAS box) geared for small files, we
>> > >>> > > > > > > > could achieve -
>> > >>> > > > > > > >
>> > >>> > > > > > > >    - Better performance of NN & HDFS for the production
>> > >>> > > > > > > >    usage (read: production data I/O & not temp files)
>> > >>> > > > > > > >    - Faster application recovery in case of planned
>> > >>> > > > > > > >    shutdown / unplanned restarts
>> > >>> > > > > > > >
>> > >>> > > > > > > > Please, send your comments, suggestions or ideas.
>> > >>> > > > > > > >
>> > >>> > > > > > > > Thanks,
>> > >>> > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > > > Aniruddha
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>
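
The admin-run cleanup script proposed earlier in this thread (delete an application's leftover checkpoint files, given an app-id) could look roughly like the sketch below. It is only an illustration under stated assumptions: the `/user/datatorrent/apps/<app-id>/checkpoints` layout is a guess based on the "(user/datatorrent) directory" mentioned above, and the function names and the `--apps-root` / `--execute` options are hypothetical. It shells out to the standard `hdfs dfs -rm -r` command and defaults to a dry run, since restarting from old checkpoints is a deliberate user decision.

```python
import argparse
import re
import subprocess
import sys

# YARN application ids have the form application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")

# ASSUMPTION: hypothetical default root; adjust to where your cluster
# actually keeps application directories.
DEFAULT_APPS_ROOT = "/user/datatorrent/apps"


def checkpoint_path(app_id, apps_root=DEFAULT_APPS_ROOT):
    """Return the (assumed) checkpoint directory of one application."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError("not a YARN application id: %r" % (app_id,))
    return "%s/%s/checkpoints" % (apps_root.rstrip("/"), app_id)


def build_delete_command(app_id, apps_root=DEFAULT_APPS_ROOT):
    """Build the `hdfs dfs -rm -r` command line without running it."""
    return ["hdfs", "dfs", "-rm", "-r", checkpoint_path(app_id, apps_root)]


def main(argv):
    parser = argparse.ArgumentParser(
        description="Delete leftover checkpoints of a killed application.")
    parser.add_argument("app_id", help="e.g. application_1454300000000_0042")
    parser.add_argument("--apps-root", default=DEFAULT_APPS_ROOT)
    parser.add_argument("--execute", action="store_true",
                        help="really delete; without it, only print the command")
    args = parser.parse_args(argv)

    command = build_delete_command(args.app_id, args.apps_root)
    if not args.execute:
        # Dry run: show what would be deleted and stop.
        print("would run: " + " ".join(command))
        return 0
    # Propagate the exit status of the hdfs client.
    return subprocess.run(command).returncode


# Only act when arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The sketch does not check whether the application is still running; an admin would want to confirm that first (for example with `yarn application -status <app-id>`) before passing `--execute`.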
