Ram,

Are you referring to running your app from the dtcli?

That's one more item to check to not end up with stray files. There is
nothing stopping us from using a different default for that use case. My
take is that we should structure this in a way where by default unit tests
have minimum things to configure and run as fast as reasonably possible.

Thomas

On Thu, Sep 3, 2015 at 3:19 PM, Munagala Ramanath <[email protected]>
wrote:

> It's not just unit tests.
>
> An app developer is likely to run a random app in LM to uncover bugs before
> hitting the cluster.
> The closer the LM setup is to the cluster setup (i.e. running as much of
> the same code as reasonably possible)
> the higher the probability that bugs will be hit in LM.
>
> Ram
>
> On Thu, Sep 3, 2015 at 3:13 PM, Chandni Singh <[email protected]>
> wrote:
>
> > Yes Chetan, I am claiming that :-)
> >
> > I still don't understand the need for having two step checkpointing in
> > LocalMode by default.
> >
> > StramLocalCluster should simplify test execution environment as pointed
> out
> > by Thomas.
> > Async checkpoint should have its own test cases using StramLocalCluster
> > that should not break when new features are added to the platform.
> > But by default I think StramLocalCluster should use synchronous
> > checkpointing.
> >
> > -Chandni
> >
> > On Thu, Sep 3, 2015 at 2:43 PM, Chetan Narsude <[email protected]>
> > wrote:
> >
> > > Changing the storage agent is one of the ways to address the symptoms
> of
> > > the problem. But it's not treating the problem.
> > >
> > > In this case - change the basePath to a location under target and all
> the
> > > opinions are moot. And someone is claiming that we should not do it.
> Not
> > > sure why. Or is anyone claiming that anymore?
> > >
> > > --
> > > Chetan
> > >
> > > On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <[email protected]>
> > > wrote:
> > >
> > >> Good point regarding the coverage. These JUnit tests are supposed to
> > test
> > >> individual components and all the tests collectively should strive to
> > >> achieve high coverage. There are tests in Apex to cover storage
> agents,
> > >> recovery semantics etc.  Components that fall outside of the test
> scope
> > >> are
> > >> reduced as much as possible through mocks (even though there is room
> for
> > >> improvement).
> > >>
> > >> The tests in Malhar are for operators and applications, not for the
> > >> engine.
> > >> In those cases where LM is used, the intention is to test the
> > application
> > >> functionality. It is expected that certain configurations are adjusted
> > for
> > >> the test and dependencies mocked.
> > >>
> > >> For the local mode, it should not be an issue to use a different
> storage
> > >> agent when it simplifies the test execution. Specifically, in this
> case,
> > >> we
> > >> don't want to go and change many tests to make something work that
> isn't
> > >> needed. LM is not "production", it is not using HDFS and there are a
> > >> number
> > >> of other important differences that make it possible to run within the
> > >> IDE.
> > >>
> > >> Instead, focus should be on those things that help with app coverage.
> > For
> > >> example, in the past we had seen issues with serialization of
> operators
> > >> that were not uncovered in LM, until we made the serialization part of
> > the
> > >> execution, even when it was not needed for execution.
> > >>
> > >> Thomas
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude <
> [email protected]
> > >
> > >> wrote:
> > >>
> > >> > I think Ram explained in a little more detail on what I am thinking.
> > >> >
> > >> > Tests are supposed to provide code coverage. Having localcluster is
> > >> already
> > >> > a variable, it's not what runs in production. Having a different
> > storage
> > >> > agent is another variable and it misses out on testing the
> > asynchronous
> > >> > flow. The gap keeps on increasing if we continue to do that.
> > AsyncFSSA
> > >> is
> > >> > our default because it's supposed to do everything that
> FSStorageAgent
> > >> does
> > >> > and some more. So not clear as to why the test which creates stray
> > >> folders
> > >> > is not configuring the storage agent properly instead of completely
> > >> > changing it out which brings some other problems in as I just
> > explained.
> > >> >
> > >> > If changing the storage agent is the only way to fix the problem
> with
> > >> > reasonable effort, then I would concede. I highly doubt that.
> > >> >
> > >> > --
> > >> > Chetan
> > >> >
> > >> > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh <
> > [email protected]
> > >> >
> > >> > wrote:
> > >> >
> > >> > > The local mode was so far using FSStorageAgent which was used in
> > >> > > production.
> > >> > > In production using Async is needed because hdfs writes are slow
> but
> > >> is
> > >> > > that the case with LocalMode?
> > >> > >
> > >> > > In local mode if we use Async we are creating checkpoints under
> one
> > >> local
> > >> > > directory and then copying it to another local directory which
> will
> > >> not
> > >> > > improve any performance.
> > >> > >
> > >> > > In my opinion StramLocalCluster use synchronous checkpointing as
> > >> default.
> > >> > >
> > >> > > Chandni
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude <
> > >> [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > >> That sounds a lot like self contradicting reason; Let's make a
> > change
> > >> > >> because we don't want to make change. :-)
> > >> > >>
> > >> > >> The code is in certain state. This certain state is consistent
> with
> > >> how
> > >> > >> things run in production. In test environment there is a problem
> > that
> > >> > stray
> > >> > >> files are created. It's a small fix to relocate these files
> > >> elsewhere.
> > >> > What
> > >> > >> I am trying to understand is that is not being done?
> > >> > >>
> > >> > >> --
> > >> > >> Chetan
> > >> > >>
> > >> > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise <
> > [email protected]
> > >> >
> > >> > >> wrote:
> > >> > >>
> > >> > >>> There is no need to configure anything extra with the proposed
> > >> change,
> > >> > it
> > >> > >>> just brings back LM to how it worked before.
> > >> > >>>
> > >> > >>> There is no point modifying n tests for extra setup with no
> gain.
> > >> > >>>
> > >> > >>> Thomas
> > >> > >>>
> > >> > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude <
> > >> [email protected]
> > >> > >
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>> > Why does it matter that AsyncFSStorageAgent is being used with
> > >> > >>> > LocalCluster? It using the localfs and hence no gain is the
> > >> > >>> implementation
> > >> > >>> > detail that's abstracted out by FileSystem already.
> > >> > >>> >
> > >> > >>> > If there is a problem of random artifacts left behind after
> the
> > >> test,
> > >> > >>> there
> > >> > >>> > is a reason and most likely it's misconfiguration of the
> > >> > StorageAgent.
> > >> > >>> Why
> > >> > >>> > wouldn't that be fixed.
> > >> > >>> >
> > >> > >>> > --
> > >> > >>> > Chetan
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre <
> > [email protected]>
> > >> > >>> wrote:
> > >> > >>> >
> > >> > >>> > > Clean up container files left over should be a distributed
> OS
> > >> task.
> > >> > >>> Clean
> > >> > >>> > > up, back up, archive, ... all is for the OS (aka YARN). We
> > must
> > >> > >>> assume
> > >> > >>> > kill
> > >> > >>> > > -9.
> > >> > >>> > >
> > >> > >>> > > The only thing where the operator comes into play is
> > >> "teardown()",
> > >> > >>> which
> > >> > >>> > is
> > >> > >>> > > business logic (not Apex engine) issue. This could be db
> > >> connection
> > >> > >>> etc.
> > >> > >>> > >
> > >> > >>> > > Thks,
> > >> > >>> > > Amol
> > >> > >>> > >
> > >> > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas Weise <
> > >> > [email protected]
> > >> > >>> >
> > >> > >>> > > wrote:
> > >> > >>> > >
> > >> > >>> > > > When the container gets killed, we should not assume
> > anything
> > >> > about
> > >> > >>> > > > cleanup. It can be a kill -9. Any related "cleanup" falls
> > >> under
> > >> > >>> nice to
> > >> > >>> > > > have, no guarantees.
> > >> > >>> > > >
> > >> > >>> > > > On Thu, Sep 3, 2015 at 8:49 AM, Chandni Singh <
> > >> > >>> [email protected]
> > >> > >>> > >
> > >> > >>> > > > wrote:
> > >> > >>> > > >
> > >> > >>> > > > > I have a question regarding what Gaurav mentioned
> > >> > >>> > > > > ----
> > >> > >>> > > > > When container runs in cluster, "." specifies the
> > containers
> > >> > >>> local
> > >> > >>> > path
> > >> > >>> > > > on
> > >> > >>> > > > > the node where container specific jars and other
> resources
> > >> > >>> resides.
> > >> > >>> > It
> > >> > >>> > > > > creates a folder under that which is live as long as
> > >> container
> > >> > >>> lives.
> > >> > >>> > > So
> > >> > >>> > > > > there are no vagrant folders anywhere
> > >> > >>> > > > > ---
> > >> > >>> > > > >
> > >> > >>> > > > > When the container gets killed, do we cleanup the
> folders
> > >> > >>> created by
> > >> > >>> > > > Async
> > >> > >>> > > > > under the containers working dir?
> > >> > >>> > > > >
> > >> > >>> > > > > On Thu, Sep 3, 2015 at 8:42 AM, Thomas Weise <
> > >> > >>> [email protected]
> > >> > >>> > >
> > >> > >>> > > > > wrote:
> > >> > >>> > > > >
> > >> > >>> > > > >> It makes sense to use the synchronous checkpointing for
> > the
> > >> > >>> local
> > >> > >>> > > mode.
> > >> > >>> > > > >> LM is meant to simplify dependencies and setup. The
> > default
> > >> > for
> > >> > >>> > > > execution
> > >> > >>> > > > >> on YARN remains async.
> > >> > >>> > > > >>
> > >> > >>> > > > >> Thomas
> > >> > >>> > > > >>
> > >> > >>> > > > >>
> > >> > >>> > > > >> On Thu, Sep 3, 2015 at 8:34 AM, Chandni Singh <
> > >> > >>> > > [email protected]>
> > >> > >>> > > > >> wrote:
> > >> > >>> > > > >>
> > >> > >>> > > > >>> APPLICATION_PATH isn't related to local base dir of
> > Async
> > >> as
> > >> > >>> far
> > >> > >>> > as I
> > >> > >>> > > > >>> know. StramLocalCluster sets the APP_PATH to
> > "target/...".
> > >> > >>> > > > >>> StramLocalCluster should use FSStorageAgent.
> > >> > >>> > > > >>>
> > >> > >>> > > > >>> - Chandni
> > >> > >>> > > > >>>
> > >> > >>> > > > >>> On Thu, Sep 3, 2015 at 8:20 AM, Gaurav Gupta <
> > >> > >>> > [email protected]
> > >> > >>> > > >
> > >> > >>> > > > >>> wrote:
> > >> > >>> > > > >>>
> > >> > >>> > > > >>>> As Thomas mentioned as default remains to be async.
> You
> > >> can
> > >> > >>> either
> > >> > >>> > > > >>>> change the storage agent or set the APPLICATION_PATH.
> > >> > >>> > > > >>>>
> > >> > >>> > > > >>>> When container runs in cluster, "." specifies the
> > >> containers
> > >> > >>> local
> > >> > >>> > > > path
> > >> > >>> > > > >>>> on the node where container specific jars and other
> > >> > resources
> > >> > >>> > > > resides. It
> > >> > >>> > > > >>>> creates a folder under that which is live as long as
> > >> > container
> > >> > >>> > > lives.
> > >> > >>> > > > So
> > >> > >>> > > > >>>> there are no vagrant folders anywhere
> > >> > >>> > > > >>>>
> > >> > >>> > > > >>>> Thanks
> > >> > >>> > > > >>>> -Gaurav
> > >> > >>> > > > >>>>
> > >> > >>> > > > >>>> On Wed, Sep 2, 2015 at 11:33 PM, Chandni Singh <
> > >> > >>> > > > [email protected]
> > >> > >>> > > > >>>> > wrote:
> > >> > >>> > > > >>>>
> > >> > >>> > > > >>>>> I think there is a problem in the default Async as
> > >> well. It
> > >> > >>> also
> > >> > >>> > > uses
> > >> > >>> > > > >>>>> the working directory as its local base path.
> > >> > >>> > > > >>>>>
> > >> > >>> > > > >>>>> In the Async -> copyToHdfs()  method, we delete the
> > >> window
> > >> > >>> files
> > >> > >>> > > but
> > >> > >>> > > > >>>>> the folder with the operator name never gets
> deleted.
> > >> > >>> > > > >>>>> So on the cluster there  will be such vagrant
> folders
> > in
> > >> > the
> > >> > >>> > > working
> > >> > >>> > > > >>>>> directory?
> > >> > >>> > > > >>>>>
> > >> > >>> > > > >>>>> On Wed, Sep 2, 2015 at 11:17 PM, Thomas Weise <
> > >> > >>> > > > [email protected]>
> > >> > >>> > > > >>>>> wrote:
> > >> > >>> > > > >>>>>
> > >> > >>> > > > >>>>>> Chandni,
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>> Agreed. See whether the tests work with the
> > synchronous
> > >> > >>> storage
> > >> > >>> > > > >>>>>> agent. If yes, change them. The default needs to
> > remain
> > >> > >>> async.
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>> Thomas
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>> On Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh <
> > >> > >>> > > > >>>>>> [email protected]> wrote:
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>>> Hi,
> > >> > >>> > > > >>>>>>>
> > >> > >>> > > > >>>>>>> I would like to know what was the reason to use
> > >> > >>> > > AsyncFSStorageAgent
> > >> > >>> > > > >>>>>>> with StramLocalCluster?
> > >> > >>> > > > >>>>>>> StramLocalCluster is mainly for testing in a
> > >> > >>> non-distributed
> > >> > >>> > mode
> > >> > >>> > > > >>>>>>> and I am unclear how AsyncFSStorageAgent is
> helpful
> > in
> > >> > this
> > >> > >>> > mode.
> > >> > >>> > > > >>>>>>>
> > >> > >>> > > > >>>>>>> Thanks,
> > >> > >>> > > > >>>>>>> Chandni
> > >> > >>> > > > >>>>>>>
> > >> > >>> > > > >>>>>>> On Wed, Sep 2, 2015 at 10:45 PM, Chandni Singh <
> > >> > >>> > > > >>>>>>> [email protected]> wrote:
> > >> > >>> > > > >>>>>>>
> > >> > >>> > > > >>>>>>>> This is because of recent changes to
> > >> StramLocalCluster
> > >> > >>> where
> > >> > >>> > > > >>>>>>>> AsyncFSStorageAgent is used for checkpointing
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>> dag.setAttribute(OperatorContext.STORAGE_AGENT,
> new
> > >> > >>> > > > AsyncFSStorageAgent(new Path(pathUri,
> > >> > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(), null));
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>> The AsyncFSStorageAgent(String path,
> Configuration
> > >> conf)
> > >> > >>> uses
> > >> > >>> > > "."
> > >> > >>> > > > as localBasePath and therefore creates sub-directories per
> > >> > >>> operator in
> > >> > >>> > > the
> > >> > >>> > > > current working directory.
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>> I am going to create a ticket to address this and
> > >> will
> > >> > >>> fix it.
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>> -Chandni
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>> On Wed, Sep 2, 2015 at 7:13 PM, Chandni Singh <
> > >> > >>> > > > >>>>>>>> [email protected]> wrote:
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>>> Hi,
> > >> > >>> > > > >>>>>>>>>
> > >> > >>> > > > >>>>>>>>> I can see empty folders getting created under
> > >> > Malhar/lib
> > >> > >>> > called
> > >> > >>> > > > >>>>>>>>> '1' and '2'.
> > >> > >>> > > > >>>>>>>>> I think this is because of using LocalMode to
> run
> > a
> > >> > test
> > >> > >>> > > > >>>>>>>>> application.
> > >> > >>> > > > >>>>>>>>>
> > >> > >>> > > > >>>>>>>>>
> > >> > >>> > > > >>>>>>>>> If anyone has checked in such cases please do
> > check
> > >> and
> > >> > >>> let
> > >> > >>> > us
> > >> > >>> > > > >>>>>>>>> know.
> > >> > >>> > > > >>>>>>>>>
> > >> > >>> > > > >>>>>>>>> Thanks,
> > >> > >>> > > > >>>>>>>>> Chandni
> > >> > >>> > > > >>>>>>>>>
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>>
> > >> > >>> > > > >>>>>>>
> > >> > >>> > > > >>>>>>
> > >> > >>> > > > >>>>>
> > >> > >>> > > > >>>>
> > >> > >>> > > > >>>
> > >> > >>> > > > >>
> > >> > >>> > > > >
> > >> > >>> > > >
> > >> > >>> > >
> > >> > >>> >
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to