Ram, Are you referring to running your app from the dtcli?
That's one more item to check to not end up with stray files. There is nothing stopping us from using a different default for that use case. My take is that we should structure this in a way where by default unit tests have minimum things to configure and run as fast as reasonably possible. Thomas On Thu, Sep 3, 2015 at 3:19 PM, Munagala Ramanath <[email protected]> wrote: > It's not just unit tests. > > An app developer is likely to run a random app in LM to uncover bugs before > hitting the cluster. > The closer the LM setup is to the cluster setup (i.e. running as much of > the same code as reasonably possible) > the higher the probability that bugs will be hit in LM. > > Ram > > On Thu, Sep 3, 2015 at 3:13 PM, Chandni Singh <[email protected]> > wrote: > > > Yes Chetan, I am claiming that :-) > > > > I still don't understand the need for having two step checkpointing in > > LocalMode by default. > > > > StramLocalCluster should simplify test execution environment as pointed > out > > by Thomas. > > Async checkpoint should have its own test cases using StramLocalCluster > > that should not break when new features are added to the platform. > > But by default I think StramLocalCluster should use synchronous > > checkpointing. > > > > -Chandni > > > > On Thu, Sep 3, 2015 at 2:43 PM, Chetan Narsude <[email protected]> > > wrote: > > > > > Changing the storage agent is one of the ways to address the symptoms > of > > > the problem. But it's not treating the problem. > > > > > > In this case - change the basePath to a location under target and all > the > > > opinions are moot. And someone is claiming that we should not do it. > Not > > > sure why. Or is anyone claiming that anymore? > > > > > > -- > > > Chetan > > > > > > On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <[email protected]> > > > wrote: > > > > > >> Good point regarding the coverage. These JUnit tests are supposed to > > test > > >> individual components and all the tests collectively should strive to > > >> achieve high coverage. There are tests in Apex to cover storage > agents, > > >> recovery semantics etc. Components that fall outside of the test > scope > > >> are > > >> reduced as much as possible through mocks (even though there is room > for > > >> improvement). > > >> > > >> The tests in Malhar are for operators and applications, not for the > > >> engine. > > >> In those cases where LM is used, the intention is to test the > > application > > >> functionality. It is expected that certain configurations are adjusted > > for > > >> the test and dependencies mocked. > > >> > > >> For the local mode, it should not be an issue to use a different > storage > > >> agent when it simplifies the test execution. Specifically, in this > case, > > >> we > > >> don't want to go and change many tests to make something work that > isn't > > >> needed. LM is not "production", it is not using HDFS and there are a > > >> number > > >> of other important differences that make it possible to run within the > > >> IDE. > > >> > > >> Instead, focus should be on those things that help with app coverage. > > For > > >> example, in the past we had seen issues with serialization of > operators > > >> that were not uncovered in LM, until we made the serialization part of > > the > > >> execution, even when it was not needed for execution. > > >> > > >> Thomas > > >> > > >> > > >> > > >> > > >> On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude < > [email protected] > > > > > >> wrote: > > >> > > >> > I think Ram explained in a little more detail on what I am thinking. > > >> > > > >> > Tests are supposed to provide code coverage. Having localcluster is > > >> already > > >> > a variable, it's not what runs in production. Having a different > > storage > > >> > agent is another variable and it misses out on testing the > > asynchronous > > >> > flow. The gap keeps on increasing if we continue to do that. > > AsyncFSSA > > >> is > > >> > our default because it's supposed to do everything that > FSStorageAgent > > >> does > > >> > and some more. So not clear as to why the test which creates stray > > >> folders > > >> > is not configuring the storage agent properly instead of completely > > >> > changing it out which brings some other problems in as I just > > explained. > > >> > > > >> > If changing the storage agent is the only way to fix the problem > with > > >> > reasonable effort, then I would concede. I highly doubt that. > > >> > > > >> > -- > > >> > Chetan > > >> > > > >> > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh < > > [email protected] > > >> > > > >> > wrote: > > >> > > > >> > > The local mode was so far using FSStorageAgent which was used in > > >> > > production. > > >> > > In production using Async is needed because hdfs writes are slow > but > > >> is > > >> > > that the case with LocalMode? > > >> > > > > >> > > In local mode if we use Async we are creating checkpoints under > one > > >> local > > >> > > directory and then copying it to another local directory which > will > > >> not > > >> > > improve any performance. > > >> > > > > >> > > In my opinion StramLocalCluster use synchronous checkpointing as > > >> default. > > >> > > > > >> > > Chandni > > >> > > > > >> > > > > >> > > > > >> > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude < > > >> [email protected]> > > >> > > wrote: > > >> > > > > >> > >> That sounds a lot like self contradicting reason; Let's make a > > change > > >> > >> because we don't want to make change. :-) > > >> > >> > > >> > >> The code is in certain state. This certain state is consistent > with > > >> how > > >> > >> things run in production. In test environment there is a problem > > that > > >> > stray > > >> > >> files are created. It's a small fix to relocate these files > > >> elsewhere. > > >> > What > > >> > >> I am trying to understand is that is not being done? > > >> > >> > > >> > >> -- > > >> > >> Chetan > > >> > >> > > >> > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise < > > [email protected] > > >> > > > >> > >> wrote: > > >> > >> > > >> > >>> There is no need to configure anything extra with the proposed > > >> change, > > >> > it > > >> > >>> just brings back LM to how it worked before. > > >> > >>> > > >> > >>> There is no point modifying n tests for extra setup with no > gain. > > >> > >>> > > >> > >>> Thomas > > >> > >>> > > >> > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude < > > >> [email protected] > > >> > > > > >> > >>> wrote: > > >> > >>> > > >> > >>> > Why does it matter that AsyncFSStorageAgent is being used with > > >> > >>> > LocalCluster? It using the localfs and hence no gain is the > > >> > >>> implementation > > >> > >>> > detail that's abstracted out by FileSystem already. > > >> > >>> > > > >> > >>> > If there is a problem of random artifacts left behind after > the > > >> test, > > >> > >>> there > > >> > >>> > is a reason and most likely it's misconfiguration of the > > >> > StorageAgent. > > >> > >>> Why > > >> > >>> > wouldn't that be fixed. > > >> > >>> > > > >> > >>> > -- > > >> > >>> > Chetan > > >> > >>> > > > >> > >>> > > > >> > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre < > > [email protected]> > > >> > >>> wrote: > > >> > >>> > > > >> > >>> > > Clean up container files left over should be a distributed > OS > > >> task. > > >> > >>> Clean > > >> > >>> > > up, back up, archive, ... all is for the OS (aka YARN). We > > must > > >> > >>> assume > > >> > >>> > kill > > >> > >>> > > -9. > > >> > >>> > > > > >> > >>> > > The only thing where the operator comes into play is > > >> "teardown()", > > >> > >>> which > > >> > >>> > is > > >> > >>> > > business logic (not Apex engine) issue. This could be db > > >> connection > > >> > >>> etc. > > >> > >>> > > > > >> > >>> > > Thks, > > >> > >>> > > Amol > > >> > >>> > > > > >> > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas Weise < > > >> > [email protected] > > >> > >>> > > > >> > >>> > > wrote: > > >> > >>> > > > > >> > >>> > > > When the container gets killed, we should not assume > > anything > > >> > about > > >> > >>> > > > cleanup. It can be a kill -9. Any related "cleanup" falls > > >> under > > >> > >>> nice to > > >> > >>> > > > have, no guarantees. > > >> > >>> > > > > > >> > >>> > > > On Thu, Sep 3, 2015 at 8:49 AM, Chandni Singh < > > >> > >>> [email protected] > > >> > >>> > > > > >> > >>> > > > wrote: > > >> > >>> > > > > > >> > >>> > > > > I have a question regarding what Gaurav mentioned > > >> > >>> > > > > ---- > > >> > >>> > > > > When container runs in cluster, "." specifies the > > containers > > >> > >>> local > > >> > >>> > path > > >> > >>> > > > on > > >> > >>> > > > > the node where container specific jars and other > resources > > >> > >>> resides. > > >> > >>> > It > > >> > >>> > > > > creates a folder under that which is live as long as > > >> container > > >> > >>> lives. > > >> > >>> > > So > > >> > >>> > > > > there are no vagrant folders anywhere > > >> > >>> > > > > --- > > >> > >>> > > > > > > >> > >>> > > > > When the container gets killed, do we cleanup the > folders > > >> > >>> created by > > >> > >>> > > > Async > > >> > >>> > > > > under the containers working dir? > > >> > >>> > > > > > > >> > >>> > > > > On Thu, Sep 3, 2015 at 8:42 AM, Thomas Weise < > > >> > >>> [email protected] > > >> > >>> > > > > >> > >>> > > > > wrote: > > >> > >>> > > > > > > >> > >>> > > > >> It makes sense to use the synchronous checkpointing for > > the > > >> > >>> local > > >> > >>> > > mode. > > >> > >>> > > > >> LM is meant to simplify dependencies and setup. The > > default > > >> > for > > >> > >>> > > > execution > > >> > >>> > > > >> on YARN remains async. > > >> > >>> > > > >> > > >> > >>> > > > >> Thomas > > >> > >>> > > > >> > > >> > >>> > > > >> > > >> > >>> > > > >> On Thu, Sep 3, 2015 at 8:34 AM, Chandni Singh < > > >> > >>> > > [email protected]> > > >> > >>> > > > >> wrote: > > >> > >>> > > > >> > > >> > >>> > > > >>> APPLICATION_PATH isn't related to local base dir of > > Async > > >> as > > >> > >>> far > > >> > >>> > as I > > >> > >>> > > > >>> know. StramLocalCluster sets the APP_PATH to > > "target/...". > > >> > >>> > > > >>> StramLocalCluster should use FSStorageAgent. > > >> > >>> > > > >>> > > >> > >>> > > > >>> - Chandni > > >> > >>> > > > >>> > > >> > >>> > > > >>> On Thu, Sep 3, 2015 at 8:20 AM, Gaurav Gupta < > > >> > >>> > [email protected] > > >> > >>> > > > > > >> > >>> > > > >>> wrote: > > >> > >>> > > > >>> > > >> > >>> > > > >>>> As Thomas mentioned as default remains to be async. > You > > >> can > > >> > >>> either > > >> > >>> > > > >>>> change the storage agent or set the APPLICATION_PATH. > > >> > >>> > > > >>>> > > >> > >>> > > > >>>> When container runs in cluster, "." specifies the > > >> containers > > >> > >>> local > > >> > >>> > > > path > > >> > >>> > > > >>>> on the node where container specific jars and other > > >> > resources > > >> > >>> > > > resides. It > > >> > >>> > > > >>>> creates a folder under that which is live as long as > > >> > container > > >> > >>> > > lives. > > >> > >>> > > > So > > >> > >>> > > > >>>> there are no vagrant folders anywhere > > >> > >>> > > > >>>> > > >> > >>> > > > >>>> Thanks > > >> > >>> > > > >>>> -Gaurav > > >> > >>> > > > >>>> > > >> > >>> > > > >>>> On Wed, Sep 2, 2015 at 11:33 PM, Chandni Singh < > > >> > >>> > > > [email protected] > > >> > >>> > > > >>>> > wrote: > > >> > >>> > > > >>>> > > >> > >>> > > > >>>>> I think there is a problem in the default Async as > > >> well. It > > >> > >>> also > > >> > >>> > > uses > > >> > >>> > > > >>>>> the working directory as its local base path. > > >> > >>> > > > >>>>> > > >> > >>> > > > >>>>> In the Async -> copyToHdfs() method, we delete the > > >> window > > >> > >>> files > > >> > >>> > > but > > >> > >>> > > > >>>>> the folder with the operator name never gets > deleted. > > >> > >>> > > > >>>>> So on the cluster there will be such vagrant > folders > > in > > >> > the > > >> > >>> > > working > > >> > >>> > > > >>>>> directory? > > >> > >>> > > > >>>>> > > >> > >>> > > > >>>>> On Wed, Sep 2, 2015 at 11:17 PM, Thomas Weise < > > >> > >>> > > > [email protected]> > > >> > >>> > > > >>>>> wrote: > > >> > >>> > > > >>>>> > > >> > >>> > > > >>>>>> Chandni, > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>>> Agreed. See whether the tests work with the > > synchronous > > >> > >>> storage > > >> > >>> > > > >>>>>> agent. If yes, change them. The default needs to > > remain > > >> > >>> async. > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>>> Thomas > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>>> On Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh < > > >> > >>> > > > >>>>>> [email protected]> wrote: > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>>>> Hi, > > >> > >>> > > > >>>>>>> > > >> > >>> > > > >>>>>>> I would like to know what was the reason to use > > >> > >>> > > AsyncFSStorageAgent > > >> > >>> > > > >>>>>>> with StramLocalCluster? > > >> > >>> > > > >>>>>>> StramLocalCluster is mainly for testing in a > > >> > >>> non-distributed > > >> > >>> > mode > > >> > >>> > > > >>>>>>> and I am unclear how AsyncFSStorageAgent is > helpful > > in > > >> > this > > >> > >>> > mode. > > >> > >>> > > > >>>>>>> > > >> > >>> > > > >>>>>>> Thanks, > > >> > >>> > > > >>>>>>> Chandni > > >> > >>> > > > >>>>>>> > > >> > >>> > > > >>>>>>> On Wed, Sep 2, 2015 at 10:45 PM, Chandni Singh < > > >> > >>> > > > >>>>>>> [email protected]> wrote: > > >> > >>> > > > >>>>>>> > > >> > >>> > > > >>>>>>>> This is because of recent changes to > > >> StramLocalCluster > > >> > >>> where > > >> > >>> > > > >>>>>>>> AsyncFSStorageAgent is used for checkpointing > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> dag.setAttribute(OperatorContext.STORAGE_AGENT, > new > > >> > >>> > > > AsyncFSStorageAgent(new Path(pathUri, > > >> > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(), null)); > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> The AsyncFSStorageAgent(String path, > Configuration > > >> conf) > > >> > >>> uses > > >> > >>> > > "." > > >> > >>> > > > as localBasePath and therefore creates sub-directories per > > >> > >>> operator in > > >> > >>> > > the > > >> > >>> > > > current working directory. > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> I am going to create a ticket to address this and > > >> will > > >> > >>> fix it. > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> -Chandni > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> On Wed, Sep 2, 2015 at 7:13 PM, Chandni Singh < > > >> > >>> > > > >>>>>>>> [email protected]> wrote: > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>>> Hi, > > >> > >>> > > > >>>>>>>>> > > >> > >>> > > > >>>>>>>>> I can see empty folders getting created under > > >> > Malhar/lib > > >> > >>> > called > > >> > >>> > > > >>>>>>>>> '1' and '2'. > > >> > >>> > > > >>>>>>>>> I think this is because of using LocalMode to > run > > a > > >> > test > > >> > >>> > > > >>>>>>>>> application. > > >> > >>> > > > >>>>>>>>> > > >> > >>> > > > >>>>>>>>> > > >> > >>> > > > >>>>>>>>> If anyone has checked in such cases please do > > check > > >> and > > >> > >>> let > > >> > >>> > us > > >> > >>> > > > >>>>>>>>> know. > > >> > >>> > > > >>>>>>>>> > > >> > >>> > > > >>>>>>>>> Thanks, > > >> > >>> > > > >>>>>>>>> Chandni > > >> > >>> > > > >>>>>>>>> > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>>> > > >> > >>> > > > >>>>>>> > > >> > >>> > > > >>>>>> > > >> > >>> > > > >>>>> > > >> > >>> > > > >>>> > > >> > >>> > > > >>> > > >> > >>> > > > >> > > >> > >>> > > > > > > >> > >>> > > > > > >> > >>> > > > > >> > >>> > > > >> > >>> > > >> > >> > > >> > >> > > >> > > > > >> > > > >> > > > > > > > > >
