Changing the storage agent is one of the ways to address the symptoms of the problem. But it's not treating the problem.
In this case - change the basePath to a location under target and all the opinions are moot. And someone is claiming that we should not do it. Not sure why. Or is anyone claiming that anymore? -- Chetan On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <[email protected]> wrote: > Good point regarding the coverage. These JUnit tests are supposed to test > individual components and all the tests collectively should strive to > achieve high coverage. There are tests in Apex to cover storage agents, > recovery semantics etc. Components that fall outside of the test scope are > reduced as much as possible through mocks (even though there is room for > improvement). > > The tests in Malhar are for operators and applications, not for the engine. > In those cases where LM is used, the intention is to test the application > functionality. It is expected that certain configurations are adjusted for > the test and dependencies mocked. > > For the local mode, it should not be an issue to use a different storage > agent when it simplifies the test execution. Specifically, in this case, we > don't want to go and change many tests to make something work that isn't > needed. LM is not "production", it is not using HDFS and there are a number > of other important differences that make it possible to run within the IDE. > > Instead, focus should be on those things that help with app coverage. For > example, in the past we had seen issues with serialization of operators > that were not uncovered in LM, until we made the serialization part of the > execution, even when it was not needed for execution. > > Thomas > > > > > On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude <[email protected]> > wrote: > > > I think Ram explained in a little more detail on what I am thinking. > > > > Tests are supposed to provide code coverage. Having localcluster is > already > > a variable, it's not what runs in production. Having a different storage > > agent is another variable and it misses out on testing the asynchronous > > flow. The gap keeps on increasing if we continue to do that. AsyncFSSA > is > > our default because it's supposed to do everything that FSStorageAgent > does > > and some more. So not clear as to why the test which creates stray > folders > > is not configuring the storage agent properly instead of completely > > changing it out which brings some other problems in as I just explained. > > > > If changing the storage agent is the only way to fix the problem with > > reasonable effort, then I would concede. I highly doubt that. > > > > -- > > Chetan > > > > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh <[email protected]> > > wrote: > > > > > The local mode was so far using FSStorageAgent which was used in > > > production. > > > In production using Async is needed because hdfs writes are slow but is > > > that the case with LocalMode? > > > > > > In local mode if we use Async we are creating checkpoints under one > local > > > directory and then copying it to another local directory which will not > > > improve any performance. > > > > > > In my opinion StramLocalCluster use synchronous checkpointing as > default. > > > > > > Chandni > > > > > > > > > > > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude < > [email protected]> > > > wrote: > > > > > >> That sounds a lot like self contradicting reason; Let's make a change > > >> because we don't want to make change. :-) > > >> > > >> The code is in certain state. This certain state is consistent with > how > > >> things run in production. In test environment there is a problem that > > stray > > >> files are created. It's a small fix to relocate these files elsewhere. > > What > > >> I am trying to understand is that is not being done? > > >> > > >> -- > > >> Chetan > > >> > > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise <[email protected]> > > >> wrote: > > >> > > >>> There is no need to configure anything extra with the proposed > change, > > it > > >>> just brings back LM to how it worked before. > > >>> > > >>> There is no point modifying n tests for extra setup with no gain. > > >>> > > >>> Thomas > > >>> > > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude < > [email protected] > > > > > >>> wrote: > > >>> > > >>> > Why does it matter that AsyncFSStorageAgent is being used with > > >>> > LocalCluster? It using the localfs and hence no gain is the > > >>> implementation > > >>> > detail that's abstracted out by FileSystem already. > > >>> > > > >>> > If there is a problem of random artifacts left behind after the > test, > > >>> there > > >>> > is a reason and most likely it's misconfiguration of the > > StorageAgent. > > >>> Why > > >>> > wouldn't that be fixed. > > >>> > > > >>> > -- > > >>> > Chetan > > >>> > > > >>> > > > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre <[email protected]> > > >>> wrote: > > >>> > > > >>> > > Clean up container files left over should be a distributed OS > task. > > >>> Clean > > >>> > > up, back up, archive, ... all is for the OS (aka YARN). We must > > >>> assume > > >>> > kill > > >>> > > -9. > > >>> > > > > >>> > > The only thing where the operator comes into play is > "teardown()", > > >>> which > > >>> > is > > >>> > > business logic (not Apex engine) issue. This could be db > connection > > >>> etc. > > >>> > > > > >>> > > Thks, > > >>> > > Amol > > >>> > > > > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas Weise < > > [email protected] > > >>> > > > >>> > > wrote: > > >>> > > > > >>> > > > When the container gets killed, we should not assume anything > > about > > >>> > > > cleanup. It can be a kill -9. Any related "cleanup" falls under > > >>> nice to > > >>> > > > have, no guarantees. > > >>> > > > > > >>> > > > On Thu, Sep 3, 2015 at 8:49 AM, Chandni Singh < > > >>> [email protected] > > >>> > > > > >>> > > > wrote: > > >>> > > > > > >>> > > > > I have a question regarding what Gaurav mentioned > > >>> > > > > ---- > > >>> > > > > When container runs in cluster, "." specifies the containers > > >>> local > > >>> > path > > >>> > > > on > > >>> > > > > the node where container specific jars and other resources > > >>> resides. > > >>> > It > > >>> > > > > creates a folder under that which is live as long as > container > > >>> lives. > > >>> > > So > > >>> > > > > there are no vagrant folders anywhere > > >>> > > > > --- > > >>> > > > > > > >>> > > > > When the container gets killed, do we cleanup the folders > > >>> created by > > >>> > > > Async > > >>> > > > > under the containers working dir? > > >>> > > > > > > >>> > > > > On Thu, Sep 3, 2015 at 8:42 AM, Thomas Weise < > > >>> [email protected] > > >>> > > > > >>> > > > > wrote: > > >>> > > > > > > >>> > > > >> It makes sense to use the synchronous checkpointing for the > > >>> local > > >>> > > mode. > > >>> > > > >> LM is meant to simplify dependencies and setup. The default > > for > > >>> > > > execution > > >>> > > > >> on YARN remains async. > > >>> > > > >> > > >>> > > > >> Thomas > > >>> > > > >> > > >>> > > > >> > > >>> > > > >> On Thu, Sep 3, 2015 at 8:34 AM, Chandni Singh < > > >>> > > [email protected]> > > >>> > > > >> wrote: > > >>> > > > >> > > >>> > > > >>> APPLICATION_PATH isn't related to local base dir of Async > as > > >>> far > > >>> > as I > > >>> > > > >>> know. StramLocalCluster sets the APP_PATH to "target/...". > > >>> > > > >>> StramLocalCluster should use FSStorageAgent. > > >>> > > > >>> > > >>> > > > >>> - Chandni > > >>> > > > >>> > > >>> > > > >>> On Thu, Sep 3, 2015 at 8:20 AM, Gaurav Gupta < > > >>> > [email protected] > > >>> > > > > > >>> > > > >>> wrote: > > >>> > > > >>> > > >>> > > > >>>> As Thomas mentioned as default remains to be async. You > can > > >>> either > > >>> > > > >>>> change the storage agent or set the APPLICATION_PATH. > > >>> > > > >>>> > > >>> > > > >>>> When container runs in cluster, "." specifies the > containers > > >>> local > > >>> > > > path > > >>> > > > >>>> on the node where container specific jars and other > > resources > > >>> > > > resides. It > > >>> > > > >>>> creates a folder under that which is live as long as > > container > > >>> > > lives. > > >>> > > > So > > >>> > > > >>>> there are no vagrant folders anywhere > > >>> > > > >>>> > > >>> > > > >>>> Thanks > > >>> > > > >>>> -Gaurav > > >>> > > > >>>> > > >>> > > > >>>> On Wed, Sep 2, 2015 at 11:33 PM, Chandni Singh < > > >>> > > > [email protected] > > >>> > > > >>>> > wrote: > > >>> > > > >>>> > > >>> > > > >>>>> I think there is a problem in the default Async as well. > It > > >>> also > > >>> > > uses > > >>> > > > >>>>> the working directory as its local base path. > > >>> > > > >>>>> > > >>> > > > >>>>> In the Async -> copyToHdfs() method, we delete the > window > > >>> files > > >>> > > but > > >>> > > > >>>>> the folder with the operator name never gets deleted. > > >>> > > > >>>>> So on the cluster there will be such vagrant folders in > > the > > >>> > > working > > >>> > > > >>>>> directory? > > >>> > > > >>>>> > > >>> > > > >>>>> On Wed, Sep 2, 2015 at 11:17 PM, Thomas Weise < > > >>> > > > [email protected]> > > >>> > > > >>>>> wrote: > > >>> > > > >>>>> > > >>> > > > >>>>>> Chandni, > > >>> > > > >>>>>> > > >>> > > > >>>>>> Agreed. See whether the tests work with the synchronous > > >>> storage > > >>> > > > >>>>>> agent. If yes, change them. The default needs to remain > > >>> async. > > >>> > > > >>>>>> > > >>> > > > >>>>>> Thomas > > >>> > > > >>>>>> > > >>> > > > >>>>>> > > >>> > > > >>>>>> On Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh < > > >>> > > > >>>>>> [email protected]> wrote: > > >>> > > > >>>>>> > > >>> > > > >>>>>>> Hi, > > >>> > > > >>>>>>> > > >>> > > > >>>>>>> I would like to know what was the reason to use > > >>> > > AsyncFSStorageAgent > > >>> > > > >>>>>>> with StramLocalCluster? > > >>> > > > >>>>>>> StramLocalCluster is mainly for testing in a > > >>> non-distributed > > >>> > mode > > >>> > > > >>>>>>> and I am unclear how AsyncFSStorageAgent is helpful in > > this > > >>> > mode. > > >>> > > > >>>>>>> > > >>> > > > >>>>>>> Thanks, > > >>> > > > >>>>>>> Chandni > > >>> > > > >>>>>>> > > >>> > > > >>>>>>> On Wed, Sep 2, 2015 at 10:45 PM, Chandni Singh < > > >>> > > > >>>>>>> [email protected]> wrote: > > >>> > > > >>>>>>> > > >>> > > > >>>>>>>> This is because of recent changes to StramLocalCluster > > >>> where > > >>> > > > >>>>>>>> AsyncFSStorageAgent is used for checkpointing > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> dag.setAttribute(OperatorContext.STORAGE_AGENT, new > > >>> > > > AsyncFSStorageAgent(new Path(pathUri, > > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(), null)); > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> The AsyncFSStorageAgent(String path, Configuration > conf) > > >>> uses > > >>> > > "." > > >>> > > > as localBasePath and therefore creates sub-directories per > > >>> operator in > > >>> > > the > > >>> > > > current working directory. > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> I am going to create a ticket to address this and will > > >>> fix it. > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> -Chandni > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> On Wed, Sep 2, 2015 at 7:13 PM, Chandni Singh < > > >>> > > > >>>>>>>> [email protected]> wrote: > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>>> Hi, > > >>> > > > >>>>>>>>> > > >>> > > > >>>>>>>>> I can see empty folders getting created under > > Malhar/lib > > >>> > called > > >>> > > > >>>>>>>>> '1' and '2'. > > >>> > > > >>>>>>>>> I think this is because of using LocalMode to run a > > test > > >>> > > > >>>>>>>>> application. > > >>> > > > >>>>>>>>> > > >>> > > > >>>>>>>>> > > >>> > > > >>>>>>>>> If anyone has checked in such cases please do check > and > > >>> let > > >>> > us > > >>> > > > >>>>>>>>> know. > > >>> > > > >>>>>>>>> > > >>> > > > >>>>>>>>> Thanks, > > >>> > > > >>>>>>>>> Chandni > > >>> > > > >>>>>>>>> > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>>> > > >>> > > > >>>>>>> > > >>> > > > >>>>>> > > >>> > > > >>>>> > > >>> > > > >>>> > > >>> > > > >>> > > >>> > > > >> > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >> > > >> > > > > > >
