Just changing the subject here. -- Chetan
On Thu, Sep 3, 2015 at 3:35 PM, Thomas Weise <[email protected]> wrote: > Ram, > > Are you referring to running your app from the dtcli? > > That's one more item to check to not end up with stray files. There is > nothing stopping us from using a different default for that use case. My > take is that we should structure this in a way where by default unit tests > have minimum things to configure and run as fast as reasonably possible. > > Thomas > > On Thu, Sep 3, 2015 at 3:19 PM, Munagala Ramanath <[email protected]> > wrote: > > > It's not just unit tests. > > > > An app developer is likely to run a random app in LM to uncover bugs > before > > hitting the cluster. > > The closer the LM setup is to the cluster setup (i.e. running as much of > > the same code as reasonably possible) > > the higher the probability that bugs will be hit in LM. > > > > Ram > > > > On Thu, Sep 3, 2015 at 3:13 PM, Chandni Singh <[email protected]> > > wrote: > > > > > Yes Chetan, I am claiming that :-) > > > > > > I still don't understand the need for having two step checkpointing in > > > LocalMode by default. > > > > > > StramLocalCluster should simplify test execution environment as pointed > > out > > > by Thomas. > > > Async checkpoint should have its own test cases using StramLocalCluster > > > that should not break when new features are added to the platform. > > > But by default I think StramLocalCluster should use synchronous > > > checkpointing. > > > > > > -Chandni > > > > > > On Thu, Sep 3, 2015 at 2:43 PM, Chetan Narsude <[email protected] > > > > > wrote: > > > > > > > Changing the storage agent is one of the ways to address the symptoms > > of > > > > the problem. But it's not treating the problem. > > > > > > > > In this case - change the basePath to a location under target and all > > the > > > > opinions are moot. And someone is claiming that we should not do it. > > Not > > > > sure why. Or is anyone claiming that anymore? > > > > > > > > -- > > > > Chetan > > > > > > > > On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <[email protected] > > > > > > wrote: > > > > > > > >> Good point regarding the coverage. These JUnit tests are supposed to > > > test > > > >> individual components and all the tests collectively should strive > to > > > >> achieve high coverage. There are tests in Apex to cover storage > > agents, > > > >> recovery semantics etc. Components that fall outside of the test > > scope > > > >> are > > > >> reduced as much as possible through mocks (even though there is room > > for > > > >> improvement). > > > >> > > > >> The tests in Malhar are for operators and applications, not for the > > > >> engine. > > > >> In those cases where LM is used, the intention is to test the > > > application > > > >> functionality. It is expected that certain configurations are > adjusted > > > for > > > >> the test and dependencies mocked. > > > >> > > > >> For the local mode, it should not be an issue to use a different > > storage > > > >> agent when it simplifies the test execution. Specifically, in this > > case, > > > >> we > > > >> don't want to go and change many tests to make something work that > > isn't > > > >> needed. LM is not "production", it is not using HDFS and there are a > > > >> number > > > >> of other important differences that make it possible to run within > the > > > >> IDE. > > > >> > > > >> Instead, focus should be on those things that help with app > coverage. > > > For > > > >> example, in the past we had seen issues with serialization of > > operators > > > >> that were not uncovered in LM, until we made the serialization part > of > > > the > > > >> execution, even when it was not needed for execution. > > > >> > > > >> Thomas > > > >> > > > >> > > > >> > > > >> > > > >> On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude < > > [email protected] > > > > > > > >> wrote: > > > >> > > > >> > I think Ram explained in a little more detail on what I am > thinking. > > > >> > > > > >> > Tests are supposed to provide code coverage. Having localcluster > is > > > >> already > > > >> > a variable, it's not what runs in production. Having a different > > > storage > > > >> > agent is another variable and it misses out on testing the > > > asynchronous > > > >> > flow. The gap keeps on increasing if we continue to do that. > > > AsyncFSSA > > > >> is > > > >> > our default because it's supposed to do everything that > > FSStorageAgent > > > >> does > > > >> > and some more. So not clear as to why the test which creates stray > > > >> folders > > > >> > is not configuring the storage agent properly instead of > completely > > > >> > changing it out which brings some other problems in as I just > > > explained. > > > >> > > > > >> > If changing the storage agent is the only way to fix the problem > > with > > > >> > reasonable effort, then I would concede. I highly doubt that. > > > >> > > > > >> > -- > > > >> > Chetan > > > >> > > > > >> > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh < > > > [email protected] > > > >> > > > > >> > wrote: > > > >> > > > > >> > > The local mode was so far using FSStorageAgent which was used in > > > >> > > production. > > > >> > > In production using Async is needed because hdfs writes are slow > > but > > > >> is > > > >> > > that the case with LocalMode? > > > >> > > > > > >> > > In local mode if we use Async we are creating checkpoints under > > one > > > >> local > > > >> > > directory and then copying it to another local directory which > > will > > > >> not > > > >> > > improve any performance. > > > >> > > > > > >> > > In my opinion StramLocalCluster use synchronous checkpointing as > > > >> default. > > > >> > > > > > >> > > Chandni > > > >> > > > > > >> > > > > > >> > > > > > >> > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude < > > > >> [email protected]> > > > >> > > wrote: > > > >> > > > > > >> > >> That sounds a lot like self contradicting reason; Let's make a > > > change > > > >> > >> because we don't want to make change. :-) > > > >> > >> > > > >> > >> The code is in certain state. This certain state is consistent > > with > > > >> how > > > >> > >> things run in production. In test environment there is a > problem > > > that > > > >> > stray > > > >> > >> files are created. It's a small fix to relocate these files > > > >> elsewhere. > > > >> > What > > > >> > >> I am trying to understand is that is not being done? > > > >> > >> > > > >> > >> -- > > > >> > >> Chetan > > > >> > >> > > > >> > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise < > > > [email protected] > > > >> > > > > >> > >> wrote: > > > >> > >> > > > >> > >>> There is no need to configure anything extra with the proposed > > > >> change, > > > >> > it > > > >> > >>> just brings back LM to how it worked before. > > > >> > >>> > > > >> > >>> There is no point modifying n tests for extra setup with no > > gain. > > > >> > >>> > > > >> > >>> Thomas > > > >> > >>> > > > >> > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude < > > > >> [email protected] > > > >> > > > > > >> > >>> wrote: > > > >> > >>> > > > >> > >>> > Why does it matter that AsyncFSStorageAgent is being used > with > > > >> > >>> > LocalCluster? It using the localfs and hence no gain is the > > > >> > >>> implementation > > > >> > >>> > detail that's abstracted out by FileSystem already. > > > >> > >>> > > > > >> > >>> > If there is a problem of random artifacts left behind after > > the > > > >> test, > > > >> > >>> there > > > >> > >>> > is a reason and most likely it's misconfiguration of the > > > >> > StorageAgent. > > > >> > >>> Why > > > >> > >>> > wouldn't that be fixed. > > > >> > >>> > > > > >> > >>> > -- > > > >> > >>> > Chetan > > > >> > >>> > > > > >> > >>> > > > > >> > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre < > > > [email protected]> > > > >> > >>> wrote: > > > >> > >>> > > > > >> > >>> > > Clean up container files left over should be a distributed > > OS > > > >> task. > > > >> > >>> Clean > > > >> > >>> > > up, back up, archive, ... all is for the OS (aka YARN). We > > > must > > > >> > >>> assume > > > >> > >>> > kill > > > >> > >>> > > -9. > > > >> > >>> > > > > > >> > >>> > > The only thing where the operator comes into play is > > > >> "teardown()", > > > >> > >>> which > > > >> > >>> > is > > > >> > >>> > > business logic (not Apex engine) issue. This could be db > > > >> connection > > > >> > >>> etc. > > > >> > >>> > > > > > >> > >>> > > Thks, > > > >> > >>> > > Amol > > > >> > >>> > > > > > >> > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas Weise < > > > >> > [email protected] > > > >> > >>> > > > > >> > >>> > > wrote: > > > >> > >>> > > > > > >> > >>> > > > When the container gets killed, we should not assume > > > anything > > > >> > about > > > >> > >>> > > > cleanup. It can be a kill -9. Any related "cleanup" > falls > > > >> under > > > >> > >>> nice to > > > >> > >>> > > > have, no guarantees. > > > >> > >>> > > > > > > >> > >>> > > > On Thu, Sep 3, 2015 at 8:49 AM, Chandni Singh < > > > >> > >>> [email protected] > > > >> > >>> > > > > > >> > >>> > > > wrote: > > > >> > >>> > > > > > > >> > >>> > > > > I have a question regarding what Gaurav mentioned > > > >> > >>> > > > > ---- > > > >> > >>> > > > > When container runs in cluster, "." specifies the > > > containers > > > >> > >>> local > > > >> > >>> > path > > > >> > >>> > > > on > > > >> > >>> > > > > the node where container specific jars and other > > resources > > > >> > >>> resides. > > > >> > >>> > It > > > >> > >>> > > > > creates a folder under that which is live as long as > > > >> container > > > >> > >>> lives. > > > >> > >>> > > So > > > >> > >>> > > > > there are no vagrant folders anywhere > > > >> > >>> > > > > --- > > > >> > >>> > > > > > > > >> > >>> > > > > When the container gets killed, do we cleanup the > > folders > > > >> > >>> created by > > > >> > >>> > > > Async > > > >> > >>> > > > > under the containers working dir? > > > >> > >>> > > > > > > > >> > >>> > > > > On Thu, Sep 3, 2015 at 8:42 AM, Thomas Weise < > > > >> > >>> [email protected] > > > >> > >>> > > > > > >> > >>> > > > > wrote: > > > >> > >>> > > > > > > > >> > >>> > > > >> It makes sense to use the synchronous checkpointing > for > > > the > > > >> > >>> local > > > >> > >>> > > mode. > > > >> > >>> > > > >> LM is meant to simplify dependencies and setup. The > > > default > > > >> > for > > > >> > >>> > > > execution > > > >> > >>> > > > >> on YARN remains async. > > > >> > >>> > > > >> > > > >> > >>> > > > >> Thomas > > > >> > >>> > > > >> > > > >> > >>> > > > >> > > > >> > >>> > > > >> On Thu, Sep 3, 2015 at 8:34 AM, Chandni Singh < > > > >> > >>> > > [email protected]> > > > >> > >>> > > > >> wrote: > > > >> > >>> > > > >> > > > >> > >>> > > > >>> APPLICATION_PATH isn't related to local base dir of > > > Async > > > >> as > > > >> > >>> far > > > >> > >>> > as I > > > >> > >>> > > > >>> know. StramLocalCluster sets the APP_PATH to > > > "target/...". > > > >> > >>> > > > >>> StramLocalCluster should use FSStorageAgent. > > > >> > >>> > > > >>> > > > >> > >>> > > > >>> - Chandni > > > >> > >>> > > > >>> > > > >> > >>> > > > >>> On Thu, Sep 3, 2015 at 8:20 AM, Gaurav Gupta < > > > >> > >>> > [email protected] > > > >> > >>> > > > > > > >> > >>> > > > >>> wrote: > > > >> > >>> > > > >>> > > > >> > >>> > > > >>>> As Thomas mentioned as default remains to be async. > > You > > > >> can > > > >> > >>> either > > > >> > >>> > > > >>>> change the storage agent or set the > APPLICATION_PATH. > > > >> > >>> > > > >>>> > > > >> > >>> > > > >>>> When container runs in cluster, "." specifies the > > > >> containers > > > >> > >>> local > > > >> > >>> > > > path > > > >> > >>> > > > >>>> on the node where container specific jars and other > > > >> > resources > > > >> > >>> > > > resides. It > > > >> > >>> > > > >>>> creates a folder under that which is live as long > as > > > >> > container > > > >> > >>> > > lives. > > > >> > >>> > > > So > > > >> > >>> > > > >>>> there are no vagrant folders anywhere > > > >> > >>> > > > >>>> > > > >> > >>> > > > >>>> Thanks > > > >> > >>> > > > >>>> -Gaurav > > > >> > >>> > > > >>>> > > > >> > >>> > > > >>>> On Wed, Sep 2, 2015 at 11:33 PM, Chandni Singh < > > > >> > >>> > > > [email protected] > > > >> > >>> > > > >>>> > wrote: > > > >> > >>> > > > >>>> > > > >> > >>> > > > >>>>> I think there is a problem in the default Async as > > > >> well. It > > > >> > >>> also > > > >> > >>> > > uses > > > >> > >>> > > > >>>>> the working directory as its local base path. > > > >> > >>> > > > >>>>> > > > >> > >>> > > > >>>>> In the Async -> copyToHdfs() method, we delete > the > > > >> window > > > >> > >>> files > > > >> > >>> > > but > > > >> > >>> > > > >>>>> the folder with the operator name never gets > > deleted. > > > >> > >>> > > > >>>>> So on the cluster there will be such vagrant > > folders > > > in > > > >> > the > > > >> > >>> > > working > > > >> > >>> > > > >>>>> directory? > > > >> > >>> > > > >>>>> > > > >> > >>> > > > >>>>> On Wed, Sep 2, 2015 at 11:17 PM, Thomas Weise < > > > >> > >>> > > > [email protected]> > > > >> > >>> > > > >>>>> wrote: > > > >> > >>> > > > >>>>> > > > >> > >>> > > > >>>>>> Chandni, > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>>> Agreed. See whether the tests work with the > > > synchronous > > > >> > >>> storage > > > >> > >>> > > > >>>>>> agent. If yes, change them. The default needs to > > > remain > > > >> > >>> async. > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>>> Thomas > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>>> On Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh < > > > >> > >>> > > > >>>>>> [email protected]> wrote: > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>>>> Hi, > > > >> > >>> > > > >>>>>>> > > > >> > >>> > > > >>>>>>> I would like to know what was the reason to use > > > >> > >>> > > AsyncFSStorageAgent > > > >> > >>> > > > >>>>>>> with StramLocalCluster? > > > >> > >>> > > > >>>>>>> StramLocalCluster is mainly for testing in a > > > >> > >>> non-distributed > > > >> > >>> > mode > > > >> > >>> > > > >>>>>>> and I am unclear how AsyncFSStorageAgent is > > helpful > > > in > > > >> > this > > > >> > >>> > mode. > > > >> > >>> > > > >>>>>>> > > > >> > >>> > > > >>>>>>> Thanks, > > > >> > >>> > > > >>>>>>> Chandni > > > >> > >>> > > > >>>>>>> > > > >> > >>> > > > >>>>>>> On Wed, Sep 2, 2015 at 10:45 PM, Chandni Singh < > > > >> > >>> > > > >>>>>>> [email protected]> wrote: > > > >> > >>> > > > >>>>>>> > > > >> > >>> > > > >>>>>>>> This is because of recent changes to > > > >> StramLocalCluster > > > >> > >>> where > > > >> > >>> > > > >>>>>>>> AsyncFSStorageAgent is used for checkpointing > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> dag.setAttribute(OperatorContext.STORAGE_AGENT, > > new > > > >> > >>> > > > AsyncFSStorageAgent(new Path(pathUri, > > > >> > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(), null)); > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> The AsyncFSStorageAgent(String path, > > Configuration > > > >> conf) > > > >> > >>> uses > > > >> > >>> > > "." > > > >> > >>> > > > as localBasePath and therefore creates sub-directories > per > > > >> > >>> operator in > > > >> > >>> > > the > > > >> > >>> > > > current working directory. > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> I am going to create a ticket to address this > and > > > >> will > > > >> > >>> fix it. > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> -Chandni > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> On Wed, Sep 2, 2015 at 7:13 PM, Chandni Singh < > > > >> > >>> > > > >>>>>>>> [email protected]> wrote: > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>>> Hi, > > > >> > >>> > > > >>>>>>>>> > > > >> > >>> > > > >>>>>>>>> I can see empty folders getting created under > > > >> > Malhar/lib > > > >> > >>> > called > > > >> > >>> > > > >>>>>>>>> '1' and '2'. > > > >> > >>> > > > >>>>>>>>> I think this is because of using LocalMode to > > run > > > a > > > >> > test > > > >> > >>> > > > >>>>>>>>> application. > > > >> > >>> > > > >>>>>>>>> > > > >> > >>> > > > >>>>>>>>> > > > >> > >>> > > > >>>>>>>>> If anyone has checked in such cases please do > > > check > > > >> and > > > >> > >>> let > > > >> > >>> > us > > > >> > >>> > > > >>>>>>>>> know. > > > >> > >>> > > > >>>>>>>>> > > > >> > >>> > > > >>>>>>>>> Thanks, > > > >> > >>> > > > >>>>>>>>> Chandni > > > >> > >>> > > > >>>>>>>>> > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>>> > > > >> > >>> > > > >>>>>>> > > > >> > >>> > > > >>>>>> > > > >> > >>> > > > >>>>> > > > >> > >>> > > > >>>> > > > >> > >>> > > > >>> > > > >> > >>> > > > >> > > > >> > >>> > > > > > > > >> > >>> > > > > > > >> > >>> > > > > > >> > >>> > > > > >> > >>> > > > >> > >> > > > >> > >> > > > >> > > > > > >> > > > > >> > > > > > > > > > > > > > >
