Patrick, I believe the issues with the ZookeeperClusterStatusTest test (and possibly others as well) are in the setup of the ZK server process inside the test. Is there code in ZK that would work better than the code in the Blur mini cluster? The way it's currently implemented feels a bit like a hack.
Aaron

On Wed, Oct 24, 2012 at 2:27 PM, Aaron McCurry <[email protected]> wrote:
> +1 on this solution.
>
> On Wed, Oct 24, 2012 at 2:24 PM, Patrick Hunt <[email protected]> wrote:
>> Hi Gagan, I did find the cause, but not a good solution. Relying on
>> everyone to set their umask is going to be onerous. It would be great
>> if you could provide a proper solution - the one you suggested sounds
>> good.
>>
>> Regards,
>>
>> Patrick
>>
>> On Tue, Oct 23, 2012 at 11:53 PM, Gagan Juneja
>> <[email protected]> wrote:
>>> Oops! I missed Patrick's last post.
>>>
>>> On Wed, Oct 24, 2012 at 12:07 PM, Gagan Juneja
>>> <[email protected]>wrote:
>>>
>>>> I have simulated this issue on ubuntu box. I found that by default ubuntu
>>>> creates directory with *775 *permissions. And there is one property in
>>>> Hadoop Configuration named "dfs.datanode.data.dir.perm" and default value
>>>> for this is *755*. Somewhere in code permissions for data directories are
>>>> verified and it fails there and then.
>>>>
>>>> If we set this property in Configuration object with value *775,* all the
>>>> test cases are passing and build is Successful.
>>>>
>>>> We can set this in *startDfs* method of
>>>> *org.apache.blur.MiniCluster*class. Please verify this, if problem got
>>>> resolved at your end then I can
>>>> provide patch for this.
>>>>
>>>> Regards,
>>>> Gagan
>>>>
>>>>
>>>>
>>>> On Wed, Oct 24, 2012 at 4:32 AM, Patrick Hunt <[email protected]> wrote:
>>>>
>>>>> Pushed a small cleanup to move all test file output into respective
>>>>> target directories and use absolute paths for test file locations.
>>>>>
>>>>> I thought this might fix the BlurClusterTest however that's not the case:
>>>>>
>>>>> Starting DataNode 0 with dfs.data.dir:
>>>>> /home/phunt/dev/blur/src/blur-core/target/tmp/cluster/dfs/data/data1,/home/phunt/dev/blur/src/blur-core/target/tmp/cluster/dfs/data/data2
>>>>> ERROR 20121023_15:58:10:010_PDT [main] datanode.DataNode: All
>>>>> directories in dfs.data.dir are invalid.
>>>>> ERROR 20121023_15:58:10:010_PDT [main] datanode.DataNode: All
>>>>> directories in dfs.data.dir are invalid.
>>>>> ERROR 20121023_15:58:10:010_PDT [main] blur.MiniCluster: error opening
>>>>> file system
>>>>> java.lang.NullPointerException
>>>>> at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:422)
>>>>> at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:280)
>>>>> at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:124)
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, Oct 23, 2012 at 2:43 PM, Patrick Hunt <[email protected]> wrote:
>>>>> > I pushed a small cleanup to versioning in the poms.
>>>>> >
>>>>> > Patrick
>>>>> >
>>>>> > On Tue, Oct 23, 2012 at 2:38 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >> I'll work on fixing the tmp issue, that's something I can handle. ;-)
>>>>> >> Everything should be in target.
>>>>> >>
>>>>> >> Patrick
>>>>> >>
>>>>> >> On Tue, Oct 23, 2012 at 2:37 PM, Aaron McCurry <[email protected]> wrote:
>>>>> >>> Hmm, I will take a look at that one next.
>>>>> >>>
>>>>> >>> Aaron
>>>>> >>>
>>>>> >>> On Tue, Oct 23, 2012 at 5:20 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>> Thanks Aaron. The other failing test "BlurClusterTest" is somehow due
>>>>> >>>> to the directory used. "./tmp/cluster". If I change to
>>>>> >>>> "file://tmp/cluster" the test passes. Any ideas? Seems somehow related
>>>>> >>>> to using relative paths?
>>>>> >>>>
>>>>> >>>> Patrick
>>>>> >>>>
>>>>> >>>> On Tue, Oct 23, 2012 at 2:13 PM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>> Found it, the test did not setup the indexing options correctly. I
>>>>> >>>>> have committed a fix for the test.
>>>>> >>>>>
>>>>> >>>>> Aaron
>>>>> >>>>>
>>>>> >>>>> On Tue, Oct 23, 2012 at 5:08 PM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>> After cleaning up the test, I have gotten the same NPE. Strange
>>>>> >>>>>> behavior, still working on why.
>>>>> >>>>>>
>>>>> >>>>>> Aaron
>>>>> >>>>>>
>>>>> >>>>>> On Tue, Oct 23, 2012 at 3:06 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>>>>> NP. here's the output. I'm on ubuntu 12.04. 1.6.0_26
>>>>> >>>>>>>
>>>>> >>>>>>> "mvn clean test" results in: (I also removed the tmp directories
>>>>> >>>>>>> manually, btw, we should move this to mvn target dir)
>>>>> >>>>>>>
>>>>> >>>>>>> -------------------------------------------------------------------------------
>>>>> >>>>>>> Test set: org.apache.blur.utils.TermDocIterableTest
>>>>> >>>>>>> -------------------------------------------------------------------------------
>>>>> >>>>>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005
>>>>> >>>>>>> sec <<< FAILURE!
>>>>> >>>>>>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest) Time
>>>>> >>>>>>> elapsed: 0.005 sec <<< ERROR!
>>>>> >>>>>>> java.lang.NullPointerException
>>>>> >>>>>>> at org.apache.blur.utils.TermDocIterable.getNext(TermDocIterable.java:82)
>>>>> >>>>>>> at org.apache.blur.utils.TermDocIterable.access$000(TermDocIterable.java:29)
>>>>> >>>>>>> at org.apache.blur.utils.TermDocIterable$1.<init>(TermDocIterable.java:48)
>>>>> >>>>>>> at org.apache.blur.utils.TermDocIterable.iterator(TermDocIterable.java:47)
>>>>> >>>>>>> at org.apache.blur.utils.TermDocIterableTest.testTermDocIterable(TermDocIterableTest.java:65)
>>>>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>> >>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>> >>>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>> >>>>>>> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>>>>> >>>>>>> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>>>>> >>>>>>> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>>>>> >>>>>>> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>>>>> >>>>>>> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>>>>> >>>>>>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>>>>> >>>>>>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>>>>> >>>>>>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>>>>> >>>>>>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>>>>> >>>>>>> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>>>>> >>>>>>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>>>>> >>>>>>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>>>>> >>>>>>> at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>>>>> >>>>>>> at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
>>>>> >>>>>>> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
>>>>> >>>>>>> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
>>>>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>> >>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>> >>>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>> >>>>>>> at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
>>>>> >>>>>>> at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
>>>>> >>>>>>> at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
>>>>> >>>>>>> at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
>>>>> >>>>>>> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> On Tue, Oct 23, 2012 at 12:02 PM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>>>> Sorry, just missed that message. Hmm, I will look around and try to
>>>>> >>>>>>>> see if I can find something. Thanks.
>>>>> >>>>>>>>
>>>>> >>>>>>>> Aaron
>>>>> >>>>>>>>
>>>>> >>>>>>>> On Tue, Oct 23, 2012 at 2:59 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>>>>>>> this is null in termdocsitertest
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> DocsEnum termDocs = atomicReader.termDocsEnum(new Term("id",
>>>>> >>>>>>>>> Integer.toString(id)));
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> due to fields() being null in termDocsEnum method
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> I don't see why yet though. Given the segment file exists on the
>>>>> >>>>>>>>> filesystem, etc...
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Patrick
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Tue, Oct 23, 2012 at 11:50 AM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>>>>>> Trying to reproduce on Ubuntu.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On Tue, Oct 23, 2012 at 1:58 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>>>>>>>>> Hm, I just updated and I'm seeing two errors (which is 1 less issue
>>>>> >>>>>>>>>>> than before):
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest)
>>>>> >>>>>>>>>>> org.apache.blur.thrift.BlurClusterTest: java.lang.NullPointerException
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> Let me look and see if I can at least determine what the underlying
>>>>> >>>>>>>>>>> problems are.
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> Patrick
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> On Tue, Oct 23, 2012 at 10:12 AM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>>>>>>>> I ran into some errors with ZookeeperClusterStatusTest tests and have
>>>>> >>>>>>>>>>>> resolved the issues I found. All units tests pass on OSX, I have not
>>>>> >>>>>>>>>>>> had a chance to run them on Linux yet. I also fixed the nasty NPE
>>>>> >>>>>>>>>>>> exception on the BlurClusterTest (it was affecting the functional
>>>>> >>>>>>>>>>>> tests as well). I ran a few burn-in tests on a VM running a 2
>>>>> >>>>>>>>>>>> controller + 3 shard server Blur cluster. The tests included loaded
>>>>> >>>>>>>>>>>> data as fast as possibly while running searches against that data as
>>>>> >>>>>>>>>>>> fast as possible. The tests ran without issue (basically like they
>>>>> >>>>>>>>>>>> did before the upgrade to Lucene 4). I feel like the code is in a
>>>>> >>>>>>>>>>>> good state at this point. I'm going to merge this code to master and
>>>>> >>>>>>>>>>>> create another branch to begin modifying the RPC API.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> Anyone have any objections?
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> Aaron
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On Mon, Oct 22, 2012 at 8:29 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>>>>>>>>>>> On Mon, Oct 22, 2012 at 5:23 PM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>> Hmm.
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> On Mon, Oct 22, 2012 at 8:17 PM, Patrick Hunt <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>> Sounds good to me.
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Not sure if anyone else is seeing this but the unit tests are not
>>>>> >>>>>>>>>>>>>>> passing for me on ubuntu. I see one failure and two errors.
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Failed tests:
>>>>> >>>>>>>>>>>>>>> testSafeModeSetInFuture(org.apache.blur.manager.clusterstatus.ZookeeperClusterStatusTest)
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Haven't seen this.
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Tests in error:
>>>>> >>>>>>>>>>>>>>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest)
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> This either.
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> org.apache.blur.thrift.BlurClusterTest: java.lang.NullPointerException
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> I think I have been seeing this one during some functional tests.
>>>>> >>>>>>>>>>>>>> Haven't figured out the cause yet, but it seems like it's a nasty
>>>>> >>>>>>>>>>>>>> threading problem. Because when I drop the mutate threads back 1
>>>>> >>>>>>>>>>>>>> everything works fine. However the test was passing on OSX.
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Just me or is this expected?
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Not expected. I'm going to setup a VM on computer to run tests in
>>>>> >>>>>>>>>>>>>> Linux as well.
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Ok. Let me know how it goes and I can try and debug it a bit, although
>>>>> >>>>>>>>>>>>> you're running much faster than I can at this point. ;-) Definitely
>>>>> >>>>>>>>>>>>> let me know if you can't reproduce it and I'll dig into it for sure.
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Patrick
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Patrick
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> On Sun, Oct 21, 2012 at 10:38 AM, Aaron McCurry <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>>> We can fix the jira issues.
>>>>> >>>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>> On Sun, Oct 21, 2012 at 1:36 PM, Garrett Barton
>>>>> >>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>>>> Sounds good to me Aaron, call it 0.2. Does that mess up Jira if you have
>>>>> >>>>>>>>>>>>>>>>> things scheduled against releases?
>>>>> >>>>>>>>>>>>>>>>> On Oct 21, 2012 9:44 AM, "Aaron McCurry" <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>>>> Ok, I think it will be some time before all the changes for the new
>>>>> >>>>>>>>>>>>>>>>>> api are in place and fully functional. So perhaps we should merge the
>>>>> >>>>>>>>>>>>>>>>>> lucene-4.0.0 branch into master and fix whatever bugs are found. I
>>>>> >>>>>>>>>>>>>>>>>> did some system testing yesterday and only found one big issue. There
>>>>> >>>>>>>>>>>>>>>>>> seems to be a threading problem with the BlurAnalyzer. If a single
>>>>> >>>>>>>>>>>>>>>>>> instance is in use across multiple threads some weird behaviors
>>>>> >>>>>>>>>>>>>>>>>> happen. Otherwise everything else seems to work, normally (I will
>>>>> >>>>>>>>>>>>>>>>>> create a jira issue).
>>>>> >>>>>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>>>> If we do merge the lucene-4.0.0 branch, I feel like we should change
>>>>> >>>>>>>>>>>>>>>>>> the version to 0.2. The reason is, the indexes in 0.1.x are not going
>>>>> >>>>>>>>>>>>>>>>>> to be backwards compatible (at least not with out some work). Does
>>>>> >>>>>>>>>>>>>>>>>> anyone have any strong feelings on this?
>>>>> >>>>>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>>>> Aaron
>>>>> >>>>>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>>>>> On Sat, Oct 20, 2012 at 10:10 PM, Gagan Juneja
>>>>> >>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>>>>> > I agree with Garrett. We can merge this branch to the place from where we
>>>>> >>>>>>>>>>>>>>>>>> > cut it. Again as Garrett said If we want to keep only new api thing then we
>>>>> >>>>>>>>>>>>>>>>>> > can merge it to master as well.
>>>>> >>>>>>>>>>>>>>>>>> >
>>>>> >>>>>>>>>>>>>>>>>> > Regards,
>>>>> >>>>>>>>>>>>>>>>>> > Gagan
>>>>> >>>>>>>>>>>>>>>>>> >
>>>>> >>>>>>>>>>>>>>>>>> > On Sat, Oct 20, 2012 at 9:50 PM, Garrett Barton <[email protected]>wrote:
>>>>> >>>>>>>>>>>>>>>>>> >
>>>>> >>>>>>>>>>>>>>>>>> >> I guess it depends on if your planning a 1.4 release with lucene 4. If yes
>>>>> >>>>>>>>>>>>>>>>>> >> then merge and work towards making everything functional. If not then leave
>>>>> >>>>>>>>>>>>>>>>>> >> the 1.3.x in master for bug fixing or whatnot and merge this branch into
>>>>> >>>>>>>>>>>>>>>>>> >> the new api one.
>>>>> >>>>>>>>>>>>>>>>>> >> On Oct 20, 2012 11:03 AM, "Aaron McCurry" <[email protected]> wrote:
>>>>> >>>>>>>>>>>>>>>>>> >>
>>>>> >>>>>>>>>>>>>>>>>> >> > I think that we can merge the lucene-4.0.0 branch back into the
>>>>> >>>>>>>>>>>>>>>>>> >> > master, since tests and code are compiling. I haven't done any
>>>>> >>>>>>>>>>>>>>>>>> >> > functional testing yet, but if much of the RPC and internals are going
>>>>> >>>>>>>>>>>>>>>>>> >> > to change I think that it may be a waste of time to test and fix
>>>>> >>>>>>>>>>>>>>>>>> >> > everything that we are about to change. What do others think?
>>>>> >>>>>>>>>>>>>>>>>> >> >
>>>>> >>>>>>>>>>>>>>>>>> >> > Aaron
>>>>> >>>>>>>>>>>>>>>>>> >> >
>>>>> >>>>>>>>>>>>>>>>>> >>
>>>>> >>>>>>>>>>>>>>>>>>
>>>>>
>>>>
>>>>
