Hi Gagan, I did find the cause, but not a good solution. Relying on everyone to set their umask is going to be onerous. It would be great if you could provide a proper solution - the one you suggested sounds good.
Regards,
Patrick

On Tue, Oct 23, 2012 at 11:53 PM, Gagan Juneja <[email protected]> wrote:
> Oops! I missed Patrick's last post.
>
> On Wed, Oct 24, 2012 at 12:07 PM, Gagan Juneja <[email protected]> wrote:
>
>> I have simulated this issue on ubuntu box. I found that by default ubuntu
>> creates directory with *775* permissions. And there is one property in
>> Hadoop Configuration named "dfs.datanode.data.dir.perm" and default value
>> for this is *755*. Somewhere in code permissions for data directories are
>> verified and it fails there and then.
>>
>> If we set this property in Configuration object with value *775*, all the
>> test cases are passing and build is Successful.
>>
>> We can set this in *startDfs* method of *org.apache.blur.MiniCluster* class.
>> Please verify this, if problem got resolved at your end then I can
>> provide patch for this.
>>
>> Regards,
>> Gagan
>>
>> On Wed, Oct 24, 2012 at 4:32 AM, Patrick Hunt <[email protected]> wrote:
>>
>>> Pushed a small cleanup to move all test file output into respective
>>> target directories and use absolute paths for test file locations.
>>>
>>> I thought this might fix the BlurClusterTest however that's not the case:
>>>
>>> Starting DataNode 0 with dfs.data.dir:
>>>
>>> /home/phunt/dev/blur/src/blur-core/target/tmp/cluster/dfs/data/data1,/home/phunt/dev/blur/src/blur-core/target/tmp/cluster/dfs/data/data2
>>> ERROR 20121023_15:58:10:010_PDT [main] datanode.DataNode: All
>>> directories in dfs.data.dir are invalid.
>>> ERROR 20121023_15:58:10:010_PDT [main] datanode.DataNode: All
>>> directories in dfs.data.dir are invalid.
>>> ERROR 20121023_15:58:10:010_PDT [main] blur.MiniCluster: error opening
>>> file system
>>> java.lang.NullPointerException
>>>     at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:422)
>>>     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:280)
>>>     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:124)
>>>
>>> Patrick
>>>
>>> On Tue, Oct 23, 2012 at 2:43 PM, Patrick Hunt <[email protected]> wrote:
>>> > I pushed a small cleanup to versioning in the poms.
>>> >
>>> > Patrick
>>> >
>>> > On Tue, Oct 23, 2012 at 2:38 PM, Patrick Hunt <[email protected]> wrote:
>>> >> I'll work on fixing the tmp issue, that's something I can handle. ;-)
>>> >> Everything should be in target.
>>> >>
>>> >> Patrick
>>> >>
>>> >> On Tue, Oct 23, 2012 at 2:37 PM, Aaron McCurry <[email protected]> wrote:
>>> >>> Hmm, I will take a look at that one next.
>>> >>>
>>> >>> Aaron
>>> >>>
>>> >>> On Tue, Oct 23, 2012 at 5:20 PM, Patrick Hunt <[email protected]> wrote:
>>> >>>> Thanks Aaron. The other failing test "BlurClusterTest" is somehow due
>>> >>>> to the directory used. "./tmp/cluster". If I change to
>>> >>>> "file://tmp/cluster" the test passes. Any ideas? Seems somehow related
>>> >>>> to using relative paths?
>>> >>>>
>>> >>>> Patrick
>>> >>>>
>>> >>>> On Tue, Oct 23, 2012 at 2:13 PM, Aaron McCurry <[email protected]> wrote:
>>> >>>>> Found it, the test did not setup the indexing options correctly. I
>>> >>>>> have committed a fix for the test.
>>> >>>>>
>>> >>>>> Aaron
>>> >>>>>
>>> >>>>> On Tue, Oct 23, 2012 at 5:08 PM, Aaron McCurry <[email protected]> wrote:
>>> >>>>>> After cleaning up the test, I have gotten the same NPE. Strange
>>> >>>>>> behavior, still working on why.
>>> >>>>>>
>>> >>>>>> Aaron
>>> >>>>>>
>>> >>>>>> On Tue, Oct 23, 2012 at 3:06 PM, Patrick Hunt <[email protected]> wrote:
>>> >>>>>>> NP. here's the output. I'm on ubuntu 12.04.
1.6.0_26 >>> >>>>>>> >>> >>>>>>> "mvn clean test" results in: (I also removed the tmp directories >>> >>>>>>> manually, btw, we should move this to mvn target dir) >>> >>>>>>> >>> >>>>>>> >>> ------------------------------------------------------------------------------- >>> >>>>>>> Test set: org.apache.blur.utils.TermDocIterableTest >>> >>>>>>> >>> ------------------------------------------------------------------------------- >>> >>>>>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: >>> 0.005 >>> >>>>>>> sec <<< FAILURE! >>> >>>>>>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest) >>> Time >>> >>>>>>> elapsed: 0.005 sec <<< ERROR! >>> >>>>>>> java.lang.NullPointerException >>> >>>>>>> at >>> org.apache.blur.utils.TermDocIterable.getNext(TermDocIterable.java:82) >>> >>>>>>> at >>> org.apache.blur.utils.TermDocIterable.access$000(TermDocIterable.java:29) >>> >>>>>>> at >>> org.apache.blur.utils.TermDocIterable$1.<init>(TermDocIterable.java:48) >>> >>>>>>> at >>> org.apache.blur.utils.TermDocIterable.iterator(TermDocIterable.java:47) >>> >>>>>>> at >>> org.apache.blur.utils.TermDocIterableTest.testTermDocIterable(TermDocIterableTest.java:65) >>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >>> Method) >>> >>>>>>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>> >>>>>>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>> >>>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>> >>>>>>> at >>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) >>> >>>>>>> at >>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) >>> >>>>>>> at >>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) >>> >>>>>>> at >>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) >>> >>>>>>> at >>> 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) >>> >>>>>>> at >>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) >>> >>>>>>> at >>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) >>> >>>>>>> at >>> org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) >>> >>>>>>> at >>> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) >>> >>>>>>> at >>> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) >>> >>>>>>> at >>> org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) >>> >>>>>>> at >>> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) >>> >>>>>>> at >>> org.junit.runners.ParentRunner.run(ParentRunner.java:236) >>> >>>>>>> at >>> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) >>> >>>>>>> at >>> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) >>> >>>>>>> at >>> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) >>> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native >>> Method) >>> >>>>>>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>> >>>>>>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>> >>>>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>> >>>>>>> at >>> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) >>> >>>>>>> at >>> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) >>> >>>>>>> at >>> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175) >>> >>>>>>> at >>> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107) >>> >>>>>>> at >>> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68) >>> >>>>>>> >>> >>>>>>> >>> 
>>>>>>> On Tue, Oct 23, 2012 at 12:02 PM, Aaron McCurry < >>> [email protected]> wrote: >>> >>>>>>>> Sorry, just missed that message. Hmm, I will look around and >>> try to >>> >>>>>>>> see if I can find something. Thanks. >>> >>>>>>>> >>> >>>>>>>> Aaron >>> >>>>>>>> >>> >>>>>>>> On Tue, Oct 23, 2012 at 2:59 PM, Patrick Hunt <[email protected]> >>> wrote: >>> >>>>>>>>> this is null in termdocsitertest >>> >>>>>>>>> >>> >>>>>>>>> DocsEnum termDocs = atomicReader.termDocsEnum(new >>> Term("id", >>> >>>>>>>>> Integer.toString(id))); >>> >>>>>>>>> >>> >>>>>>>>> due to fields() being null in termDocsEnum method >>> >>>>>>>>> >>> >>>>>>>>> I don't see why yet though. Given the segment file exists on the >>> >>>>>>>>> filesystem, etc... >>> >>>>>>>>> >>> >>>>>>>>> Patrick >>> >>>>>>>>> >>> >>>>>>>>> On Tue, Oct 23, 2012 at 11:50 AM, Aaron McCurry < >>> [email protected]> wrote: >>> >>>>>>>>>> Trying to reproduce on Ubuntu. >>> >>>>>>>>>> >>> >>>>>>>>>> On Tue, Oct 23, 2012 at 1:58 PM, Patrick Hunt < >>> [email protected]> wrote: >>> >>>>>>>>>>> Hm, I just updated and I'm seeing two errors (which is 1 less >>> issue >>> >>>>>>>>>>> than before): >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest) >>> >>>>>>>>>>> org.apache.blur.thrift.BlurClusterTest: >>> java.lang.NullPointerException >>> >>>>>>>>>>> >>> >>>>>>>>>>> Let me look and see if I can at least determine what the >>> underlying >>> >>>>>>>>>>> problems are. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Patrick >>> >>>>>>>>>>> >>> >>>>>>>>>>> On Tue, Oct 23, 2012 at 10:12 AM, Aaron McCurry < >>> [email protected]> wrote: >>> >>>>>>>>>>>> I ran into some errors with ZookeeperClusterStatusTest tests >>> and have >>> >>>>>>>>>>>> resolved the issues I found. All units tests pass on OSX, I >>> have not >>> >>>>>>>>>>>> had a chance to run them on Linux yet. 
I also fixed the >>> nasty NPE >>> >>>>>>>>>>>> exception on the BlurClusterTest (it was affecting the >>> functional >>> >>>>>>>>>>>> tests as well). I ran a few burn-in tests on a VM running a >>> 2 >>> >>>>>>>>>>>> controller + 3 shard server Blur cluster. The tests >>> included loaded >>> >>>>>>>>>>>> data as fast as possibly while running searches against that >>> data as >>> >>>>>>>>>>>> fast as possible. The tests ran without issue (basically >>> like they >>> >>>>>>>>>>>> did before the upgrade to Lucene 4). I feel like the code >>> is in a >>> >>>>>>>>>>>> good state at this point. I'm going to merge this code to >>> master and >>> >>>>>>>>>>>> create another branch to begin modifying the RPC API. >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> Anyone have any objections? >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> Aaron >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> On Mon, Oct 22, 2012 at 8:29 PM, Patrick Hunt < >>> [email protected]> wrote: >>> >>>>>>>>>>>>> On Mon, Oct 22, 2012 at 5:23 PM, Aaron McCurry < >>> [email protected]> wrote: >>> >>>>>>>>>>>>>> Hmm. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> On Mon, Oct 22, 2012 at 8:17 PM, Patrick Hunt < >>> [email protected]> wrote: >>> >>>>>>>>>>>>>>> Sounds good to me. >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Not sure if anyone else is seeing this but the unit tests >>> are not >>> >>>>>>>>>>>>>>> passing for me on ubuntu. I see one failure and two >>> errors. >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Failed tests: >>> >>>>>>>>>>>>>>> >>> >>> testSafeModeSetInFuture(org.apache.blur.manager.clusterstatus.ZookeeperClusterStatusTest) >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Haven't seen this. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Tests in error: >>> >>>>>>>>>>>>>>> >>> testTermDocIterable(org.apache.blur.utils.TermDocIterableTest) >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> This either. 
>>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> org.apache.blur.thrift.BlurClusterTest: >>> java.lang.NullPointerException >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> I think I have been seeing this one during some functional >>> tests. >>> >>>>>>>>>>>>>> Haven't figured out the cause yet, but it seems like it's >>> a nasty >>> >>>>>>>>>>>>>> threading problem. Because when I drop the mutate threads >>> back 1 >>> >>>>>>>>>>>>>> everything works fine. However the test was passing on >>> OSX. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Just me or is this expected? >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Not expected. I'm going to setup a VM on computer to run >>> tests in >>> >>>>>>>>>>>>>> Linux as well. >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> Ok. Let me know how it goes and I can try and debug it a >>> bit, although >>> >>>>>>>>>>>>> you're running much faster than I can at this point. ;-) >>> Definitely >>> >>>>>>>>>>>>> let me know if you can't reproduce it and I'll dig into it >>> for sure. >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> Patrick >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Patrick >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> On Sun, Oct 21, 2012 at 10:38 AM, Aaron McCurry < >>> [email protected]> wrote: >>> >>>>>>>>>>>>>>>> We can fix the jira issues. >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> On Sun, Oct 21, 2012 at 1:36 PM, Garrett Barton >>> >>>>>>>>>>>>>>>> <[email protected]> wrote: >>> >>>>>>>>>>>>>>>>> Sounds good to me Aaron, call it 0.2. Does that mess up >>> Jira if you have >>> >>>>>>>>>>>>>>>>> things scheduled against releases? >>> >>>>>>>>>>>>>>>>> On Oct 21, 2012 9:44 AM, "Aaron McCurry" < >>> [email protected]> wrote: >>> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>> Ok, I think it will be some time before all the >>> changes for the new >>> >>>>>>>>>>>>>>>>>> api are in place and fully functional. So perhaps we >>> should merge the >>> >>>>>>>>>>>>>>>>>> lucene-4.0.0 branch into master and fix whatever bugs >>> are found. 
I >>> >>>>>>>>>>>>>>>>>> did some system testing yesterday and only found one >>> big issue. There >>> >>>>>>>>>>>>>>>>>> seems to be a threading problem with the BlurAnalyzer. >>> If a single >>> >>>>>>>>>>>>>>>>>> instance is in use across multiple threads some weird >>> behaviors >>> >>>>>>>>>>>>>>>>>> happen. Otherwise everything else seems to work, >>> normally (I will >>> >>>>>>>>>>>>>>>>>> create a jira issue). >>> >>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>> If we do merge the lucene-4.0.0 branch, I feel like we >>> should change >>> >>>>>>>>>>>>>>>>>> the version to 0.2. The reason is, the indexes in >>> 0.1.x are not going >>> >>>>>>>>>>>>>>>>>> to be backwards compatible (at least not with out some >>> work). Does >>> >>>>>>>>>>>>>>>>>> anyone have any strong feelings on this? >>> >>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>> Aaron >>> >>>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>>> On Sat, Oct 20, 2012 at 10:10 PM, Gagan Juneja >>> >>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>> >>>>>>>>>>>>>>>>>> > I agree with Garrett. We can merge this branch to >>> the place from where we >>> >>>>>>>>>>>>>>>>>> > cut it. Again as Garrett said If we want to keep >>> only new api thing then >>> >>>>>>>>>>>>>>>>>> we >>> >>>>>>>>>>>>>>>>>> > can merge it to master as well. >>> >>>>>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>>>>> > Regards, >>> >>>>>>>>>>>>>>>>>> > Gagan >>> >>>>>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>>>>> > On Sat, Oct 20, 2012 at 9:50 PM, Garrett Barton < >>> >>>>>>>>>>>>>>>>>> [email protected]>wrote: >>> >>>>>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>>>>> >> I guess it depends on if your planning a 1.4 >>> release with lucene 4. If >>> >>>>>>>>>>>>>>>>>> yes >>> >>>>>>>>>>>>>>>>>> >> then merge and work towards making everything >>> functional. If not then >>> >>>>>>>>>>>>>>>>>> leave >>> >>>>>>>>>>>>>>>>>> >> the 1.3.x in master for bug fixing or whatnot and >>> merge this branch into >>> >>>>>>>>>>>>>>>>>> >> the new api one. 
>>> >>>>>>>>>>>>>>>>>> >> On Oct 20, 2012 11:03 AM, "Aaron McCurry" < >>> [email protected]> wrote: >>> >>>>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>>>> >> > I think that we can merge the lucene-4.0.0 branch >>> back into the >>> >>>>>>>>>>>>>>>>>> >> > master, since tests and code are compiling. I >>> haven't done any >>> >>>>>>>>>>>>>>>>>> >> > functional testing yet, but if much of the RPC >>> and internals are going >>> >>>>>>>>>>>>>>>>>> >> > to change I think that it may be a waste of time >>> to test and fix >>> >>>>>>>>>>>>>>>>>> >> > everything that we are about to change. What do >>> others think? >>> >>>>>>>>>>>>>>>>>> >> > >>> >>>>>>>>>>>>>>>>>> >> > Aaron >>> >>>>>>>>>>>>>>>>>> >> > >>> >>>>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>>>> >>> >> >>
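For reference, the permission mismatch Gagan describes above (data directories created as 775 while "dfs.datanode.data.dir.perm" defaults to 755) can be illustrated with a small standalone sketch. This is not Hadoop's actual check; the class and method names (PermCheck, toOctal, matches) are hypothetical and it uses only java.nio, so it runs without Hadoop on the classpath.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical illustration only; Hadoop's real verification code is not shown in this thread.
public class PermCheck {

    // Convert a POSIX permission set to a three-digit octal string such as "775".
    static String toOctal(Set<PosixFilePermission> perms) {
        int mode = 0;
        for (PosixFilePermission p : perms) {
            switch (p) {
                case OWNER_READ:     mode |= 0400; break;
                case OWNER_WRITE:    mode |= 0200; break;
                case OWNER_EXECUTE:  mode |= 0100; break;
                case GROUP_READ:     mode |= 0040; break;
                case GROUP_WRITE:    mode |= 0020; break;
                case GROUP_EXECUTE:  mode |= 0010; break;
                case OTHERS_READ:    mode |= 0004; break;
                case OTHERS_WRITE:   mode |= 0002; break;
                case OTHERS_EXECUTE: mode |= 0001; break;
            }
        }
        return String.format("%03o", mode);
    }

    // True only when the actual permissions equal the expected octal value exactly.
    static boolean matches(Set<PosixFilePermission> actual, String expectedOctal) {
        return toOctal(actual).equals(expectedOctal);
    }

    public static void main(String[] args) {
        // rwxrwxr-x is what a umask of 002 produces for new directories.
        Set<PosixFilePermission> created = PosixFilePermissions.fromString("rwxrwxr-x");
        System.out.println(toOctal(created));        // 775
        System.out.println(matches(created, "755")); // false: an exact comparison against 755 fails
        System.out.println(matches(created, "775")); // true: passes once the expected value is 775
    }
}
```

Gagan's proposed fix corresponds to setting "dfs.datanode.data.dir.perm" to 775 on the Hadoop Configuration used in MiniCluster.startDfs; the exact call site is an assumption here, since the patch itself is not part of this thread.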
