CI failure seems to be worked on: https://twitter.com/codebetterCI/status/785854074713468932 (Thanks Wyatt for pointing that out)
I will look into the rest in a little while -- Itamar Syn-Hershko http://code972.com | @synhershko <https://twitter.com/synhershko> Freelance Developer & Consultant Lucene.NET committer and PMC member On Tue, Oct 11, 2016 at 10:10 PM, Shad Storhaug <[email protected]> wrote: > Update > ====== > > I have just pushed some commits that fix several bugs in the > Lucene.Net.Codecs project (all 452 tests pass most of the time, a few > random failures) and fix all but 4 of the failing tests in Lucene.Net.Core. > > > Fix for Test Context > ------------------------- > > For now, I have added method override stubs to each subclass in order to > add the [Test] attribute, so NUnit will run them in the correct context. I > did that on all of the superclass tests except for the ones in QueryParser > (since Itamar mentioned he would be working in that area). Itamar, you will > probably need to follow suit to get all of the QP tests to pass - namely > with the QueryParserTestBase and TestQueryParser classes. > > I have carefully put all of these changes into a single commit so it can > be reverted easily, if this solution doesn't happen to be compatible with > xUnit: https://github.com/apache/lucenenet/commit/ > 2a79edea6359e1ee1f83269cc7dc3ef2753ebf2c. Hopefully that makes life > easier for @conniey. > > @Itamar, let me know when this is completed on your end so I can do a > double revert and squash the test stubs from QueryParser into an > all-inclusive revert-able commit. > > We can now correctly see how many tests we have in the core. Currently > there are 2730 - it seems we are still missing 720 tests, assuming they all > were for something port-able. > > > Remaining Tests > --------------------- > > Next I plan to work on locating any tests that we have missed (starting in > the core). It seems these fall into several categories: > > 1. Tests that have not yet been ported. > 2. Tests that have been partially ported that have not been added to the > project. > 3. Tests that have been ported, but are missing the [Test] attribute. > 4. Tests in classes that have been ported that have been commented out > (presumably because at the time they were ported the dependencies did not > yet exist). > 5. Tests that have been Ignored in .NET that were not in Java. > 6. Tests that have NUnit Assume.That() logic that depends on some > non-existant JRE condition, so they are not running in .NET. > > I'll make a quick effort to get them to pass, but the main goal will be to > ensure they all can run and are included in the project. Just a heads up > that the number of test failures is likely to increase on this pass (but > the number of bugs will likely decrease). > > > Failing Core Tests > ----------------------- > > I have looked into the remaining tests somewhat. There are 2 issues that I > need some input on to solve. > > > TestRamUsageEstimator.TestSanity() > > Java Lucene uses a JRE-specific API to determine how much header size to > add on each field. This makes the estimates higher in Java. But more > importantly, this test is failing because the estimate for a real string > instance is coming back as the same size as its shallow size (16 bytes in > this case) - it needs to be at least 1 byte more than that for the test to > pass. In Java (at least in a 64 bit environment), there are an extra 4 > bytes being added for each field. > > Technically, there is a way to get these numbers from .NET, but it > involves calling undocumented APIs using pointers and will likely be > different from one .NET version to the next (a bad idea for a project that > needs to support multiple .NET versions). The only solution I can think of > is to hard code in an extra 4 bytes for 64 bit (and most likely 2 bytes for > 32 bit) in order to make the numbers for the instances larger than their > shallow size. I suppose the alternative would be to either comment out the > string test or change it to >= make it pass. Thoughts? Alternatives? > > > TestNumericDocValuesUpdates.TestUpdateOldSegments() > > I discovered what the issue is here (normally that is the hard part), but > it seems that the proper solution is going to be a major task. The > NamedSPILoader (backed by SPIClassIterator) in Java Lucene is used as a > service locator to load classes throughout the project. In the Codec > abstract class, it is used to load up the codec for the context it is used > in. However, our port of the NamedSPILoader simply loads all of the classes > from the current AppDomain without any way to order them or override them. > > The problem is that in Lucene, this was meant to be an extension point. > And this particular test (and probably many more of them) uses that > extension point to change the codec to a Mock from the test framework. This > line from TestRuleSetupAndRestoreClassEnv pretty much sums up what the > issue is: > > > Debug.Assert(Codec is Lucene42RWCodec, "fix your classpath to have > tests-framework.jar before lucene-core.jar"); > > Basically, it is using a configuration file to order the classes that are > loaded so the test mocks take priority over the built-in codecs. > > Just fixing the test could be done by making the static NamedSPILoader > variable in the Codec class internal and swapping in a test double. > However, that doesn't solve the bigger issue that Lucene.Net is missing its > extensibility for anyone who wants to write their own codec (or tap into > one of the other extensibility points). I guess the bigger question is how > important will it be for anyone to extend Lucene codecs or inject > dependencies into Analyzer factories? There doesn’t appear to be any more > extensibility than that in Lucene 4.8.0, but that could change in more > recent or future versions of Lucene. > > > CI Builds > ----------- > > Not working. Can someone look into that please? > > > Thanks, > Shad Storhaug (NightOwl888) > > > > -----Original Message----- > From: Shad Storhaug > Sent: Wednesday, October 5, 2016 8:23 PM > To: [email protected] > Cc: Connie Yau; '[email protected]' > Subject: RE: Remaining Work/Priorities > > > Analysis.ICU (Depends on ICU4j) hopefully we can remove the ICU DLLs > from the analysis.commons module? > > Just for clarification, these are two entirely different things in Java. > Analysis.Common (Analysis.Collator and Analysis.Th) depends on parts of > Java: > > import java.text.BreakIterator; > import java.text.Collator; > import java.text.ParseException; > import java.text.RuleBasedCollator; > > Highlighter.PostingsHighlighter and Highlighter.VectorHighlight also > depend on parts of Java: > > import java.text.BreakIterator; > import java.text.CharacterIterator; > > Analysis.ICU depends on a separate (icu4j) package: > > import com.ibm.icu.text.Normalizer; > import com.ibm.icu.text.Normalizer2; > import com.ibm.icu.text.Transliterator; > import com.ibm.icu.text.Replaceable; > import com.ibm.icu.text.Transliterator; > import com.ibm.icu.text.UTF16; > import com.ibm.icu.text.UnicodeSet; > import com.ibm.icu.text.FilteredNormalizer2; > import com.ibm.icu.text.Collator; > import com.ibm.icu.text.RuleBasedCollator; > import com.ibm.icu.util.ULocale; > import com.ibm.icu.text.RawCollationKey; > > That said, icu4j DOES have Collator and RuleBasedCollator classes, but it > DOES NOT have a BreakIterator or CharacterIterator class. It is unclear > whether the Collator from icu4j would work as a replacement for the one in > core Java. > > When I was digging through the JDK code, I noticed that BreakIterator and > RuleBasedCollator have a lot of common ICU dependencies there, so even if > the RuleBasedCollator from icu4j is compatible, it might make sense for us > to port the one from Java anyway so we are dealing with the same shared > dependencies in Analysis.Common. > > Once we port over the classes from the Java JDK, we will be able to > eliminate our current ICU4NET dependency (and the platform issues that come > with it). That said, porting over those pieces could take considerable > work. In the interim it might make sense to make separate projects/NuGet > packages to isolate the areas that depend on BreakIterator, > CharacterIterator, and RuleBasedCollator so the rest can be released for > wide/cross-platform use. Perhaps we can even make a basic (scaled down) > BreakIterator for Highlighter that breaks on spaces between words and > punctuation between sentences, which wouldn't work for Thai, but would work > for most other languages. > > Porting the (icu4j) package is another complete ball of yarn, we should > take a look at (https://github.com/sillsdev/icu-dotnet) to see if there > is enough overlap there to power Analysis.ICU (offhand it looks as though > some classes are missing, though). It is a wrapper around the C library - > it may be that we just need to port more of it to get all of the pieces we > need. > > Speaking of Collation, @ChristopherHaws have you made any more progress on > Analysis.Collation? Were you able to determine if icu-dotnet's collator > will make the tests pass? > > > I'm on it QueryParser.Flexible > > Great. The TimeZone probably just needs more research to work out how to > utilize (in order to implement the failing test). Also, FYI MSDN's > recommendation (https://msdn.microsoft.com/en-us/library/system.timezone( > v=vs.110).aspx) is to use TimeZoneInfo rather than TimeZone (I noticed > that several of the tests were recently modified to use TimeZone rather > than TimeZoneInfo). > > As for the culture, in .NET I am pretty sure that we need to pass it as a > parameter to another overload of `QueryParser.Parse` rather than making it > a property of QueryParser. But we can deal with that in one step after you > have finished porting. > > -- > > Shad Storhaug (NightOwl888) > > -----Original Message----- > From: [email protected] [mailto:[email protected]] On > Behalf Of Itamar Syn-Hershko > Sent: Wednesday, October 5, 2016 5:28 AM > To: [email protected] > Cc: Connie Yau > Subject: Re: Remaining Work/Priorities > > Awesome, thanks for all the hard work Shad! > > Our first priority should be fixing all remaining tests - in particular > the one in Core. We should be ready to release and stamp our builds as 100% > stable. As you mentioned, this could be an infrastructure issue - hopefully > *Connie *can give a status update on her effort on the switch to xUnit? > > With regards to Modules, here's an updated breakdown based on your email + > forgotten pieces + my comments: > > *Ported:* > Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common - 0 > failing / 1445 total Lucene.Net.Classification - 0 failing / 9 total > Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet - (including > #188 will be) 0 failing / 152 total Lucene.Net.Join - 0 failing / 27 total > Lucene.Net.Memory - 0 failing / 10 total Lucene.Net.Misc - 2 failing / 42 > total Lucene.Net.Queries - 2 failing / 96 total Lucene.Net.QueryParser - 1 > failing / 203 total Lucene.Net.Suggest - 0 failing / 142 total > > We should do a second pass on the pieces we marked as ported, just to make > sure the port is full and we didn't leave anything behind :) > > *Need to be ported:* > Highlighter (Depends on Collator (which is still being ported) and > BreakIterator (which we don't have a solution that works in .NET core yet)) > Spatial (has 3rd party libraries that need to be updates) Spatial4n ( > https://github.com/synhershko/Spatial4N) needs to be brought up to speed > with spatial4j, dependencies of which may cause some issues.... > Codecs > Partially ported, mostly the tests weren't ported Grouping Not urgent, but > provides nice functionality that users will probably like > > The only part with dependencies seems to be the spatial module - I will > have a look there soon if you don't get to that before I do. > > *Can wait* - some modules are less frequently used, we should stabilize > and release first and then work on them based on demand Analysis.ICU > (Depends on ICU4j) hopefully we can remove the ICU DLLs from the > analysis.commons module? I keep getting reports on some issues they are > causing Analysis.Kuromoji Analysis.Morfologik (Depends on Morfologik) > Analysis.Phonetic (Depends on Apache Commons) Apache commons is mostly > helper libraries, so there's probably not real dependency just lots of > replacement Analysis.SmartCN Analysis.Stempel (currently in progress) > Analysis.UIMA (Depends on Tagger, uimaj-core, WhiteSpaceTokenizer) Demo > while important because can help newbies, we can do better by providing > docs and real world examples. I'm on it QueryParser.Flexible > > *No need to port* - neither are needed in our context Benchmark (many > dependencies) Replicator (many dependencies) Sandbox (Depends on Apache > Jakarta) > > Once all modules are ported and all tests are passing, I think we should > get two more items fixed before an official release: > > 1. .NET Core support - I'm not clear on the status of it at the moment. We > probably want to have it in for the release. > > 2. Public API Inconsistencies. We can discuss what should be done and what > not when we get to that stage. Some are an obvious "fixme" but some will > break code compatibility with Java I think we should avoid. > > One last note - *Wyatt*, do we know why there are no CI builds lately? > > -- > > Itamar Syn-Hershko > http://code972.com | @synhershko <https://twitter.com/synhershko> > Freelance Developer & Consultant Lucene.NET committer and PMC member > > On Sun, Oct 2, 2016 at 10:01 PM, Shad Storhaug <[email protected]> > wrote: > > > Hello, > > > > I just wanted to open this discussion to talk about the work remaining > > to be done on Lucene.Net version 4.8.0. We are nearly there, but that > > doesn't mean we don't still need help! > > > > > > FAILING TESTS > > ------------------- > > > > We now have over 5000 passing tests and as soon as pull request #188 ( > > https://github.com/apache/lucenenet/pull/188) is merged, by my count > > we have only 20 (actual) failing tests. Here is the breakdown by project: > > > > Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common > > - 0 failing / 1445 total Lucene.Net.Classification - 0 failing / 9 > > total Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet - > > (including #188 will be) 0 failing / 152 total Lucene.Net.Join - 0 > > failing / 27 total Lucene.Net.Memory - 0 failing / 10 total > > Lucene.Net.Misc - 2 failing / 42 total Lucene.Net.Queries - 2 failing > > / 96 total Lucene.Net.QueryParser - 1 failing / 203 total > > Lucene.Net.Suggest - 0 failing / 142 total > > > > The reason why I said ACTUAL tests above is because I recently > > discovered that many of the "failures" that are being reported are > > false negatives (in fact, the VS2015 NUnit test runner shows there are > > 135 failing tests total and 902 tests total that don't belong to any > > project). Most NUnit 2.6 test runners do not correctly run tests in > > shared abstract classes with the correct context (test setup) to make > > them pass. These out-of-context runs add several additional minutes to > the test run. > > > > As an experiment, I upgraded to NUnit 3.4.1 and it helped the > > situation somewhat - that is, it ran the tests in the correct context > > and I was able to determine that we have more tests than the numbers > > above and they are all succeeding. However, it also ran the tests in > > an invalid context (that is, the context of the abstract class without > > any setup) and some of them still showed as failures. > > > > I know @conniey is currently working on porting the tests over to xUnit. > > Hopefully, swapping test frameworks alone (or using some of the new > > fancy test attributes) is enough to fix this issue. If not, we need to > > find another solution - preferably one that can be applied to all of > > the tests in abstract classes without too much effort or changing them > > so they are too different from their Java counterpart. > > > > Remaining Pieces to Port > > --------------------------------- > > > > I took an inventory of the remaining pieces left to port a few days > > ago and here is what that looks like (alphabetical order): > > > > 1. Analysis.ICU (Depends on ICU4j) > > 2. Analysis.Kuromoji > > 3. Analysis.Morfologik (Depends on Morfologik) 4. Analysis.Phonetic > > (Depends on Apache Commons) 5. Analysis.SmartCN 6. Analysis.Stempel > > (currently in progress) 7. Analysis.UIMA (Depends on Tagger, > > uimaj-core, WhiteSpaceTokenizer) 8. Benchmark (many dependencies) 9. > > Demo 10. Highlighter (Depends on Collator (which is still being > > ported) and BreakIterator (which we don't have a solution that works > > in .NET core yet)) 11. Replicator (many dependencies) 12. Sandbox > > (Depends on Apache Jakarta) 13. Spatial (Already ported in #174 > > (https://github.com/apache/ lucenenet/pull/174), needs a recent > > version of spatial4n) 14. QueryParser.Flexible > > > > Itamar, it would be helpful if you would be so kind as to organize > > this list in terms of priority. It also couldn't hurt to update the > > contributing documents > > (https://github.com/apache/lucenenet/blob/master/CONTRIBUTING.md, > > and > > https://cwiki.apache.org/confluence/display/LUCENENET/Current+Status > > with the latest information so anyone who wants to help out knows the > > current status. > > > > Of course, it is the known status of dependencies that we need > > clarification on. Which of these dependencies is known to be ported? > > Which of them are ported but are not up to date? Which of them are > > known not to be ported, and which of them are unknown? > > > > > > Public API Inconsistencies > > --------------------------------- > > > > One thing that I have had my eye on for a while now is the > > .NETification/consistency of the core API (that is, in the Lucene.Net > > project). There are several issues that I would like to address > including: > > > > > > 1. Method names that are still camelCase > > > > 2. Properties that should be methods (because they do a lot of > > processing or because they are non-deterministic) > > > > 3. Methods that should be properties > > > > 4. .Size() vs .Size vs .Count - should generally all be .Count in > > .NET > > > > 5. Interfaces should begin with "I" > > > > 6. Classes should not begin with "I" followed by another capital > > letter (for some reason some of them were named that way) > > > > 7. .CharAt() should probably be this[] > > > > 8. Generic types nested within generic types (which cause Visual > > Studio to crash when Intellisense tries to read them) > > > > ... and so on. The only thing is these are all sweeping changes that > > will affect everyone helping out on Lucene.Net and anyone who is > > currently using the beta. So, I just wanted to gather some input on > > when the most appropriate time to begin working on these sweeping > changes would be? > > > > > > Thanks, > > Shad Storhaug (NightOwl888) > > > > > > > > > > > > > > >
