While on that note, Shad - emails to you bounce with the following error (still):
Delivery to the following recipient failed permanently: s...@shadstorhaug.com Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the server for the recipient domain shadstorhaug.com by mx1.hostmailserver.com. [69.160.246.214]. The error that the other server returned was: 554 5.7.1 gmail.com is blacklisted. -- Itamar Syn-Hershko http://code972.com | @synhershko <https://twitter.com/synhershko> Freelance Developer & Consultant Lucene.NET committer and PMC member On Wed, Oct 12, 2016 at 8:06 PM, Itamar Syn-Hershko <ita...@code972.com> wrote: > CI failure seems to be worked on: https://twitter.com/codebetterCI/status/ > 785854074713468932 (Thanks Wyatt for pointing that out) > > I will look into the rest in a little while > > -- > > Itamar Syn-Hershko > http://code972.com | @synhershko <https://twitter.com/synhershko> > Freelance Developer & Consultant > Lucene.NET committer and PMC member > > On Tue, Oct 11, 2016 at 10:10 PM, Shad Storhaug <s...@shadstorhaug.com> > wrote: > >> Update >> ====== >> >> I have just pushed some commits that fix several bugs in the >> Lucene.Net.Codecs project (all 452 tests pass most of the time, a few >> random failures) and fix all but 4 of the failing tests in Lucene.Net.Core. >> >> >> Fix for Test Context >> ------------------------- >> >> For now, I have added method override stubs to each subclass in order to >> add the [Test] attribute, so NUnit will run them in the correct context. I >> did that on all of the superclass tests except for the ones in QueryParser >> (since Itamar mentioned he would be working in that area). Itamar, you will >> probably need to follow suit to get all of the QP tests to pass - namely >> with the QueryParserTestBase and TestQueryParser classes. >> >> I have carefully put all of these changes into a single commit so it can >> be reverted easily, if this solution doesn't happen to be compatible with >> xUnit: https://github.com/apache/lucenenet/commit/2a79edea6359e1ee1 >> f83269cc7dc3ef2753ebf2c. Hopefully that makes life easier for @conniey. >> >> @Itamar, let me know when this is completed on your end so I can do a >> double revert and squash the test stubs from QueryParser into an >> all-inclusive revert-able commit. >> >> We can now correctly see how many tests we have in the core. Currently >> there are 2730 - it seems we are still missing 720 tests, assuming they all >> were for something port-able. >> >> >> Remaining Tests >> --------------------- >> >> Next I plan to work on locating any tests that we have missed (starting >> in the core). It seems these fall into several categories: >> >> 1. Tests that have not yet been ported. >> 2. Tests that have been partially ported that have not been added to the >> project. >> 3. Tests that have been ported, but are missing the [Test] attribute. >> 4. Tests in classes that have been ported that have been commented out >> (presumably because at the time they were ported the dependencies did not >> yet exist). >> 5. Tests that have been Ignored in .NET that were not in Java. >> 6. Tests that have NUnit Assume.That() logic that depends on some >> non-existant JRE condition, so they are not running in .NET. >> >> I'll make a quick effort to get them to pass, but the main goal will be >> to ensure they all can run and are included in the project. Just a heads up >> that the number of test failures is likely to increase on this pass (but >> the number of bugs will likely decrease). >> >> >> Failing Core Tests >> ----------------------- >> >> I have looked into the remaining tests somewhat. There are 2 issues that >> I need some input on to solve. >> >> >> TestRamUsageEstimator.TestSanity() >> >> Java Lucene uses a JRE-specific API to determine how much header size to >> add on each field. This makes the estimates higher in Java. But more >> importantly, this test is failing because the estimate for a real string >> instance is coming back as the same size as its shallow size (16 bytes in >> this case) - it needs to be at least 1 byte more than that for the test to >> pass. In Java (at least in a 64 bit environment), there are an extra 4 >> bytes being added for each field. >> >> Technically, there is a way to get these numbers from .NET, but it >> involves calling undocumented APIs using pointers and will likely be >> different from one .NET version to the next (a bad idea for a project that >> needs to support multiple .NET versions). The only solution I can think of >> is to hard code in an extra 4 bytes for 64 bit (and most likely 2 bytes for >> 32 bit) in order to make the numbers for the instances larger than their >> shallow size. I suppose the alternative would be to either comment out the >> string test or change it to >= make it pass. Thoughts? Alternatives? >> >> >> TestNumericDocValuesUpdates.TestUpdateOldSegments() >> >> I discovered what the issue is here (normally that is the hard part), but >> it seems that the proper solution is going to be a major task. The >> NamedSPILoader (backed by SPIClassIterator) in Java Lucene is used as a >> service locator to load classes throughout the project. In the Codec >> abstract class, it is used to load up the codec for the context it is used >> in. However, our port of the NamedSPILoader simply loads all of the classes >> from the current AppDomain without any way to order them or override them. >> >> The problem is that in Lucene, this was meant to be an extension point. >> And this particular test (and probably many more of them) uses that >> extension point to change the codec to a Mock from the test framework. This >> line from TestRuleSetupAndRestoreClassEnv pretty much sums up what the >> issue is: >> >> > Debug.Assert(Codec is Lucene42RWCodec, "fix your classpath to have >> tests-framework.jar before lucene-core.jar"); >> >> Basically, it is using a configuration file to order the classes that are >> loaded so the test mocks take priority over the built-in codecs. >> >> Just fixing the test could be done by making the static NamedSPILoader >> variable in the Codec class internal and swapping in a test double. >> However, that doesn't solve the bigger issue that Lucene.Net is missing its >> extensibility for anyone who wants to write their own codec (or tap into >> one of the other extensibility points). I guess the bigger question is how >> important will it be for anyone to extend Lucene codecs or inject >> dependencies into Analyzer factories? There doesn’t appear to be any more >> extensibility than that in Lucene 4.8.0, but that could change in more >> recent or future versions of Lucene. >> >> >> CI Builds >> ----------- >> >> Not working. Can someone look into that please? >> >> >> Thanks, >> Shad Storhaug (NightOwl888) >> >> >> >> -----Original Message----- >> From: Shad Storhaug >> Sent: Wednesday, October 5, 2016 8:23 PM >> To: dev@lucenenet.apache.org >> Cc: Connie Yau; 'cri...@gmail.com' >> Subject: RE: Remaining Work/Priorities >> >> > Analysis.ICU (Depends on ICU4j) hopefully we can remove the ICU DLLs >> from the analysis.commons module? >> >> Just for clarification, these are two entirely different things in Java. >> Analysis.Common (Analysis.Collator and Analysis.Th) depends on parts of >> Java: >> >> import java.text.BreakIterator; >> import java.text.Collator; >> import java.text.ParseException; >> import java.text.RuleBasedCollator; >> >> Highlighter.PostingsHighlighter and Highlighter.VectorHighlight also >> depend on parts of Java: >> >> import java.text.BreakIterator; >> import java.text.CharacterIterator; >> >> Analysis.ICU depends on a separate (icu4j) package: >> >> import com.ibm.icu.text.Normalizer; >> import com.ibm.icu.text.Normalizer2; >> import com.ibm.icu.text.Transliterator; >> import com.ibm.icu.text.Replaceable; >> import com.ibm.icu.text.Transliterator; >> import com.ibm.icu.text.UTF16; >> import com.ibm.icu.text.UnicodeSet; >> import com.ibm.icu.text.FilteredNormalizer2; >> import com.ibm.icu.text.Collator; >> import com.ibm.icu.text.RuleBasedCollator; >> import com.ibm.icu.util.ULocale; >> import com.ibm.icu.text.RawCollationKey; >> >> That said, icu4j DOES have Collator and RuleBasedCollator classes, but it >> DOES NOT have a BreakIterator or CharacterIterator class. It is unclear >> whether the Collator from icu4j would work as a replacement for the one in >> core Java. >> >> When I was digging through the JDK code, I noticed that BreakIterator and >> RuleBasedCollator have a lot of common ICU dependencies there, so even if >> the RuleBasedCollator from icu4j is compatible, it might make sense for us >> to port the one from Java anyway so we are dealing with the same shared >> dependencies in Analysis.Common. >> >> Once we port over the classes from the Java JDK, we will be able to >> eliminate our current ICU4NET dependency (and the platform issues that come >> with it). That said, porting over those pieces could take considerable >> work. In the interim it might make sense to make separate projects/NuGet >> packages to isolate the areas that depend on BreakIterator, >> CharacterIterator, and RuleBasedCollator so the rest can be released for >> wide/cross-platform use. Perhaps we can even make a basic (scaled down) >> BreakIterator for Highlighter that breaks on spaces between words and >> punctuation between sentences, which wouldn't work for Thai, but would work >> for most other languages. >> >> Porting the (icu4j) package is another complete ball of yarn, we should >> take a look at (https://github.com/sillsdev/icu-dotnet) to see if there >> is enough overlap there to power Analysis.ICU (offhand it looks as though >> some classes are missing, though). It is a wrapper around the C library - >> it may be that we just need to port more of it to get all of the pieces we >> need. >> >> Speaking of Collation, @ChristopherHaws have you made any more progress >> on Analysis.Collation? Were you able to determine if icu-dotnet's collator >> will make the tests pass? >> >> > I'm on it QueryParser.Flexible >> >> Great. The TimeZone probably just needs more research to work out how to >> utilize (in order to implement the failing test). Also, FYI MSDN's >> recommendation (https://msdn.microsoft.com/en >> -us/library/system.timezone(v=vs.110).aspx) is to use TimeZoneInfo >> rather than TimeZone (I noticed that several of the tests were recently >> modified to use TimeZone rather than TimeZoneInfo). >> >> As for the culture, in .NET I am pretty sure that we need to pass it as a >> parameter to another overload of `QueryParser.Parse` rather than making it >> a property of QueryParser. But we can deal with that in one step after you >> have finished porting. >> >> -- >> >> Shad Storhaug (NightOwl888) >> >> -----Original Message----- >> From: itamar.synhers...@gmail.com [mailto:itamar.synhers...@gmail.com] >> On Behalf Of Itamar Syn-Hershko >> Sent: Wednesday, October 5, 2016 5:28 AM >> To: dev@lucenenet.apache.org >> Cc: Connie Yau >> Subject: Re: Remaining Work/Priorities >> >> Awesome, thanks for all the hard work Shad! >> >> Our first priority should be fixing all remaining tests - in particular >> the one in Core. We should be ready to release and stamp our builds as 100% >> stable. As you mentioned, this could be an infrastructure issue - hopefully >> *Connie *can give a status update on her effort on the switch to xUnit? >> >> With regards to Modules, here's an updated breakdown based on your email >> + forgotten pieces + my comments: >> >> *Ported:* >> Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common - >> 0 failing / 1445 total Lucene.Net.Classification - 0 failing / 9 total >> Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet - (including >> #188 will be) 0 failing / 152 total Lucene.Net.Join - 0 failing / 27 total >> Lucene.Net.Memory - 0 failing / 10 total Lucene.Net.Misc - 2 failing / 42 >> total Lucene.Net.Queries - 2 failing / 96 total Lucene.Net.QueryParser - 1 >> failing / 203 total Lucene.Net.Suggest - 0 failing / 142 total >> >> We should do a second pass on the pieces we marked as ported, just to >> make sure the port is full and we didn't leave anything behind :) >> >> *Need to be ported:* >> Highlighter (Depends on Collator (which is still being ported) and >> BreakIterator (which we don't have a solution that works in .NET core yet)) >> Spatial (has 3rd party libraries that need to be updates) Spatial4n ( >> https://github.com/synhershko/Spatial4N) needs to be brought up to speed >> with spatial4j, dependencies of which may cause some issues.... >> Codecs >> Partially ported, mostly the tests weren't ported Grouping Not urgent, >> but provides nice functionality that users will probably like >> >> The only part with dependencies seems to be the spatial module - I will >> have a look there soon if you don't get to that before I do. >> >> *Can wait* - some modules are less frequently used, we should stabilize >> and release first and then work on them based on demand Analysis.ICU >> (Depends on ICU4j) hopefully we can remove the ICU DLLs from the >> analysis.commons module? I keep getting reports on some issues they are >> causing Analysis.Kuromoji Analysis.Morfologik (Depends on Morfologik) >> Analysis.Phonetic (Depends on Apache Commons) Apache commons is mostly >> helper libraries, so there's probably not real dependency just lots of >> replacement Analysis.SmartCN Analysis.Stempel (currently in progress) >> Analysis.UIMA (Depends on Tagger, uimaj-core, WhiteSpaceTokenizer) Demo >> while important because can help newbies, we can do better by providing >> docs and real world examples. I'm on it QueryParser.Flexible >> >> *No need to port* - neither are needed in our context Benchmark (many >> dependencies) Replicator (many dependencies) Sandbox (Depends on Apache >> Jakarta) >> >> Once all modules are ported and all tests are passing, I think we should >> get two more items fixed before an official release: >> >> 1. .NET Core support - I'm not clear on the status of it at the moment. >> We probably want to have it in for the release. >> >> 2. Public API Inconsistencies. We can discuss what should be done and >> what not when we get to that stage. Some are an obvious "fixme" but some >> will break code compatibility with Java I think we should avoid. >> >> One last note - *Wyatt*, do we know why there are no CI builds lately? >> >> -- >> >> Itamar Syn-Hershko >> http://code972.com | @synhershko <https://twitter.com/synhershko> >> Freelance Developer & Consultant Lucene.NET committer and PMC member >> >> On Sun, Oct 2, 2016 at 10:01 PM, Shad Storhaug <s...@shadstorhaug.com> >> wrote: >> >> > Hello, >> > >> > I just wanted to open this discussion to talk about the work remaining >> > to be done on Lucene.Net version 4.8.0. We are nearly there, but that >> > doesn't mean we don't still need help! >> > >> > >> > FAILING TESTS >> > ------------------- >> > >> > We now have over 5000 passing tests and as soon as pull request #188 ( >> > https://github.com/apache/lucenenet/pull/188) is merged, by my count >> > we have only 20 (actual) failing tests. Here is the breakdown by >> project: >> > >> > Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common >> > - 0 failing / 1445 total Lucene.Net.Classification - 0 failing / 9 >> > total Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet - >> > (including #188 will be) 0 failing / 152 total Lucene.Net.Join - 0 >> > failing / 27 total Lucene.Net.Memory - 0 failing / 10 total >> > Lucene.Net.Misc - 2 failing / 42 total Lucene.Net.Queries - 2 failing >> > / 96 total Lucene.Net.QueryParser - 1 failing / 203 total >> > Lucene.Net.Suggest - 0 failing / 142 total >> > >> > The reason why I said ACTUAL tests above is because I recently >> > discovered that many of the "failures" that are being reported are >> > false negatives (in fact, the VS2015 NUnit test runner shows there are >> > 135 failing tests total and 902 tests total that don't belong to any >> > project). Most NUnit 2.6 test runners do not correctly run tests in >> > shared abstract classes with the correct context (test setup) to make >> > them pass. These out-of-context runs add several additional minutes to >> the test run. >> > >> > As an experiment, I upgraded to NUnit 3.4.1 and it helped the >> > situation somewhat - that is, it ran the tests in the correct context >> > and I was able to determine that we have more tests than the numbers >> > above and they are all succeeding. However, it also ran the tests in >> > an invalid context (that is, the context of the abstract class without >> > any setup) and some of them still showed as failures. >> > >> > I know @conniey is currently working on porting the tests over to xUnit. >> > Hopefully, swapping test frameworks alone (or using some of the new >> > fancy test attributes) is enough to fix this issue. If not, we need to >> > find another solution - preferably one that can be applied to all of >> > the tests in abstract classes without too much effort or changing them >> > so they are too different from their Java counterpart. >> > >> > Remaining Pieces to Port >> > --------------------------------- >> > >> > I took an inventory of the remaining pieces left to port a few days >> > ago and here is what that looks like (alphabetical order): >> > >> > 1. Analysis.ICU (Depends on ICU4j) >> > 2. Analysis.Kuromoji >> > 3. Analysis.Morfologik (Depends on Morfologik) 4. Analysis.Phonetic >> > (Depends on Apache Commons) 5. Analysis.SmartCN 6. Analysis.Stempel >> > (currently in progress) 7. Analysis.UIMA (Depends on Tagger, >> > uimaj-core, WhiteSpaceTokenizer) 8. Benchmark (many dependencies) 9. >> > Demo 10. Highlighter (Depends on Collator (which is still being >> > ported) and BreakIterator (which we don't have a solution that works >> > in .NET core yet)) 11. Replicator (many dependencies) 12. Sandbox >> > (Depends on Apache Jakarta) 13. Spatial (Already ported in #174 >> > (https://github.com/apache/ lucenenet/pull/174), needs a recent >> > version of spatial4n) 14. QueryParser.Flexible >> > >> > Itamar, it would be helpful if you would be so kind as to organize >> > this list in terms of priority. It also couldn't hurt to update the >> > contributing documents >> > (https://github.com/apache/lucenenet/blob/master/CONTRIBUTING.md, >> > and >> > https://cwiki.apache.org/confluence/display/LUCENENET/Current+Status >> > with the latest information so anyone who wants to help out knows the >> > current status. >> > >> > Of course, it is the known status of dependencies that we need >> > clarification on. Which of these dependencies is known to be ported? >> > Which of them are ported but are not up to date? Which of them are >> > known not to be ported, and which of them are unknown? >> > >> > >> > Public API Inconsistencies >> > --------------------------------- >> > >> > One thing that I have had my eye on for a while now is the >> > .NETification/consistency of the core API (that is, in the Lucene.Net >> > project). There are several issues that I would like to address >> including: >> > >> > >> > 1. Method names that are still camelCase >> > >> > 2. Properties that should be methods (because they do a lot of >> > processing or because they are non-deterministic) >> > >> > 3. Methods that should be properties >> > >> > 4. .Size() vs .Size vs .Count - should generally all be .Count in >> > .NET >> > >> > 5. Interfaces should begin with "I" >> > >> > 6. Classes should not begin with "I" followed by another capital >> > letter (for some reason some of them were named that way) >> > >> > 7. .CharAt() should probably be this[] >> > >> > 8. Generic types nested within generic types (which cause Visual >> > Studio to crash when Intellisense tries to read them) >> > >> > ... and so on. The only thing is these are all sweeping changes that >> > will affect everyone helping out on Lucene.Net and anyone who is >> > currently using the beta. So, I just wanted to gather some input on >> > when the most appropriate time to begin working on these sweeping >> changes would be? >> > >> > >> > Thanks, >> > Shad Storhaug (NightOwl888) >> > >> > >> > >> > >> > >> > >> > >> > >