Re: Remaining Work/Priorities

Itamar Syn-Hershko Wed, 12 Oct 2016 10:11:20 -0700

While on that note, Shad - emails to you bounce with the following error
(still):


Delivery to the following recipient failed permanently:

     s...@shadstorhaug.com

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the server for
the recipient domain shadstorhaug.com by mx1.hostmailserver.com.
[69.160.246.214].

The error that the other server returned was:
554 5.7.1 gmail.com is blacklisted.

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Oct 12, 2016 at 8:06 PM, Itamar Syn-Hershko <ita...@code972.com>
wrote:

> CI failure seems to be worked on: https://twitter.com/codebetterCI/status/
> 785854074713468932 (Thanks Wyatt for pointing that out)
>
> I will look into the rest in a little while
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Tue, Oct 11, 2016 at 10:10 PM, Shad Storhaug <s...@shadstorhaug.com>
> wrote:
>
>> Update
>> ======
>>
>> I have just pushed some commits that fix several bugs in the
>> Lucene.Net.Codecs project (all 452 tests pass most of the time, a few
>> random failures) and fix all but 4 of the failing tests in Lucene.Net.Core.
>>
>>
>> Fix for Test Context
>> -------------------------
>>
>> For now, I have added method override stubs to each subclass in order to
>> add the [Test] attribute, so NUnit will run them in the correct context. I
>> did that on all of the superclass tests except for the ones in QueryParser
>> (since Itamar mentioned he would be working in that area). Itamar, you will
>> probably need to follow suit to get all of the QP tests to pass - namely
>> with the QueryParserTestBase and TestQueryParser classes.
>>
>> I have carefully put all of these changes into a single commit so it can
>> be reverted easily, if this solution doesn't happen to be compatible with
>> xUnit: https://github.com/apache/lucenenet/commit/2a79edea6359e1ee1
>> f83269cc7dc3ef2753ebf2c. Hopefully that makes life easier for @conniey.
>>
>> @Itamar, let me know when this is completed on your end so I can do a
>> double revert and squash the test stubs from QueryParser into an
>> all-inclusive revert-able commit.
>>
>> We can now correctly see how many tests we have in the core. Currently
>> there are 2730 - it seems we are still missing 720 tests, assuming they all
>> were for something port-able.
>>
>>
>> Remaining Tests
>> ---------------------
>>
>> Next I plan to work on locating any tests that we have missed (starting
>> in the core). It seems these fall into several categories:
>>
>> 1. Tests that have not yet been ported.
>> 2. Tests that have been partially ported that have not been added to the
>> project.
>> 3. Tests that have been ported, but are missing the [Test] attribute.
>> 4. Tests in classes that have been ported that have been commented out
>> (presumably because at the time they were ported the dependencies did not
>> yet exist).
>> 5. Tests that have been Ignored in .NET that were not in Java.
>> 6. Tests that have NUnit Assume.That() logic that depends on some
>> non-existant JRE condition, so they are not running in .NET.
>>
>> I'll make a quick effort to get them to pass, but the main goal will be
>> to ensure they all can run and are included in the project. Just a heads up
>> that the number of test failures is likely to increase on this pass (but
>> the number of bugs will likely decrease).
>>
>>
>> Failing Core Tests
>> -----------------------
>>
>> I have looked into the remaining tests somewhat. There are 2 issues that
>> I need some input on to solve.
>>
>>
>> TestRamUsageEstimator.TestSanity()
>>
>> Java Lucene uses a JRE-specific API to determine how much header size to
>> add on each field. This makes the estimates higher in Java. But more
>> importantly, this test is failing because the estimate for a real string
>> instance is coming back as the same size as its shallow size (16 bytes in
>> this case) - it needs to be at least 1 byte more than that for the test to
>> pass. In Java (at least in a 64 bit environment), there are an extra 4
>> bytes being added for each field.
>>
>> Technically, there is a way to get these numbers from .NET, but it
>> involves calling undocumented APIs using pointers and will likely be
>> different from one .NET version to the next (a bad idea for a project that
>> needs to support multiple .NET versions). The only solution I can think of
>> is to hard code in an extra 4 bytes for 64 bit (and most likely 2 bytes for
>> 32 bit) in order to make the numbers for the instances larger than their
>> shallow size. I suppose the alternative would be to either comment out the
>> string test or change it to >= make it pass. Thoughts? Alternatives?
>>
>>
>> TestNumericDocValuesUpdates.TestUpdateOldSegments()
>>
>> I discovered what the issue is here (normally that is the hard part), but
>> it seems that the proper solution is going to be a major task. The
>> NamedSPILoader (backed by SPIClassIterator) in Java Lucene is used as a
>> service locator to load classes throughout the project. In the Codec
>> abstract class, it is used to load up the codec for the context it is used
>> in. However, our port of the NamedSPILoader simply loads all of the classes
>> from the current AppDomain without any way to order them or override them.
>>
>> The problem is that in Lucene, this was meant to be an extension point.
>> And this particular test (and probably many more of them) uses that
>> extension point to change the codec to a Mock from the test framework. This
>> line from TestRuleSetupAndRestoreClassEnv pretty much sums up what the
>> issue is:
>>
>> > Debug.Assert(Codec is Lucene42RWCodec, "fix your classpath to have
>> tests-framework.jar before lucene-core.jar");
>>
>> Basically, it is using a configuration file to order the classes that are
>> loaded so the test mocks take priority over the built-in codecs.
>>
>> Just fixing the test could be done by making the static NamedSPILoader
>> variable in the Codec class internal and swapping in a test double.
>> However, that doesn't solve the bigger issue that Lucene.Net is missing its
>> extensibility for anyone who wants to write their own codec (or tap into
>> one of the other extensibility points). I guess the bigger question is how
>> important will it be for anyone to extend Lucene codecs or inject
>> dependencies into Analyzer factories? There doesn’t appear to be any more
>> extensibility than that in Lucene 4.8.0, but that could change in more
>> recent or future versions of Lucene.
>>
>>
>> CI Builds
>> -----------
>>
>> Not working. Can someone look into that please?
>>
>>
>> Thanks,
>> Shad Storhaug (NightOwl888)
>>
>>
>>
>> -----Original Message-----
>> From: Shad Storhaug
>> Sent: Wednesday, October 5, 2016 8:23 PM
>> To: dev@lucenenet.apache.org
>> Cc: Connie Yau; 'cri...@gmail.com'
>> Subject: RE: Remaining Work/Priorities
>>
>> > Analysis.ICU (Depends on ICU4j) hopefully we can remove the ICU DLLs
>> from the analysis.commons module?
>>
>> Just for clarification, these are two entirely different things in Java.
>> Analysis.Common (Analysis.Collator and Analysis.Th) depends on parts of
>> Java:
>>
>> import java.text.BreakIterator;
>> import java.text.Collator;
>> import java.text.ParseException;
>> import java.text.RuleBasedCollator;
>>
>> Highlighter.PostingsHighlighter and Highlighter.VectorHighlight also
>> depend on parts of Java:
>>
>> import java.text.BreakIterator;
>> import java.text.CharacterIterator;
>>
>> Analysis.ICU depends on a separate (icu4j) package:
>>
>> import com.ibm.icu.text.Normalizer;
>> import com.ibm.icu.text.Normalizer2;
>> import com.ibm.icu.text.Transliterator;
>> import com.ibm.icu.text.Replaceable;
>> import com.ibm.icu.text.Transliterator;
>> import com.ibm.icu.text.UTF16;
>> import com.ibm.icu.text.UnicodeSet;
>> import com.ibm.icu.text.FilteredNormalizer2;
>> import com.ibm.icu.text.Collator;
>> import com.ibm.icu.text.RuleBasedCollator;
>> import com.ibm.icu.util.ULocale;
>> import com.ibm.icu.text.RawCollationKey;
>>
>> That said, icu4j DOES have Collator and RuleBasedCollator classes, but it
>> DOES NOT have a BreakIterator or CharacterIterator class. It is unclear
>> whether the Collator from icu4j would work as a replacement for the one in
>> core Java.
>>
>> When I was digging through the JDK code, I noticed that BreakIterator and
>> RuleBasedCollator have a lot of common ICU dependencies there, so even if
>> the RuleBasedCollator from icu4j is compatible, it might make sense for us
>> to port the one from Java anyway so we are dealing with the same shared
>> dependencies in Analysis.Common.
>>
>> Once we port over the classes from the Java JDK, we will be able to
>> eliminate our current ICU4NET dependency (and the platform issues that come
>> with it). That said, porting over those pieces could take considerable
>> work. In the interim it might make sense to make separate projects/NuGet
>> packages to isolate the areas that depend on BreakIterator,
>> CharacterIterator, and RuleBasedCollator so the rest can be released for
>> wide/cross-platform use. Perhaps we can even make a basic (scaled down)
>> BreakIterator for Highlighter that breaks on spaces between words and
>> punctuation between sentences, which wouldn't work for Thai, but would work
>> for most other languages.
>>
>> Porting the (icu4j) package is another complete ball of yarn, we should
>> take a look at (https://github.com/sillsdev/icu-dotnet) to see if there
>> is enough overlap there to power Analysis.ICU (offhand it looks as though
>> some classes are missing, though). It is a wrapper around the C library -
>> it may be that we just need to port more of it to get all of the pieces we
>> need.
>>
>> Speaking of Collation, @ChristopherHaws have you made any more progress
>> on Analysis.Collation? Were you able to determine if icu-dotnet's collator
>> will make the tests pass?
>>
>> > I'm on it QueryParser.Flexible
>>
>> Great. The TimeZone probably just needs more research to work out how to
>> utilize (in order to implement the failing test). Also, FYI MSDN's
>> recommendation (https://msdn.microsoft.com/en
>> -us/library/system.timezone(v=vs.110).aspx) is to use TimeZoneInfo
>> rather than TimeZone (I noticed that several of the tests were recently
>> modified to use TimeZone rather than TimeZoneInfo).
>>
>> As for the culture, in .NET I am pretty sure that we need to pass it as a
>> parameter to another overload of `QueryParser.Parse` rather than making it
>> a property of QueryParser. But we can deal with that in one step after you
>> have finished porting.
>>
>> --
>>
>> Shad Storhaug (NightOwl888)
>>
>> -----Original Message-----
>> From: itamar.synhers...@gmail.com [mailto:itamar.synhers...@gmail.com]
>> On Behalf Of Itamar Syn-Hershko
>> Sent: Wednesday, October 5, 2016 5:28 AM
>> To: dev@lucenenet.apache.org
>> Cc: Connie Yau
>> Subject: Re: Remaining Work/Priorities
>>
>> Awesome, thanks for all the hard work Shad!
>>
>> Our first priority should be fixing all remaining tests - in particular
>> the one in Core. We should be ready to release and stamp our builds as 100%
>> stable. As you mentioned, this could be an infrastructure issue - hopefully
>> *Connie *can give a status update on her effort on the switch to xUnit?
>>
>> With regards to Modules, here's an updated breakdown based on your email
>> + forgotten pieces + my comments:
>>
>> *Ported:*
>> Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common -
>> 0 failing / 1445 total Lucene.Net.Classification - 0 failing / 9 total
>> Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet - (including
>> #188 will be) 0 failing / 152 total Lucene.Net.Join - 0 failing / 27 total
>> Lucene.Net.Memory - 0 failing / 10 total Lucene.Net.Misc - 2 failing / 42
>> total Lucene.Net.Queries - 2 failing / 96 total Lucene.Net.QueryParser - 1
>> failing / 203 total Lucene.Net.Suggest - 0 failing / 142 total
>>
>> We should do a second pass on the pieces we marked as ported, just to
>> make sure the port is full and we didn't leave anything behind :)
>>
>> *Need to be ported:*
>> Highlighter (Depends on Collator (which is still being ported) and
>> BreakIterator (which we don't have a solution that works in .NET core yet))
>> Spatial (has 3rd party libraries that need to be updates) Spatial4n (
>> https://github.com/synhershko/Spatial4N) needs to be brought up to speed
>> with spatial4j, dependencies of which may cause some issues....
>> Codecs
>> Partially ported, mostly the tests weren't ported Grouping Not urgent,
>> but provides nice functionality that users will probably like
>>
>> The only part with dependencies seems to be the spatial module - I will
>> have a look there soon if you don't get to that before I do.
>>
>> *Can wait* - some modules are less frequently used, we should stabilize
>> and release first and then work on them based on demand Analysis.ICU
>> (Depends on ICU4j) hopefully we can remove the ICU DLLs from the
>> analysis.commons module? I keep getting reports on some issues they are
>> causing Analysis.Kuromoji Analysis.Morfologik (Depends on Morfologik)
>> Analysis.Phonetic (Depends on Apache Commons) Apache commons is mostly
>> helper libraries, so there's probably not real dependency just lots of
>> replacement Analysis.SmartCN Analysis.Stempel (currently in progress)
>> Analysis.UIMA (Depends on Tagger, uimaj-core, WhiteSpaceTokenizer) Demo
>> while important because can help newbies, we can do better by providing
>> docs and real world examples. I'm on it QueryParser.Flexible
>>
>> *No need to port* - neither are needed in our context Benchmark (many
>> dependencies) Replicator (many dependencies) Sandbox (Depends on Apache
>> Jakarta)
>>
>> Once all modules are ported and all tests are passing, I think we should
>> get two more items fixed before an official release:
>>
>> 1. .NET Core support - I'm not clear on the status of it at the moment.
>> We probably want to have it in for the release.
>>
>> 2. Public API Inconsistencies. We can discuss what should be done and
>> what not when we get to that stage. Some are an obvious "fixme" but some
>> will break code compatibility with Java I think we should avoid.
>>
>> One last note - *Wyatt*, do we know why there are no CI builds lately?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant Lucene.NET committer and PMC member
>>
>> On Sun, Oct 2, 2016 at 10:01 PM, Shad Storhaug <s...@shadstorhaug.com>
>> wrote:
>>
>> > Hello,
>> >
>> > I just wanted to open this discussion to talk about the work remaining
>> > to be done on Lucene.Net version 4.8.0. We are nearly there, but that
>> > doesn't mean we don't still need help!
>> >
>> >
>> > FAILING TESTS
>> > -------------------
>> >
>> > We now have over 5000 passing tests and as soon as pull request #188 (
>> > https://github.com/apache/lucenenet/pull/188) is merged, by my count
>> > we have only 20 (actual) failing tests. Here is the breakdown by
>> project:
>> >
>> > Lucene.Net (Core) - 15 failing / 1989 total Lucene.Net.Analysis.Common
>> > - 0 failing / 1445 total Lucene.Net.Classification - 0 failing / 9
>> > total Lucene.Net.Expressions - 0 failing / 94 total Lucene.Net.Facet -
>> > (including #188 will be) 0 failing / 152 total Lucene.Net.Join - 0
>> > failing / 27 total Lucene.Net.Memory - 0 failing / 10 total
>> > Lucene.Net.Misc - 2 failing / 42 total Lucene.Net.Queries - 2 failing
>> > / 96 total Lucene.Net.QueryParser - 1 failing / 203 total
>> > Lucene.Net.Suggest - 0 failing / 142 total
>> >
>> > The reason why I said ACTUAL tests above is because I recently
>> > discovered that many of the "failures" that are being reported are
>> > false negatives (in fact, the VS2015 NUnit test runner shows there are
>> > 135 failing tests total and 902 tests total that don't belong to any
>> > project). Most NUnit 2.6 test runners do not correctly run tests in
>> > shared abstract classes with the correct context (test setup) to make
>> > them pass. These out-of-context runs add several additional minutes to
>> the test run.
>> >
>> > As an experiment, I upgraded to NUnit 3.4.1 and it helped the
>> > situation somewhat - that is, it ran the tests in the correct context
>> > and I was able to determine that we have more tests than the numbers
>> > above and they are all succeeding. However, it also ran the tests in
>> > an invalid context (that is, the context of the abstract class without
>> > any setup) and some of them still showed as failures.
>> >
>> > I know @conniey is currently working on porting the tests over to xUnit.
>> > Hopefully, swapping test frameworks alone (or using some of the new
>> > fancy test attributes) is enough to fix this issue. If not, we need to
>> > find another solution - preferably one that can be applied to all of
>> > the tests in abstract classes without too much effort or changing them
>> > so they are too different from their Java counterpart.
>> >
>> > Remaining Pieces to Port
>> > ---------------------------------
>> >
>> > I took an inventory of the remaining pieces left to port a few days
>> > ago and here is what that looks like (alphabetical order):
>> >
>> > 1. Analysis.ICU (Depends on ICU4j)
>> > 2. Analysis.Kuromoji
>> > 3. Analysis.Morfologik (Depends on Morfologik) 4. Analysis.Phonetic
>> > (Depends on Apache Commons) 5. Analysis.SmartCN 6. Analysis.Stempel
>> > (currently in progress) 7. Analysis.UIMA (Depends on Tagger,
>> > uimaj-core, WhiteSpaceTokenizer) 8. Benchmark (many dependencies) 9.
>> > Demo 10. Highlighter (Depends on Collator (which is still being
>> > ported) and BreakIterator (which we don't have a solution that works
>> > in .NET core yet)) 11. Replicator (many dependencies) 12. Sandbox
>> > (Depends on Apache Jakarta) 13. Spatial (Already ported in #174
>> > (https://github.com/apache/ lucenenet/pull/174), needs a recent
>> > version of spatial4n) 14. QueryParser.Flexible
>> >
>> > Itamar, it would be helpful if you would be so kind as to organize
>> > this list in terms of priority. It also couldn't hurt to update the
>> > contributing documents
>> > (https://github.com/apache/lucenenet/blob/master/CONTRIBUTING.md,
>> > and
>> > https://cwiki.apache.org/confluence/display/LUCENENET/Current+Status
>> > with the latest information so anyone who wants to help out knows the
>> > current status.
>> >
>> > Of course, it is the known status of dependencies that we need
>> > clarification on. Which of these dependencies is known to be ported?
>> > Which of them are ported but are not up to date? Which of them are
>> > known not to be ported, and which of them are unknown?
>> >
>> >
>> > Public API Inconsistencies
>> > ---------------------------------
>> >
>> > One thing that I have had my eye on for a while now is the
>> > .NETification/consistency of the core API (that is, in the Lucene.Net
>> > project). There are several issues that I would like to address
>> including:
>> >
>> >
>> > 1.       Method names that are still camelCase
>> >
>> > 2.       Properties that should be methods (because they do a lot of
>> > processing or because they are non-deterministic)
>> >
>> > 3.       Methods that should be properties
>> >
>> > 4.       .Size() vs .Size vs .Count - should generally all be .Count in
>> > .NET
>> >
>> > 5.       Interfaces should begin with "I"
>> >
>> > 6.       Classes should not begin with "I" followed by another capital
>> > letter (for some reason some of them were named that way)
>> >
>> > 7.       .CharAt() should probably be this[]
>> >
>> > 8.       Generic types nested within generic types (which cause Visual
>> > Studio to crash when Intellisense tries to read them)
>> >
>> > ... and so on. The only thing is these are all sweeping changes that
>> > will affect everyone helping out on Lucene.Net and anyone who is
>> > currently using the beta. So, I just wanted to gather some input on
>> > when the most appropriate time to begin working on these sweeping
>> changes would be?
>> >
>> >
>> > Thanks,
>> > Shad Storhaug (NightOwl888)
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>

Re: Remaining Work/Priorities

Reply via email to