RE: Remaining Work/Priorities

Shad Storhaug Wed, 12 Oct 2016 18:36:54 -0700

Itamar,

I contacted my ISP and I think I now have it straightened out. Could you send 
me a test email to verify?



I just found a bug caused by the "Old format impersonation is active" 
setting being made non-static. The class is loaded by the SPIClassIterator, and 
in that class the setting is hard-coded to true in the default constructor 
(even though in this case it is false in LuceneTestCase). Can I pick your brain 
about the reasoning for changing this to an instance variable? 
Here we have a global setting combined with constrained construction (only the 
default constructor can be called), so the only reasonable way for the class to 
read the setting is to make it static.
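Below is a minimal Java sketch of the constraint described above. Java is used because Lucene is ported from it, and every name here is hypothetical rather than the Lucene.Net API. The sketch assumes a loader that, like SPIClassIterator, can only call a public no-argument constructor. Under that assumption, the only way the loaded class can see a test-wide setting is to read a static field when it is constructed.

```java
import java.util.ArrayList;
import java.util.List;

final class StaticSettingSketch {

    // Hypothetical stand-in for the global "old format impersonation is
    // active" flag in LuceneTestCase. Because it is static, a class created
    // by reflection can read it without any constructor arguments.
    static volatile boolean oldFormatImpersonationIsActive = false;

    // Hypothetical read/write codec. The SPI-style loader can only call the
    // public no-argument constructor, so the setting cannot be passed in.
    static final class ReadWriteCodec {
        final boolean impersonationActive;

        public ReadWriteCodec() {
            // Read the global value instead of hard-coding true.
            this.impersonationActive = oldFormatImpersonationIsActive;
        }
    }

    // Minimal stand-in for an SPIClassIterator-style loader: it creates
    // each class through its public no-argument constructor.
    static List<Object> loadAll(Class<?>... classes) {
        List<Object> instances = new ArrayList<>();
        for (Class<?> c : classes) {
            try {
                instances.add(c.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot create " + c.getName(), e);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        oldFormatImpersonationIsActive = false;
        ReadWriteCodec codec = (ReadWriteCodec) loadAll(ReadWriteCodec.class).get(0);
        System.out.println(codec.impersonationActive); // prints false
    }
}
```

The constructor takes no arguments. If the flag becomes an instance field, the loaded class has nothing to read except its own hard-coded default.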

Thanks,
Shad Storhaug (NightOwl888)

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Itamar Syn-Hershko
Sent: Thursday, October 13, 2016 12:09 AM
To: [email protected]
Subject: Re: Remaining Work/Priorities

On that note, Shad, emails to you still bounce with the following error:

Delivery to the following recipient failed permanently:

     [email protected]

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the server for the 
recipient domain shadstorhaug.com by mx1.hostmailserver.com.
[69.160.246.214].

The error that the other server returned was:
554 5.7.1 gmail.com is blacklisted.

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Oct 12, 2016 at 8:06 PM, Itamar Syn-Hershko <[email protected]>
wrote:

> CI failure seems to be worked on:
> https://twitter.com/codebetterCI/status/785854074713468932
> (Thanks Wyatt for pointing that out)
>
> I will look into the rest in a little while
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Tue, Oct 11, 2016 at 10:10 PM, Shad Storhaug 
> <[email protected]>
> wrote:
>
>> Update
>> ======
>>
>> I have just pushed some commits that fix several bugs in the 
>> Lucene.Net.Codecs project (all 452 tests pass most of the time, with a few 
>> random failures) and fix all but 4 of the failing tests in Lucene.Net.Core.
>>
>>
>> Fix for Test Context
>> -------------------------
>>
>> For now, I have added method override stubs to each subclass in order 
>> to add the [Test] attribute, so NUnit will run them in the correct 
>> context. I did that on all of the superclass tests except for the 
>> ones in QueryParser (since Itamar mentioned he would be working in 
>> that area). Itamar, you will probably need to follow suit to get all 
>> of the QP tests to pass - namely with the QueryParserTestBase and 
>> TestQueryParser classes.
>>
>> I have carefully put all of these changes into a single commit so it 
>> can be reverted easily if this solution doesn't happen to be 
>> compatible with xUnit:
>> https://github.com/apache/lucenenet/commit/2a79edea6359e1ee1f83269cc7dc3ef2753ebf2c
>> Hopefully that makes life easier for @conniey.
>>
>> @Itamar, let me know when this is completed on your end so I can do a 
>> double revert and squash the test stubs from QueryParser into an 
>> all-inclusive revert-able commit.
>>
>> We can now correctly see how many tests we have in the core. 
>> Currently there are 2730, so it seems we are still missing 720 tests, 
>> assuming they were all for something portable.
>>
>>
>> Remaining Tests
>> ---------------------
>>
>> Next I plan to work on locating any tests that we have missed 
>> (starting in the core). It seems these fall into several categories:
>>
>> 1. Tests that have not yet been ported.
>> 2. Tests that have been partially ported that have not been added to 
>> the project.
>> 3. Tests that have been ported, but are missing the [Test] attribute.
>> 4. Tests in classes that have been ported that have been commented 
>> out (presumably because at the time they were ported the dependencies 
>> did not yet exist).
>> 5. Tests that have been Ignored in .NET that were not in Java.
>> 6. Tests that have NUnit Assume.That() logic that depends on some 
>> non-existent JRE condition, so they are not running in .NET.
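For category 3, a rough heuristic scanner could flag candidates. This is a hypothetical Java sketch, not an existing tool. It assumes ported test methods follow the `public virtual void TestXxx()` pattern and that NUnit attributes sit on the lines directly above the method. It will miss unusual layouts, so its output needs a manual review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MissingTestAttributeScanner {

    // Heuristic: a public (optionally virtual/override) void method whose
    // name starts with "Test", as ported Lucene test methods usually are.
    private static final Pattern TEST_METHOD = Pattern.compile(
            "^\\s*public\\s+(?:virtual\\s+|override\\s+)?void\\s+(Test\\w*)\\s*\\(");

    // Returns the names of test-looking methods that have no [Test...]
    // attribute among the attribute lines directly above them.
    static List<String> findMissing(List<String> lines) {
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = TEST_METHOD.matcher(lines.get(i));
            if (!m.find()) {
                continue;
            }
            boolean hasTestAttribute = false;
            // Walk upward over attribute lines such as [Ignore] or [Timeout(...)].
            for (int j = i - 1; j >= 0; j--) {
                String prev = lines.get(j).trim();
                if (!prev.startsWith("[")) {
                    break; // reached code, a comment, or a blank line
                }
                if (prev.startsWith("[Test")) { // also matches [TestCase(...)]
                    hasTestAttribute = true;
                    break;
                }
            }
            if (!hasTestAttribute) {
                missing.add(m.group(1));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "[Test]",
                "public virtual void TestOne()",
                "public virtual void TestTwo()");
        System.out.println(findMissing(source)); // prints [TestTwo]
    }
}
```

To scan a real file, the lines could come from `java.nio.file.Files.readAllLines(path)`. A commented-out `// [Test]` also counts as missing, which overlaps with category 4.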
>>
>> I'll make a quick effort to get them to pass, but the main goal will 
>> be to ensure they all can run and are included in the project. Just a 
>> heads up that the number of test failures is likely to increase on 
>> this pass (but the number of bugs will likely decrease).
>>
>>
>> Failing Core Tests
>> -----------------------
>>
>> I have looked into the remaining tests somewhat. There are 2 issues 
>> that I need some input on to solve.
>>
>>
>> TestRamUsageEstimator.TestSanity()
>>
>> Java Lucene uses a JRE-specific API to determine how much header size 
>> to add for each field. This makes the estimates higher in Java. But 
>> more importantly, this test is failing because the estimate for a 
>> real string instance is coming back the same as its shallow 
>> size (16 bytes in this case); it needs to be at least 1 byte more 
>> than that for the test to pass. In Java (at least in a 64-bit 
>> environment), an extra 4 bytes are added for each field.
>>
>> Technically, there is a way to get these numbers from .NET, but it 
>> involves calling undocumented APIs using pointers, and the results will 
>> likely differ from one .NET version to the next (a bad idea for a project 
>> that needs to support multiple .NET versions). The only solution I 
>> can think of is to hard-code an extra 4 bytes for 64-bit (and most 
>> likely 2 bytes for 32-bit) in order to make the numbers for the 
>> instances larger than their shallow size. I suppose the alternative 
>> would be to either comment out the string test or change it to >= to 
>> make it pass. Thoughts? Alternatives?
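The hard-coded padding proposed above could look roughly like this. It is a sketch only. The 4-byte and 2-byte figures come from this email. The 8-byte alignment and all method names are assumptions and are not existing RamUsageEstimator members.

```java
final class PaddedRamEstimate {

    // Hypothetical per-field overhead from the proposal in this thread:
    // 4 extra bytes per field on a 64-bit runtime, probably 2 on 32-bit.
    static int extraBytesPerField(boolean is64Bit) {
        return is64Bit ? 4 : 2;
    }

    // Round size up to the next multiple of alignment (must be a power of two).
    static long alignObjectSize(long size, int alignment) {
        return (size + alignment - 1) & ~((long) alignment - 1);
    }

    // shallowSize: the estimate the port already computes (e.g. 16 bytes)
    // fieldCount:  number of instance fields in the type
    // Assumes 8-byte object alignment.
    static long paddedEstimate(long shallowSize, int fieldCount, boolean is64Bit) {
        long padded = shallowSize + (long) fieldCount * extraBytesPerField(is64Bit);
        return alignObjectSize(padded, 8);
    }

    public static void main(String[] args) {
        // 16-byte shallow size with 2 fields on 64-bit: 16 + 2 * 4 = 24
        System.out.println(paddedEstimate(16, 2, true)); // prints 24
    }
}
```

Only the instance estimate is padded here, not the shallow size. Any type with at least one field therefore ends up strictly larger than its shallow size, which is what the test checks.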
>>
>>
>> TestNumericDocValuesUpdates.TestUpdateOldSegments()
>>
>> I discovered what the issue is here (normally that is the hard part), 
>> but it seems that the proper solution is going to be a major task. 
>> The NamedSPILoader (backed by SPIClassIterator) in Java Lucene is 
>> used as a service locator to load classes throughout the project. In 
>> the Codec abstract class, it is used to load up the codec for the 
>> context it is used in. However, our port of the NamedSPILoader simply 
>> loads all of the classes from the current AppDomain without any way to order 
>> them or override them.
>>
>> The problem is that in Lucene, this was meant to be an extension point.
>> And this particular test (and probably many more of them) uses that 
>> extension point to change the codec to a Mock from the test 
>> framework. This line from TestRuleSetupAndRestoreClassEnv pretty much 
>> sums up what the issue is:
>>
>> > Debug.Assert(Codec is Lucene42RWCodec,
>> >     "fix your classpath to have tests-framework.jar before lucene-core.jar");
>>
>> Basically, it relies on the classpath order (via the META-INF/services 
>> configuration files) to order the classes that are loaded, so the test 
>> mocks take priority over the built-in codecs.
>>
>> Just fixing the test could be done by making the static 
>> NamedSPILoader variable in the Codec class internal and swapping in a test 
>> double. However, that doesn't solve the bigger issue: Lucene.Net is 
>> missing the extensibility needed by anyone who wants to write their own 
>> codec (or tap into one of the other extensibility points). I guess 
>> the bigger question is how important it will be for anyone to extend 
>> Lucene codecs or inject dependencies into Analyzer factories. There 
>> doesn't appear to be any more extensibility than that in Lucene 
>> 4.8.0, but that could change in more recent or future versions of Lucene.
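One way to restore the extension point is a registry where registration order decides priority, the same way classpath order does in Java. The following is a hypothetical Java sketch. All names are assumed, and strings stand in for codec instances. The test framework registers its mock first, so the mock shadows the built-in codec with the same name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

final class NamedServiceRegistry<S> {

    // Keeps insertion order so services can be listed in priority order.
    private final Map<String, Supplier<? extends S>> factories = new LinkedHashMap<>();

    // Registration order is priority order: the first registration of a
    // name wins, much like a jar that appears earlier on the Java classpath.
    NamedServiceRegistry<S> register(String name, Supplier<? extends S> factory) {
        factories.putIfAbsent(name, factory);
        return this;
    }

    // Creates the highest-priority service registered under the name.
    S forName(String name) {
        Supplier<? extends S> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No service registered under '" + name + "'");
        }
        return factory.get();
    }

    Set<String> availableServices() {
        return Collections.unmodifiableSet(factories.keySet());
    }

    public static void main(String[] args) {
        // The test framework registers its mock before the core codec.
        NamedServiceRegistry<String> codecs = new NamedServiceRegistry<String>()
                .register("Lucene42", () -> "Lucene42RWCodec (test framework)")
                .register("Lucene42", () -> "Lucene42Codec (core)");
        System.out.println(codecs.forName("Lucene42")); // prints Lucene42RWCodec (test framework)
    }
}
```

A design like this still allows the quick fix of swapping in a test double. It would also give users a supported place to plug in their own codecs.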
>>
>>
>> CI Builds
>> -----------
>>
>> Not working. Can someone look into that please?
>>
>>
>> Thanks,
>> Shad Storhaug (NightOwl888)
>>
>>
>>
>> -----Original Message-----
>> From: Shad Storhaug
>> Sent: Wednesday, October 5, 2016 8:23 PM
>> To: [email protected]
>> Cc: Connie Yau; '[email protected]'
>> Subject: RE: Remaining Work/Priorities
>>
>> > Analysis.ICU (Depends on ICU4j) hopefully we can remove the ICU DLLs
>> > from the analysis.commons module?
>>
>> Just for clarification, these are two entirely different things in Java.
>> Analysis.Common (Analysis.Collator and Analysis.Th) depends on parts of
>> Java:
>>
>> import java.text.BreakIterator;
>> import java.text.Collator;
>> import java.text.ParseException;
>> import java.text.RuleBasedCollator;
>>
>> Highlighter.PostingsHighlighter and Highlighter.VectorHighlight also 
>> depend on parts of Java:
>>
>> import java.text.BreakIterator;
>> import java.text.CharacterIterator;
>>
>> Analysis.ICU depends on a separate (icu4j) package:
>>
>> import com.ibm.icu.text.Normalizer;
>> import com.ibm.icu.text.Normalizer2;
>> import com.ibm.icu.text.Transliterator;
>> import com.ibm.icu.text.Replaceable;
>> import com.ibm.icu.text.UTF16;
>> import com.ibm.icu.text.UnicodeSet;
>> import com.ibm.icu.text.FilteredNormalizer2;
>> import com.ibm.icu.text.Collator;
>> import com.ibm.icu.text.RuleBasedCollator;
>> import com.ibm.icu.util.ULocale;
>> import com.ibm.icu.text.RawCollationKey;
>>
>> That said, icu4j DOES have Collator and RuleBasedCollator classes, 
>> but it DOES NOT have a BreakIterator or CharacterIterator class. It 
>> is unclear whether the Collator from icu4j would work as a 
>> replacement for the one in core Java.
>>
>> When I was digging through the JDK code, I noticed that BreakIterator 
>> and RuleBasedCollator have a lot of common ICU dependencies there, so 
>> even if the RuleBasedCollator from icu4j is compatible, it might make 
>> sense for us to port the one from Java anyway so we are dealing with 
>> the same shared dependencies in Analysis.Common.
>>
>> Once we port over the classes from the Java JDK, we will be able to 
>> eliminate our current ICU4NET dependency (and the platform issues 
>> that come with it). That said, porting over those pieces could take 
>> considerable work. In the interim it might make sense to make 
>> separate projects/NuGet packages to isolate the areas that depend on 
>> BreakIterator, CharacterIterator, and RuleBasedCollator so the rest 
>> can be released for wide/cross-platform use. Perhaps we can even make 
>> a basic (scaled down) BreakIterator for Highlighter that breaks on 
>> spaces between words and punctuation between sentences, which 
>> wouldn't work for Thai, but would work for most other languages.
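The scaled-down BreakIterator suggested above could be as simple as the following hypothetical Java sketch. It is not the `java.text.BreakIterator` API. Words are split on whitespace, and a sentence ends at '.', '!' or '?' followed by whitespace. As noted, this will not work for Thai. It also breaks wrongly after abbreviations such as "Mr.".

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleBreaker {

    // Sentence boundaries: a new sentence starts after '.', '!' or '?'
    // followed by whitespace. The result includes 0 and text.length(),
    // similar in spirit to walking a BreakIterator from first() to last().
    static List<Integer> sentenceBoundaries(String text) {
        List<Integer> bounds = new ArrayList<>();
        bounds.add(0);
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                int j = i + 1;
                // Skip the whitespace that separates the two sentences.
                while (j < n && Character.isWhitespace(text.charAt(j))) {
                    j++;
                }
                // Only break if there was whitespace and more text follows.
                if (j > i + 1 && j < n) {
                    bounds.add(j);
                    i = j;
                    continue;
                }
            }
            i++;
        }
        if (n > 0) {
            bounds.add(n);
        }
        return bounds;
    }

    // Word spans [start, end): runs of non-whitespace characters, so
    // punctuation stays attached to the neighbouring word.
    static List<int[]> wordSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new int[] {start, i});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? Fine";
        System.out.println(sentenceBoundaries(text)); // prints [0, 13, 26, 30]
        System.out.println(wordSpans(text).size());   // prints 6
    }
}
```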
>>
>> Porting the icu4j package is another ball of yarn entirely. We 
>> should take a look at icu-dotnet (https://github.com/sillsdev/icu-dotnet) 
>> to see if there is enough overlap there to power Analysis.ICU (offhand it 
>> looks as though some classes are missing, though). It is a wrapper 
>> around the C library, so it may be that we just need to port more of it 
>> to get all of the pieces we need.
>>
>> Speaking of Collation, @ChristopherHaws have you made any more 
>> progress on Analysis.Collation? Were you able to determine if 
>> icu-dotnet's collator will make the tests pass?
>>
>> > I'm on it QueryParser.Flexible
>>
>> Great. The TimeZone probably just needs more research to work out how 
>> to use it (in order to implement the failing test). Also, FYI, MSDN's 
>> recommendation (https://msdn.microsoft.com/en-us/library/system.timezone(v=vs.110).aspx)
>> is to use TimeZoneInfo rather than TimeZone (I noticed that several of 
>> the tests were recently modified to use TimeZone rather than TimeZoneInfo).
>>
>> As for the culture, in .NET I am pretty sure that we need to pass it 
>> as a parameter to another overload of `QueryParser.Parse` rather than 
>> making it a property of QueryParser. But we can deal with that in one 
>> step after you have finished porting.
>>
>> --
>>
>> Shad Storhaug (NightOwl888)
>>
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]]
>> On Behalf Of Itamar Syn-Hershko
>> Sent: Wednesday, October 5, 2016 5:28 AM
>> To: [email protected]
>> Cc: Connie Yau
>> Subject: Re: Remaining Work/Priorities
>>
>> Awesome, thanks for all the hard work Shad!
>>
>> Our first priority should be fixing all remaining tests - in 
>> particular the ones in Core. We should be ready to release and stamp 
>> our builds as 100% stable. As you mentioned, this could be an 
>> infrastructure issue - hopefully *Connie* can give a status update on her 
>> effort on the switch to xUnit?
>>
>> With regards to Modules, here's an updated breakdown based on your 
>> email + forgotten pieces + my comments:
>>
>> *Ported:*
>> Lucene.Net (Core) - 15 failing / 1989 total
>> Lucene.Net.Analysis.Common - 0 failing / 1445 total
>> Lucene.Net.Classification - 0 failing / 9 total
>> Lucene.Net.Expressions - 0 failing / 94 total
>> Lucene.Net.Facet - (including #188 will be) 0 failing / 152 total
>> Lucene.Net.Join - 0 failing / 27 total
>> Lucene.Net.Memory - 0 failing / 10 total
>> Lucene.Net.Misc - 2 failing / 42 total
>> Lucene.Net.Queries - 2 failing / 96 total
>> Lucene.Net.QueryParser - 1 failing / 203 total
>> Lucene.Net.Suggest - 0 failing / 142 total
>>
>> We should do a second pass on the pieces we marked as ported, just to 
>> make sure the port is full and we didn't leave anything behind :)
>>
>> *Need to be ported:*
>> Highlighter (Depends on Collator (which is still being ported) and 
>> BreakIterator (for which we don't have a solution that works in .NET Core yet))
>> Spatial (has 3rd party libraries that need to be updated) - Spatial4n 
>> (https://github.com/synhershko/Spatial4N) needs to be brought up to 
>> speed with spatial4j, dependencies of which may cause some issues...
>> Codecs - partially ported; mostly the tests weren't ported
>> Grouping - not urgent, but provides nice functionality that users will probably like
>>
>> The only part with dependencies seems to be the spatial module - I 
>> will have a look there soon if you don't get to that before I do.
>>
>> *Can wait* - some modules are less frequently used; we should 
>> stabilize and release first and then work on them based on demand:
>> Analysis.ICU (Depends on ICU4j) - hopefully we can remove the ICU DLLs 
>> from the analysis.commons module? I keep getting reports on some 
>> issues they are causing
>> Analysis.Kuromoji
>> Analysis.Morfologik (Depends on Morfologik)
>> Analysis.Phonetic (Depends on Apache Commons) - Apache Commons is mostly 
>> helper libraries, so there's probably no real dependency, just lots of replacement
>> Analysis.SmartCN
>> Analysis.Stempel (currently in progress)
>> Analysis.UIMA (Depends on Tagger, uimaj-core, WhiteSpaceTokenizer)
>> Demo - while important because it can help newbies, we can do better by 
>> providing docs and real-world examples
>> QueryParser.Flexible - I'm on it
>>
>> *No need to port* - none are needed in our context:
>> Benchmark (many dependencies)
>> Replicator (many dependencies)
>> Sandbox (Depends on Apache Jakarta)
>>
>> Once all modules are ported and all tests are passing, I think we 
>> should get two more items fixed before an official release:
>>
>> 1. .NET Core support - I'm not clear on the status of it at the moment.
>> We probably want to have it in for the release.
>>
>> 2. Public API Inconsistencies. We can discuss what should be done and 
>> what not when we get to that stage. Some are an obvious "fixme", but 
>> some will break code compatibility with Java, which I think we should avoid.
>>
>> One last note - *Wyatt*, do we know why there are no CI builds lately?
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Lucene.NET committer and PMC member
>>
>> On Sun, Oct 2, 2016 at 10:01 PM, Shad Storhaug 
>> <[email protected]>
>> wrote:
>>
>> > Hello,
>> >
>> > I just wanted to open this discussion to talk about the work 
>> > remaining to be done on Lucene.Net version 4.8.0. We are nearly 
>> > there, but that doesn't mean we don't still need help!
>> >
>> >
>> > FAILING TESTS
>> > -------------------
>> >
>> > We now have over 5000 passing tests, and as soon as pull request 
>> > #188 (https://github.com/apache/lucenenet/pull/188) is merged, by my 
>> > count we have only 20 (actual) failing tests. Here is the breakdown 
>> > by project:
>> >
>> > Lucene.Net (Core) - 15 failing / 1989 total
>> > Lucene.Net.Analysis.Common - 0 failing / 1445 total
>> > Lucene.Net.Classification - 0 failing / 9 total
>> > Lucene.Net.Expressions - 0 failing / 94 total
>> > Lucene.Net.Facet - (including #188 will be) 0 failing / 152 total
>> > Lucene.Net.Join - 0 failing / 27 total
>> > Lucene.Net.Memory - 0 failing / 10 total
>> > Lucene.Net.Misc - 2 failing / 42 total
>> > Lucene.Net.Queries - 2 failing / 96 total
>> > Lucene.Net.QueryParser - 1 failing / 203 total
>> > Lucene.Net.Suggest - 0 failing / 142 total
>> >
>> > I said ACTUAL tests above because I recently 
>> > discovered that many of the "failures" that are being reported are 
>> > false negatives (in fact, the VS2015 NUnit test runner shows there 
>> > are 135 failing tests total and 902 tests total that don't belong to 
>> > any project). Most NUnit 2.6 test runners do not run tests in shared 
>> > abstract classes with the correct context (test setup) to make them 
>> > pass. These out-of-context runs add several additional minutes to 
>> > the test run.
>> >
>> > As an experiment, I upgraded to NUnit 3.4.1 and it helped the 
>> > situation somewhat - that is, it ran the tests in the correct 
>> > context and I was able to determine that we have more tests than 
>> > the numbers above and they are all succeeding. However, it also ran 
>> > the tests in an invalid context (that is, the context of the 
>> > abstract class without any setup) and some of them still showed as 
>> > failures.
>> >
>> > I know @conniey is currently working on porting the tests over to xUnit.
>> > Hopefully, swapping test frameworks alone (or using some of the new 
>> > fancy test attributes) is enough to fix this issue. If not, we need 
>> > to find another solution - preferably one that can be applied to 
>> > all of the tests in abstract classes without too much effort and 
>> > without making them too different from their Java counterparts.
>> >
>> > Remaining Pieces to Port
>> > ---------------------------------
>> >
>> > I took an inventory of the remaining pieces left to port a few days 
>> > ago and here is what that looks like (alphabetical order):
>> >
>> > 1. Analysis.ICU (Depends on ICU4j)
>> > 2. Analysis.Kuromoji
>> > 3. Analysis.Morfologik (Depends on Morfologik)
>> > 4. Analysis.Phonetic (Depends on Apache Commons)
>> > 5. Analysis.SmartCN
>> > 6. Analysis.Stempel (currently in progress)
>> > 7. Analysis.UIMA (Depends on Tagger, uimaj-core, WhiteSpaceTokenizer)
>> > 8. Benchmark (many dependencies)
>> > 9. Demo
>> > 10. Highlighter (Depends on Collator (which is still being ported) and
>> >     BreakIterator (which we don't have a solution that works in .NET Core yet))
>> > 11. Replicator (many dependencies)
>> > 12. Sandbox (Depends on Apache Jakarta)
>> > 13. Spatial (Already ported in #174 (https://github.com/apache/lucenenet/pull/174),
>> >     needs a recent version of spatial4n)
>> > 14. QueryParser.Flexible
>> >
>> > Itamar, it would be helpful if you would be so kind as to organize 
>> > this list in terms of priority. It also couldn't hurt to update the 
>> > contributing documents 
>> > (https://github.com/apache/lucenenet/blob/master/CONTRIBUTING.md and
>> > https://cwiki.apache.org/confluence/display/LUCENENET/Current+Status)
>> > with the latest information so anyone who wants to help out knows 
>> > the current status.
>> >
>> > In particular, we need clarification on the status of the 
>> > dependencies. Which of these dependencies are known to be ported?
>> > Which of them are ported but not up to date? Which of them are 
>> > known not to be ported, and which of them are unknown?
>> >
>> >
>> > Public API Inconsistencies
>> > ---------------------------------
>> >
>> > One thing that I have had my eye on for a while now is the 
>> > .NETification/consistency of the core API (that is, in the 
>> > Lucene.Net project). There are several issues that I would like to 
>> > address, including:
>> >
>> >
>> > 1. Method names that are still camelCase
>> >
>> > 2. Properties that should be methods (because they do a lot of
>> >    processing or because they are non-deterministic)
>> >
>> > 3. Methods that should be properties
>> >
>> > 4. .Size() vs .Size vs .Count - should generally all be .Count in .NET
>> >
>> > 5. Interfaces should begin with "I"
>> >
>> > 6. Classes should not begin with "I" followed by another capital
>> >    letter (for some reason some of them were named that way)
>> >
>> > 7. .CharAt() should probably be this[]
>> >
>> > 8. Generic types nested within generic types (which cause Visual
>> >    Studio to crash when IntelliSense tries to read them)
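Some of these rules are mechanical enough to check automatically. Below is a hypothetical Java sketch for rules 1, 5 and 6 in the list above. The names are assumed, and the name strings would have to come from reflection or a source scan. A naive "I" + capital-letter test also flags legitimate acronyms such as IOContext, so the results still need a human pass.

```java
import java.util.ArrayList;
import java.util.List;

final class NamingRules {

    // True when a name starts with "I" followed by another capital letter.
    static boolean hasInterfacePrefix(String name) {
        return name.length() >= 2
                && name.charAt(0) == 'I'
                && Character.isUpperCase(name.charAt(1));
    }

    // Rules 5 and 6: interfaces need the prefix, classes must not have it.
    static List<String> checkType(String name, boolean isInterface) {
        List<String> problems = new ArrayList<>();
        if (isInterface && !hasInterfacePrefix(name)) {
            problems.add("interface " + name + " should begin with \"I\"");
        } else if (!isInterface && hasInterfacePrefix(name)) {
            problems.add("class " + name + " should not begin with \"I\" + capital letter");
        }
        return problems;
    }

    // Rule 1: .NET method names should be PascalCase, not camelCase.
    static List<String> checkMethod(String name) {
        List<String> problems = new ArrayList<>();
        if (!name.isEmpty() && Character.isLowerCase(name.charAt(0))) {
            problems.add("method " + name + " is still camelCase");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(checkType("Bits", true));       // interface without "I"
        System.out.println(checkType("IOContext", false)); // acronym: false positive
        System.out.println(checkMethod("charAt"));         // camelCase method
    }
}
```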
>> >
>> > ... and so on. The catch is that these are all sweeping changes 
>> > that will affect everyone helping out on Lucene.Net and anyone who 
>> > is currently using the beta. So I would like to gather some input: 
>> > when would be the most appropriate time to begin working on these 
>> > sweeping changes?
>> >
>> >
>> > Thanks,
>> > Shad Storhaug (NightOwl888)
>> >
>>
>
>