Chris,

Now that you have spent some time dealing with the porting, what is your view on creating a fully automated porting tool?

Scott

> -----Original Message-----
> From: Christopher Currens [mailto:currens.ch...@gmail.com]
> Sent: Monday, November 21, 2011 5:23 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: [Lucene.Net] Roadmap
>
> Digy,
>
> No worries. I wasn't taking them personally. You've been doing this for a lot longer than I have, but I didn't understand your pain until I had to go through it personally. :P
>
> Have you looked at Contrib in a while? There are a lot of projects in Java's Contrib that are not in Lucene.Net. Is this because there are some that can't easily (if at all) be ported over to .NET, or just because they've been neglected? I'm trying to get a handle on what's important to port and what isn't. Figured someone with experience could help me with a starting point for deciding where to start with everything that's missing.
>
> Thanks,
> Christopher
>
> On Mon, Nov 21, 2011 at 2:13 PM, Digy <digyd...@gmail.com> wrote:
>
> > Chris,
> >
> > Sorry if you took my comments about "pain of porting" personally. That wasn't my intention.
> >
> > +1 for all your changes/divergences. I made/could have made them too.
> >
> > DIGY
> >
> > -----Original Message-----
> > From: Christopher Currens [mailto:currens.ch...@gmail.com]
> > Sent: Monday, November 21, 2011 11:45 PM
> > To: lucene-net-dev@lucene.apache.org
> > Subject: Re: [Lucene.Net] Roadmap
> >
> > Digy,
> >
> > I used 2.9.4 trunk as the base for the 3.0.3 branch, but I looked to the code in 2.9.4g as a reference for many things, particularly the Support classes. We hit many of the same issues, I'm sure. I moved some of the anonymous classes into a base class where you could inject functions, though not all could be replaced, nor did I replace all that could have been.
> > Some of our code is different: I went for the option of making WeakDictionary completely generic, as in wrapping a generic dictionary with WeakKey<T> instead of wrapping the already existing WeakHashTable in support. In hindsight, it may have just been easier to convert the WeakHashTable to generic, but alas, I'm only realizing that now. There is a problem with my WeakDictionary, specifically the function that determines when to clean/compact the dictionary and remove the dead keys. I need a better heuristic for deciding when to run the clean. That's a performance issue, though.
> >
> > Regarding the "pain of porting", I am a changed man. It's nice, in a sad way, to know that I'm not the only one who experienced those difficulties. I used to be in the camp that porting code that differed from Java wouldn't be difficult at all. However, now I stand corrected! It threw me a curve-ball, for sure. I DO think a line-by-line port can definitely include the things talked about below, i.e. the changes to Dispose and the changes to IEnumerable<T>. Those changes, I think, can be made without a heavy impact on the porting process.
> >
> > There was one fairly large change I opted to use that differed quite a bit from Java, however, and that was the use of the TPL in ParallelMultiSearcher. It was far easier to port this way, and I don't think it affects the porting process too much. Java uses a helper class defined at the bottom of the source file that handles it; I'm simply using a built-in one instead. I just need to be careful about it; it would be really easy to get carried away with it.
> >
> > Thanks,
> > Christopher
> >
> > On Mon, Nov 21, 2011 at 1:20 PM, Digy <digyd...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > > First of all, thank you for your great work on 3.0.3 branch.
> > > I suppose you took 2.9.4 as a code base to make the 3.0.3 port, since some of your problems are the same as those I faced in the 2.9.4g branch
> > > (e.g.,
> > > Support/MemoryMappedDirectory.cs (but never used in core),
> > > IDisposable,
> > > introduction of some Action<T>s, Func<T>s,
> > > "foreach" instead of "GetEnumerator/MoveNext",
> > > IEquatable<T>,
> > > WeakDictionary<T>,
> > > Set<T>
> > > etc.
> > > )
> > >
> > > Since I also used 3.0.3 as a reference, maybe we can use some of 2.9.4g's code in 3.0.3 when necessary (I haven't had time to look into 3.0.3 deeply).
> > >
> > > Just to ensure coordination, maybe you should create a new issue in JIRA, so that people send patches to that issue instead of committing directly.
> > >
> > > @Prescott,
> > > 2.9.4g is not behind 2.9.4 in bug fixes & features. So it is (I think) ready for another release. (I have used it in all my projects for a long time.)
> > >
> > > PS: Hearing the "pain" of porting code that greatly differs from Java made me just smile (sorry for that :( ). Be ready for responses that get beyond the criticism between "With all due respect" & "Just my $0.02" parentheses.
> > >
> > > DIGY
> > >
> > > -----Original Message-----
> > > From: Christopher Currens [mailto:currens.ch...@gmail.com]
> > > Sent: Monday, November 21, 2011 10:19 PM
> > > To: lucene-net-dev@lucene.apache.org; casper...@caspershouse.com
> > > Subject: Re: [Lucene.Net] Roadmap
> > >
> > > Some of the Lucene classes have Dispose methods, well, ones that call Close (and that Close method may or may not call base.Close(), if needed or not). Virtual dispose methods can be dangerous only in that they're easy to implement wrong.
> > > However, it shouldn't be too bad, at least with a line-by-line port, as we would make the call to the base class whenever Lucene does, and that would (should) give us the same behavior, implemented properly. I'm not aware of differences in the JVM regarding inheritance and base methods being called automatically, particularly Close methods.
> > >
> > > Slightly unrelated, another annoyance is the use of Java Iterators vs C# Enumerables. A lot of our code is there simply because there are Iterators, but it could be converted to Enumerables. The whole HasNext, Next vs C#'s MoveNext(), Current is annoying, but it's used all over in the base code, and would have to be changed there as well. Either way, I would like to push for that before 3.0.3 is released. IMO, small changes like this still keep the code similar to the line-by-line port, in that it doesn't add any difficulties in the porting process, but provides great benefits to the users of the code, to have a .NET centric API. I don't think it would violate the project description we have listed on our Incubator page, either.
> > >
> > > Thanks,
> > > Christopher
> > >
> > > On Mon, Nov 21, 2011 at 12:03 PM, casper...@caspershouse.com <casper...@caspershouse.com> wrote:
> > >
> > > > +1 on the suggestion to move Close -> IDisposable; not being able to use "using" is such a pain, and an eyesore on the code.
> > > >
> > > > Although it will have to be done properly, and not just have Dispose call Close (you should have proper protected virtual Dispose methods to take inheritance into account, etc).
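
For reference, here is a minimal sketch of the kind of pattern Nick describes above, written under stated assumptions: ResourceBase, DerivedResource, and Example are hypothetical names for illustration, not Lucene.Net types, and the obsolete Close() shim is only one possible way to stage the change. It shows a non-virtual public Dispose(), a protected virtual Dispose(bool) that subclasses override, and the call to the base class that takes inheritance into account.

```csharp
using System;

// Hypothetical types for illustration only; not part of Lucene.Net.
public class ResourceBase : IDisposable
{
    private bool _disposed;

    // Non-virtual public entry point, so callers can write "using (...)".
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Kept only for source compatibility with Close()-style callers.
    [Obsolete("Use Dispose() instead.")]
    public void Close()
    {
        Dispose();
    }

    // Subclasses override this, release their own resources,
    // then call base.Dispose(disposing).
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing)
        {
            // Release managed resources owned by the base class here.
        }
        _disposed = true;
    }
}

public class DerivedResource : ResourceBase
{
    private bool _disposed;

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing)
            {
                // Release managed resources owned by this subclass here.
            }
            _disposed = true;
        }
        // Always chain to the base class so its cleanup runs too.
        base.Dispose(disposing);
    }
}

public static class Example
{
    public static void Main()
    {
        // Dispose() runs automatically at the end of the block,
        // even if an exception is thrown inside it.
        using (var resource = new DerivedResource())
        {
            Console.WriteLine("working with " + resource.GetType().Name);
        }
    }
}
```

In this sketch Close() forwards to Dispose() rather than the other way around, so the cleanup logic lives in one place and the guard flags make repeated calls harmless.
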
> > > >
> > > > - Nick
> > > >
> > > > ----------------------------------------
> > > > From: "Christopher Currens" <currens.ch...@gmail.com>
> > > > Sent: Monday, November 21, 2011 2:56 PM
> > > > To: lucene-net-dev@lucene.apache.org
> > > > Subject: Re: [Lucene.Net] Roadmap
> > > >
> > > > Regarding the 3.0.3 branch I started last week, I've put in a lot of late nights and gotten far more done in a week and a half than I expected. The list of changes is very large, and fortunately, I've documented it in some files that are in the branches root of certain projects. I'll list what changes have been made so far, and some of the concerns I have about them, as well as what still needs to be done. You can read them all in detail in the files that are in the branch.
> > > >
> > > > All changes in 3.0.3 have been ported to the Lucene.Net and Lucene.Net.Test, except BooleanClause, LockStressTest, MMapDirectory, NIOFSDirectory, DummyConcurrentLock, NamedThreadFactory, and ThreadInterruptedException.
> > > >
> > > > MMapDirectory and NIOFSDirectory have never been ported in the first place for 2.9.4, so I'm not worried about those. LockStressTest is a command-line tool, porting it should be easy, but not essential to a 3.0.3 release, IMO. DummyConcurrentLock also seems unnecessary (and non-portable) for .NET, since it's based around Java's Lock class and is only used to bypass locking, which can be done by passing new Object() to the method. NamedThreadFactory I'm unsure about.
> > > > It's used in ParallelMultiSearcher (in which I've opted to use the TPL), and seems to be only used for debugging, possibly testing. Either way, I'm not sure it's necessary. Also, named threads would mean we probably would have to move the class from the TPL, which greatly simplified the code and parallelization of it all, as I can't see a way to set names for a Task. I suppose it might be possible, as Tasks have unique Ids, and you could use a Dictionary to map the thread's name to the ID in the factory, but you'd have to create a helper function that would allow you to find a task by its name, which seems more work than the resulting benefits. VS2010 already has better support for debugging tasks over threads (I used it when writing the class), frankly, it's amazing how easy it was to debug.
> > > >
> > > > Other than the above, the entire code base in the core dlls is at 3.0.3, which is exciting, as I'm really hoping we can get Lucene.Net up to the current version of Java's 3.x branch, and start working on a line-by-line port of 4.0. Tests need to be written for some of the collections I've made that emulate Java's, to make sure they're even behaving the same way.
> > > > The good news is that all of the existing tests pass as a whole, so it seems to be working, though I'd like the peace of mind of having tests for them (being HashMap<TKey, TValue>, WeakDictionary<TKey, TValue> and IdentityCollection<TKey, TValue>, it's quite possible any one of them could be completely wrong in how they were put together.)
> > > >
> > > > I'd also like to finally formalize the way we use IDisposable in Lucene.Net, by marking the Close functions as obsolete, moving the code into Dispose, and eventually (or immediately) removing the Close functions. There's so much change to the API that now would be a good time to make that change if we wanted to. I'm hesitant to move away from a line-by-line port of Lucene.Net completely; I'd rather have it be as close as possible. The main reason I feel this way is that when I was porting the Shingle namespace of Contrib.Analyzers, Troy had written it in a .NET way which differed GREATLY from Java Lucene, and it did make porting it considerably more difficult; to keep the language to a minimum, I'm just going to say it was a pain, a huge pain in fact. I love the idea of moving to a more .NET design, but I'd like to maintain a line-by-line port anyway, as I think porting changes is far easier and quicker that way. At this point, I'm more interested in getting Lucene.Net to 4.0 and caught up to Java than I am anything else, hence the extra amount of time I've put into this project over the past week and a half.
> > > > Though this isn't really a place for this discussion.
> > > >
> > > > The larger area of difficulty for the port, however, is the Contrib section. There are three major problems with it that are slowing me down. First, there are a lot of classes that are outdated. I've found versions of code that still have the Apache 1.1 License attached to them, which makes the code quite old. Also, it was almost impossible for me to port a lot of changes in Contrib.Analyzers, since the code was so old and different from Java's 2.9.4.
> > > >
> > > > Second, we had almost no unit tests ported for any of the classes, which means they have to be ported from scratch.
> > > >
> > > > Third, there are a lot of contrib projects that have never been ported over from Java. That list includes: smartcn (I believe this is an intelligent Chinese analyzer), benchmark, collation, db, lucli, memory, misc, queryparser, remote, surround, swing, wikipedia, xml-query-parser. However, it should be noted that I'm not even sure which, if any, SHOULD be ported or even CAN be ported.
> > > >
> > > > The progress on 3.0.3 Contrib is going steady, however. The entire Analyzers project (except for smartcn) has been ported, as well as the tests for them, which all pass.
> > > > There were some minor exceptions, the ThaiAnalyzer and hyphenation analyzers, that could not be ported: ThaiAnalyzer because it relies on BreakIterator, and there's no built-in functionality to split a string by words based on a culture in .NET, and no third party library I could find that easily does it; and Hyphenation, because it relies on SAX xml processing, which is also missing from .NET.
> > > >
> > > > The FastVectorHighlighter project has also had all 3.0.3 changes ported to the project and its Tests, as well, all passing. All other projects in contrib have yet to be touched/ported.
> > > >
> > > > You can find some of my notes scattered about in // TODO comments, but most centralized in the project directories:
> > > >
> > > > src\core\FileDiffs.txt
> > > > src\core\ChangeNotes.txt
> > > > src\contrib\Analyzers\FileDiffs.txt
> > > > test\core\UpdatedTests.txt
> > > > test\contrib\analyzers\PortedTests.txt
> > > >
> > > > If, and by if I mean when, you find porting errors, let me know and fix them or have me fix them, or whatever you want to do. The thing I worry about the most are the tests for the collections I listed above, which I will get around to writing soon. I *have* found some porting issues in the core dll that didn't manifest themselves in the Lucene.Net.Test test cases, but did when I ported some of the tests for Contrib.Analyzers. I have a feeling they will be found slowly and surely, but I feel that they are few and far between.
> > > >
> > > > If anyone wants to help on this branch, I'd welcome it, we would just need to coordinate who is working on what, so we aren't porting the same thing and wasting time.
> > > >
> > > > Thanks,
> > > > Christopher
> > > >
> > > > TL;DL: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3 (with a few very minor exceptions), Contrib.Analyzers/Contrib.Analyzer.Test have all been ported to 3.0.3 (few minor exceptions), FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported to 3.0.3, and the rest of Contrib is going to be a pain.
> > > >
> > > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser <geobmx...@hotmail.com> wrote:
> > > >
> > > > > Anyone have any thoughts on these items?
> > > > >
> > > > > My 2 cents is that after we get 2.9.4 out the door, we quickly release a 2.9.4g (Digy - you're probably most familiar with 2.9.4g, is there any work that we should do to that to get it solid for a release?
> > > > >
> > > > > I'm still unsure the status of 3.0.3 or 4.0, but I'm thinking for the next release in Q1 2012.
> > > > >
> > > > > > While you all take a look at the artifacts for a vote - I wanted to talk about the future roadmap and our releases -
> > > > > >
> > > > > > 2.9.4g is very stable - do we want to release this at some point?
> > > > > >
> > > > > > 3.0.3 - chris looks to be pretty active on this. Chris, can you fill us in on what's the status of this branch?
> > > > > >
> > > > > > 4.0 - looks to be partially underway.
> > > > > >
> > > > > > I want to try and maybe build a better release schedule and begin filling out what needs to be done so people can easily jump in and help out. I noticed the 4.0 status page in the wiki - that's excellent
> > > > > >
> > > > > > ~P