Michael,

I've asked this in another thread, but what are some of the concerns that are limiting the use of .NET 3.5? Having moved from .NET 1.1 to 2.0, it's not that much more of a stretch to go to 3.5 (it still runs on the 2.0 CLR), and there are a ton of benefits that can be reaped from it (as I hope I've pointed out).
Also, what is considered "too far" from the original implementation? Assuming no public API changes, and provided the functionality of the code is maintained, is there anything else that should be considered too far? Some of the things I am considering right now, for example:

- Fixing all for/IEnumerable/ArrayList/Hashtable/IEquatable/Equals override/synchronization code

Right now, there are a number of for loops that should be replaced with foreach loops. That much is obvious. What doesn't seem apparent in the code is that the project's pattern of calling IEnumerable.GetEnumerator and then calling MoveNext on the returned IEnumerator instance is actually incorrect from a .NET perspective. The IEnumerator implementations (generic and non-generic) returned by IEnumerable may implement IDisposable; every IEnumerator<T> does, since that interface derives from IDisposable. The foreach statement compiles into a try/finally block (a using statement, of sorts) around the iteration, so the enumerator is disposed (if it implements IDisposable) when the loop ends, even if an exception is thrown. As a best practice, it is always better to use foreach over an IEnumerable than to drive the IEnumerator instance yourself, mostly because it's cleaner code, but also because of the disposal issue.

For the rest of the list, a lot of these things come up when comparing elements in sequences. For example, if you override Equals (in addition to GetHashCode, of course), you should implement IEquatable<T> and overload the == and != operators as well. If you implement IComparable<T>, you should overload < and > too, and your Equals method should agree with CompareTo (for example, by calling CompareTo and checking the result against zero).

Case in point: the Equals override in the MultiPhraseQuery class has an error when enumerating through each of the term arrays. It assumes they are the same length (if that is a valid assumption, it's not indicated anywhere). The SequenceEqual method on LINQ's Enumerable class would fix that instantly, BTW. The point is, in touching one of these things, so many others get touched.
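To make the enumerator point concrete, here is a minimal sketch, assuming a hypothetical TrackingSequence type (not part of Lucene.Net) whose enumerator only records whether Dispose was called. Driving GetEnumerator/MoveNext by hand never disposes the enumerator; foreach does.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence (not a Lucene.Net type) whose enumerator records
// whether Dispose was called. It stands in for any enumerator that holds a
// resource (file handle, pooled buffer, lock) that must be released.
public sealed class TrackingSequence : IEnumerable<int>
{
    private readonly int[] items;

    public bool EnumeratorDisposed { get; private set; }

    public TrackingSequence(params int[] items)
    {
        this.items = items;
    }

    public IEnumerator<int> GetEnumerator()
    {
        EnumeratorDisposed = false;
        return new Enumerator(this);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly TrackingSequence owner;
        private int index = -1;

        public Enumerator(TrackingSequence owner) { this.owner = owner; }

        public int Current { get { return owner.items[index]; } }

        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            index++;
            return index < owner.items.Length;
        }

        public void Reset() { index = -1; }

        // foreach calls this when the loop ends, normally or via an exception.
        public void Dispose() { owner.EnumeratorDisposed = true; }
    }
}

public static class EnumerationDemo
{
    // Manual pattern: the enumerator is never disposed.
    public static int SumManually(IEnumerable<int> source)
    {
        int sum = 0;
        IEnumerator<int> e = source.GetEnumerator();
        while (e.MoveNext())
        {
            sum += e.Current;
        }
        return sum;
    }

    // foreach expands to roughly:
    //   var e = source.GetEnumerator();
    //   try { while (e.MoveNext()) { ... } } finally { if (e != null) e.Dispose(); }
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
        {
            sum += value;
        }
        return sum;
    }
}
```

If the enumerator really has to be driven by hand (for example, to advance two sequences in lockstep), wrapping it in a using block gives the same disposal guarantee as foreach.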
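The Equals advice and the SequenceEqual fix can be sketched together. TermArrayQuery below is a hypothetical, simplified stand-in (terms are plain strings), not the actual MultiPhraseQuery code. It implements IEquatable<T>, overrides Equals(object) and GetHashCode, overloads == and !=, and uses Enumerable.SequenceEqual so that term arrays of different lengths compare unequal instead of being indexed past their end. IComparable<T> and the < and > operators are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for a query holding a list of term arrays
// (terms are plain strings here). This is not the actual MultiPhraseQuery code.
public sealed class TermArrayQuery : IEquatable<TermArrayQuery>
{
    private static readonly TermArrayComparer ArrayEquality = new TermArrayComparer();
    private readonly List<string[]> termArrays = new List<string[]>();

    public TermArrayQuery Add(params string[] terms)
    {
        termArrays.Add(terms);
        return this;
    }

    public bool Equals(TermArrayQuery other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        // SequenceEqual returns false when the lists differ in length, and the
        // comparer does the same for each pair of arrays.
        return termArrays.SequenceEqual(other.termArrays, ArrayEquality);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as TermArrayQuery);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            foreach (string[] terms in termArrays)
            {
                hash = hash * 31 + ArrayEquality.GetHashCode(terms);
            }
            return hash;
        }
    }

    public static bool operator ==(TermArrayQuery left, TermArrayQuery right)
    {
        if (ReferenceEquals(left, null)) return ReferenceEquals(right, null);
        return left.Equals(right);
    }

    public static bool operator !=(TermArrayQuery left, TermArrayQuery right)
    {
        return !(left == right);
    }

    // Compares two term arrays element by element; different lengths => unequal.
    private sealed class TermArrayComparer : IEqualityComparer<string[]>
    {
        public bool Equals(string[] x, string[] y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return x.SequenceEqual(y);
        }

        public int GetHashCode(string[] terms)
        {
            if (terms == null) return 0;
            unchecked
            {
                int hash = 17;
                foreach (string term in terms)
                {
                    hash = hash * 31 + (term == null ? 0 : term.GetHashCode());
                }
                return hash;
            }
        }
    }
}
```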
- Implementing IDisposable properly

There are a number of places where you have Close methods. These are obvious candidates for IDisposable implementations. However, from a .NET perspective, Dispose must be safe to call multiple times without side effects, whereas there are some places where you throw an exception if an object is closed more than once.

- Reducing visibility of internal members where not needed

I've seen API changes made just to give tests visibility into members. The resulting pollution of the API is really bad, and it should be reduced.

---

All this being said, I'd really like to start with the first bullet point (the synchronization issue is a big one: you should never, *ever* lock on "this". It's an encapsulation issue: because the lock is "this", you are unwittingly exposing it to any code that holds a reference to the object. Instead, use a separate, private object as the lock), starting with small changes to show what I mean (which have obvious benefits and zero functionality impact) and moving on from there. That is, if you guys want me to =)

- Nick

-----Original Message-----
From: Michael Garski [mailto:mgar...@myspace-inc.com]
Sent: Monday, November 09, 2009 9:37 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Nick,

While alteration of internal implementations will certainly be openly embraced, diverging too far from the original Java implementation at this time isn't practical due to the small number of folks that actually contribute to Lucene.Net - there are only 3 committers at this time (I'm not one of them). The (admittedly far off) goal is to keep Lucene.Net functionally equivalent to the Java implementation on a commit-by-commit basis; once that has been attained, divergence in the API can be discussed.

That being said, as I dig into the 2.9 port, we may have no choice but to target the 3.5 framework to ensure we can actually bring the 2.9 version to fruition. And don't get me started on ParallelMultiSearcher - it's a total dog.
I have an implementation that I use, built on ThreadPool threads and ManualResetEvents along with object pooling, that performs much better.

Michael

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
Sent: Monday, November 09, 2009 6:25 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

I agree, it's fairly low. I've just joined today after working with the stable 2.0 release privately and converting most of it to work with .NET 3.5. Most of it is actually usable in .NET 2.0; there is a little bit of LINQ in there, which cleans up the code tremendously where it is used (it helps a great deal with a lot of the ugly nested loops). Primarily, though, these are the things I've been able to achieve, which may or may not have been integrated already (this is copied from the user list, which I just replied to):

- Proper implementation of IDisposable over Close methods (there is a proper pattern to adhere to, and the Close methods don't follow it)
- Proper implementation of IEnumerable<T>, ICollection<T>, and IList<T> on collection types, and changing enumeration through collections to foreach
- Use of LINQ in some places to make the code more declarative (e.g. flattening out some VERY messy nested loops)
- Removed use of the Join method on the Thread class, replacing it with other .NET synchronization primitives
- Using Semaphore instead of Thread.Join for the multithreaded searcher
- Replacing ArrayList and Hashtable with List<T> and Dictionary<TKey, TValue> instances
- Using generic versions instead of non-generic versions, which provides massive performance gains, especially when a type parameter is a value type (no boxing)
- Where synchronized collections were used, locks were put in place at the appropriate points to control access
- Lock scope was expanded to ensure that multiple operations on the same synchronized resource are atomic
- Implementing .NET interfaces where appropriate, e.g. ScoreDocComparator becomes IScoreDocComparer, deriving from IComparer<ScoreDoc>
- Types that override Equals implement IEquatable<T>, and possibly IComparable<T>, as well as provide == and != operator overloads
- Condensing types, e.g. ICharStream is defined twice
- Cleaned up excessive use of internal

I'd also like to address Get and Set methods, replacing them with properties, but I don't know if that crosses the line for the group. There are a bunch of other things that I see could use work, but at that point I feel I might be stepping on toes, as it would affect the shape of the API. Of course, if that's the direction the group wants to go, then great, but I think what I've listed above is enough for now.

- Nick

-----Original Message-----
From: Michael Garski [mailto:mgar...@myspace-inc.com]
Sent: Monday, November 09, 2009 8:31 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Thanks Nick! Official .NET 4.0 support in Lucene.Net would be a ways off; however, an implementation that uses 4.0 could always be added to the contrib section. I think an NIOFSDirectory implementation is fairly low on the priority list at the moment... unless you'd like to look into it ;)

Michael

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
Sent: Monday, November 09, 2009 4:56 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

From my perspective, this is a memory-mapped file.
Explicit support for memory-mapped files is provided in .NET 4.0, but from what I can tell (I just joined the mailing list today), that's a long way off. However, you can provide the same functionality through the Win32 API (which can be accessed through the P/Invoke layer). Here are the functions:

http://msdn.microsoft.com/en-us/library/aa911527.aspx

Note that if you want to create an implementation of this, you are going to have to use SafeHandle instances. If you have to create specialized ones, doing it right requires some pretty delicate work (you need to attribute everything correctly to get constrained execution region (CER) guarantees).

- Nick

-----Original Message-----
From: Michael Garski [mailto:mgar...@myspace-inc.com]
Sent: Monday, November 09, 2009 7:16 PM
To: lucene-net-dev@incubator.apache.org
Subject: Lucene.Net.Store Namespace

Woo-hoo! I've been authorized to commit full-time to getting Lucene 2.9 in shape and ready to go!

I've submitted 6 patches for various fixes in the Store namespace. They are all independent; however, there may be some cleanup throughout the namespace once they are all reviewed, approved, and committed. There are certainly some optimizations that can be done in there, and I plan on taking those on when the tests are all in a passing state.

I suggest we hold off on a .NET equivalent to NIOFSDirectory at this time. I'm not even sure there is a .NET or underlying system call that provides the same functionality as the FileChannel classes. Anyone have any info on that topic?

Michael

Michael Garski
Sr. Search Architect
310.969.7435 (office)
310.251.6355 (mobile)
www.myspace.com/michaelgarski
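The IDisposable point raised earlier in the thread (Close methods that throw when called a second time) can be sketched with the standard dispose pattern. IndexResource is a hypothetical class, not a Lucene.Net type: Close simply forwards to Dispose, repeated Close/Dispose calls are no-ops, and only actual use after disposal throws ObjectDisposedException.

```csharp
using System;
using System.IO;

// Hypothetical resource-owning class (not a Lucene.Net type) showing the
// standard dispose pattern, with Close kept as an alias for Dispose.
public class IndexResource : IDisposable
{
    private Stream stream;   // managed resource owned by this object
    private bool disposed;

    public IndexResource(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        this.stream = stream;
    }

    public bool IsDisposed { get { return disposed; } }

    public int ReadByte()
    {
        ThrowIfDisposed();
        return stream.ReadByte();
    }

    // Kept for API familiarity; simply forwards to Dispose.
    public void Close()
    {
        Dispose();
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses override this to release their own resources.
    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return; // second and later calls are no-ops, never throw
        if (disposing && stream != null)
        {
            stream.Dispose();
            stream = null;
        }
        disposed = true;
    }

    private void ThrowIfDisposed()
    {
        if (disposed) throw new ObjectDisposedException(GetType().Name);
    }
}
```

Callers can then write `using (var r = new IndexResource(stream)) { ... }` and existing code that calls Close keeps working, even if it also ends up inside a using block.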
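Likewise, the advice about never locking on "this" can be sketched as follows. SafeCounterMap and LockDemo are hypothetical classes, not Lucene.Net types. The map guards its dictionary with a private lock object, and each read-modify-write happens under a single lock so it is atomic. Because the lock object is private, outside code that locks on the instance itself cannot block or deadlock these methods.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical counter map (not a Lucene.Net type). All synchronization goes
// through a private object that no outside code can reach.
public sealed class SafeCounterMap
{
    private readonly object syncRoot = new object();
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();

    public void Increment(string key)
    {
        lock (syncRoot)
        {
            // The lookup and the write happen under one lock, so the
            // read-modify-write is atomic.
            int current;
            counts.TryGetValue(key, out current);
            counts[key] = current + 1;
        }
    }

    public int Get(string key)
    {
        lock (syncRoot)
        {
            int value;
            return counts.TryGetValue(key, out value) ? value : 0;
        }
    }
}

public static class LockDemo
{
    // Locks the map instance itself (as outside code might) and checks that
    // Increment on another thread still completes within the timeout.
    public static bool IncrementCompletesWhileInstanceIsLocked(SafeCounterMap map)
    {
        lock (map)
        {
            Thread worker = new Thread(() => map.Increment("a"));
            worker.Start();
            return worker.Join(TimeSpan.FromSeconds(5));
        }
    }
}
```

Had Increment used lock (this), the worker in LockDemo would wait on the same monitor the caller holds, and the Join would time out.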