Hello everyone,

The phrase:

What I was proposing was to make the Enum also look like an IEnumerable so that 
you can foreach over it.

…made me get out of bed, put on my swimsuit and jump into the Lucene.Net water 
again (and oh boy, the water is cold).
I apologize if what I write is well-known, but I’ve been following the 
enumerator discussion from a distance and haven’t seen it mentioned. So here it 
goes:

Foreach() doesn’t need IEnumerable. My point will not be that IEnumerable is 
unnecessary, but that if you decide to introduce bona-fide .NET enumerators, 
there is an impact on efficiency (in terms of heap pressure and an additional 
levels of indirection) that should be managed.

The following class A  implements neither IEnumerable nor IEnumerator:

                class A
                {
                        public class AEnumerator 
                        {
                                int i;

                                public int Current
                                {
                                        get
                                        {
                                                return i;
                                        }
                                }

                
                                public bool MoveNext()
                                {
                                        return ++i < 2;
                                }
                        }

                        public AEnumerator GetEnumerator()
                        {
                                return new AEnumerator();
                        }
                }

… but you can write the following foreach() loop with no problem:

                        A a = new A();
                        foreach(var item in a)
                        {

                        }

The reason is that in the C# language specification, foreach just needs a 
public function called GetEnumerator() returning a type having a public 
MoveNext() method and a public Current property. No mention is made of 
interfaces.

To understand why this is important, consider an enumerable type implemented 
with the “standard” IEnumerable<T>/IEnumerator<T> pattern. A foreach() loop 
with this type will have an impact on efficiency, because:
-       Every foreach() is a call to GetEnumerator(), which returns an 
IEnumerator<T>. Because the latter is an interface, this return value will 
always be a reference, i.e. allocated on the heap. So every foreach is an extra 
heap allocation, for the duration of the loop. Nested foreach() loops cause as 
many heap allocations as the number of iterations they are in.
-       Every loop variable is a virtual call to the Current property, and 
every iteration is a virtual call to MoveNext(). These calls are virtual 
because they are specified through an interface, so by definition they will 
never be expanded inline, even though in many cases the Current property is 
trivial and the cost of calling it therefore disproportionally high.

Of course, if you don’t use IEnumerable<T>, you’re missing out on the cool LINQ 
stuff which is largely implemented as a bunch of extension methods to 
IEnumerable<T>.  Would it be possible to have your cake and eat it too?

The answer is “yes”. We can use two things:
-       Our knowledge of the implementation of foreach()
-       The fact that in .NET value types implementing interfaces cause 
interface devirtualization, and therefore make interface methods and properties 
candidates for inline expansion.

The following implementation of the above class uses 
IEnumerable<T>/IEnumerator<T>, but the foreach() loop doesn’t cause heap 
allocations and allows inline expansion:

                class A: IEnumerable<int>
                {
                        public struct AEnumerator: IEnumerator<int>
                        {
                                int i;

                                public int Current
                                {
                                        get
                                        {
                                                return i;
                                        }
                                }

                                object IEnumerator.Current
                                {
                                        get
                                        {
                                                return Current;
                                        }
                                }


                                public void Dispose()
                                {
                                }

                                public bool MoveNext()
                                {
                                        return ++i < 2;
                                }

                                public void Reset()
                                {
                                }
                        }

                        public AEnumerator GetEnumerator()
                        {
                                return new AEnumerator();
                        }

                        IEnumerator<int> IEnumerable<int>.GetEnumerator()
                        {
                                return GetEnumerator();
                        }

                        IEnumerator IEnumerable.GetEnumerator()
                        {
                                return GetEnumerator();
                        }
                }

Because the interface implementation of IEnumerable<int> is explicit, the 
foreach loop will still use the (non-interface) GetEnumerator() version 
returning an AEnumerator directly. This is a value type, which will be returned 
on the stack: no heap allocation.
Because AEnumerator is a value type (struct), its interface implementation is 
devirtualized, so the method MoveNext() and the property Current are candidates 
for inline expansion.

The performance impact is not negligible. Microsoft knows this as well: have a 
look at how List<T>/HashSet<T>/Dictionary<TKey,TValue> implement 
GetEnumerator().

So the message is: if it’s decided to introduce .NET enumerators in Lucene.NET, 
it’s my humble opinion that these performance issues should be handled.

Rant mode: what the implementation of foreach() effectively creates, is a poor 
man’s illusion of covariant return types. If we had “real” covariant return 
types and proper interface devirtualization (i.e. not only in value types, but 
in all sealed types), all this mess wouldn’t be necessary. It’s a shame, 
really, but such is life.


Vincent


-----Original Message-----
From: Shad Storhaug (JIRA) [mailto:j...@apache.org] 
Sent: Wednesday, August 09, 2017 10:41 PM
To: dev@lucenenet.apache.org
Subject: [jira] [Commented] (LUCENENET-469) Convert Java Iterator classes to 
implement IEnumerable<T>


    [ 
https://issues.apache.org/jira/browse/LUCENENET-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120627#comment-16120627
 ] 

Shad Storhaug commented on LUCENENET-469:
-----------------------------------------

{quote}It's this Enum class I was calling the "collection" type 'cos it kind of 
holds the collection state.

What I was proposing was to make the Enum also look like an IEnumerable so that 
you can foreach over it.
Which means it needs a GetEnumerable() to return an IEnumerator. This new 
enumerator is the EnumEnumerator type from in previous messages.

So, I'm taking an Enum, adding an IEnumerable facade, to return a new 
IEnumerable type which pokes the Enum to move forward.{quote}

That is an interesting idea.

I am not sure if the state is actually being "iterated over" so much as it is 
being calculated on the fly. But then, {{IEnumerable<T>}} is just an interface 
- it doesn't actually define any behavior. In fact, I had to redefine what the 
key and value {{ICollection<T>}} s do in CharArrayMap's KeyCollection and 
ValueCollection, so maybe this is also possible.

> NB: some thought would be needed to define what the "item" class 
> (representing Current) should look like in each case.
> NB: Calling a Current changing method (ie one of the Seek...() methods) from 
> inside a foreach would result in undefined behavior.

So then we end up with some behavior that doesn't fit the mold.

I thought for a moment that it could be possible these SeekCeil and SeekExact 
methods are only defined here because there are no such thing as extension 
methods in Java, but since they are overridden in subclasses, that is probably 
not a valid assumption.

Speaking of which, I suggest you have a look at how they are used in the 
TermsEnum overloads in the codecs (for example 
[Lucene40TermVectorsReader](https://github.com/apache/lucenenet/blob/468199e3fa95c7e1f77b14e7f00aeaafd7c2f8b9/src/Lucene.Net/Codecs/Lucene40/Lucene40TermVectorsReader.cs#L446))
 or 
[Lucene45DocValuesProducer.TermsEnumAnonymousInnerClassHelper](https://github.com/apache/lucenenet/blob/468199e3fa95c7e1f77b14e7f00aeaafd7c2f8b9/src/Lucene.Net/Codecs/Lucene45/Lucene45DocValuesProducer.cs#L1090).

As you can see, the complexity of these enums becomes quite high (lots of 
internal state), and it seems to me it would be simpler to port future changes 
from Lucene if the general structure of the "extras" (that is, the properties 
and Seek* methods) were left alone (or at least left intact and used by some 
other type of facade). The DocsAndPositionsEnum s are even more complex than 
this.

The only thing I was hoping to get out of this was the added ability to use a 
foreach loop on Terms (and possibly the DocIdSetIterator). Sure we could 
refactor the TermsEnum data structure to be more .NET-like, but then where 
would we be when we need to connect the dots on the next Lucene upgrade? I am 
not necessarily against it, but finding bugs in the codec code is no picnic, so 
we would need some clear way to ensure future codecs behavior can be mapped 
onto the new structure without causing too much pain. Keep in mind, the codecs 
change in every major (and nearly every minor) version.

{quote}Phew... Did that make sense; fit with your view of the universe?{quote}

Wow, do I sound so closed minded? I think I need to work on my delivery :). 

> Convert Java Iterator classes to implement IEnumerable<T>
> ---------------------------------------------------------
>
>                 Key: LUCENENET-469
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-469
>             Project: Lucene.Net
>          Issue Type: Sub-task
>          Components: Lucene.Net Contrib, Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3, 
> Lucene.Net 4.8.0
>         Environment: all
>            Reporter: Christopher Currens
>             Fix For: Lucene.Net 4.8.0
>
>
> The Iterator pattern in Java is equivalent to IEnumerable in .NET.  Classes 
> that were directly ported in Java using the Iterator pattern, cannot be used 
> with Linq or foreach blocks in .NET.
> {{Next()}} would be equivalent to .NET's {{MoveNext()}}, and in the below 
> case, {{Term()}} would be as .NET's {{Current}} property.  In cases as below, 
> it will require {{TermEnum}} to become an abstract class with {{Term}} and 
> {{DocFreq}} properties, which would be returned from another class or method 
> that implemented {{IEnumerable<TermEnum>}}.
> {noformat} 
>       public abstract class TermEnum : IDisposable
>       {
>               public abstract bool Next();
>               public abstract Term Term();
>               public abstract int DocFreq();
>               public abstract void  Close();
>               public abstract void Dispose();
>       }
> {noformat} 
> would instead look something like:
> {noformat} 
>       public class TermFreq
>       {
>               public abstract Term { get; }
>               public abstract int { get; }
>       }
>         public abstract class TermEnum : IEnumerable<TermFreq>, IDisposable
>         {
>                 // ...
>         }
> {noformat}
> Keep in mind that it is important that if the class being converted 
> implements {{IDisposable}}, the class that is enumerating the terms (in this 
> case {{TermEnum}}) should inherit from both {{IEnumerable<T>}} *and* 
> {{IDisposable}}.  This won't be any change to the user, as the compiler 
> automatically calls {{IDisposable}} when used in a {{foreach}} loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to