NightOwl888 commented on issue #271:
URL: https://github.com/apache/lucenenet/issues/271#issuecomment-1305099323

   @vvdb-architecture
   
   Thanks. So that leaves the question - how do we enforce the [TokenStream 
workflow 
contract](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/core/Lucene.Net.Analysis.TokenStream.html)
 in .NET?
   
   The exception is meant to be a guide to help at development time to ensure 
all of the methods of the contract are followed. Once that is done, this 
exception will never actually occur at runtime. So, that means the part that 
they are worried about - adding an exception handler inside a finally block - 
is not a thing that will ever need to happen.
   
   One option I have considered is to build a Roslyn code analyzer to enforce 
this contract at design time, but I am not sure how complex it would be to 
analyze APIs of the user code no matter how they have it configured and ensure 
that all of the methods are called in the right sequence and the right number 
of times in the code. They could be building one or more abstractions that do 
some of the `TokenStream` operations while leaving the others somewhere else, 
or they may be in conditional parts of the code, for example.
   
   Some of the rules, such as enforcing the calls to `base.Reset()`, and 
`base.End()` when they are overridden would be easy to enforce, while others 
would take more work. The exception in the `Dispose()` method is definitely a 
simpler approach, but if we had an Roslyn code analyzer that worked reliably, 
it would help to speed up the development process by catching this sort of 
thing at design time instead of runtime.
   
   However, this still doesn't get us out of the situation we are in where 
`close()` then a call to `reset()` to start the `TokenStream` over from the 
beginning is allowed in Lucene - `Dispose()` is meant to be the final call that 
doesn't allow anything else to happen to the object. This is still a violation 
of the `Dispose()` contract in .NET, so we need to dig into whether `Close()` 
makes sense here, just to inform consumers that it is not actually a 
`Dispose()` operation. Or maybe `End()` is supposed to signify when the end of 
the stream is reached (after which case `Reset()` is allowed) and `Dispose()` 
is supposed to signify the end of the operation. In that case the [tight 
loop](https://github.com/apache/lucenenet/blob/081edeed35b190c1d535dcfdfeb91143f0ef818f/src/Lucene.Net/Index/DocFieldProcessor.cs#L280-L284)
 should be refactored never to call `Dispose()` inside of the loop and have a 
final loop at the end that calls `Dispose()` on all of the `TokenStreams` to 
ensure methods n
 ever are called after `Dispose()`.
   
   According to [this error 
message](https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/analysis/Tokenizer.java#L110-L112),
 it is supposed to be invalid to call `reset()` multiple times. If that is 
really the case, then the [tight 
loop](https://github.com/apache/lucenenet/blob/081edeed35b190c1d535dcfdfeb91143f0ef818f/src/Lucene.Net/Index/DocFieldProcessor.cs#L280-L284)
 can be considered a bug that needs to be fixed, and we don't need to change 
`Dispose()` back to `Close()`. Although, the bug may boil down to the fact we 
added an [extra call to `Dispose()` in 
.NET](https://github.com/apache/lucenenet/blob/dfae964ce0a1b066fd808275fd81d90385cde3fe/src/Lucene.Net/Analysis/Tokenizer.cs#L76)
 and it shouldn't actually be there.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to