[GitHub] [lucenenet] NightOwl888 commented on issue #763: Support async task

GitBox Mon, 21 Nov 2022 15:27:16 -0800


NightOwl888 commented on issue #763:
URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1322793010


   > Furthermore, now that I looked a bit around in the code which would be 
involved in the above, it might actually not be completely unfeasibly to 
introduce async into lucene at this layer for such scenarios as a "Addin" to 
Lucene.NET, that means we are talking coud outside of lucene itself that 
extends the appropriate classes (e.g. SimpleFSDirectory, NIOFSDirectory, 
MMapDirectory etc.) or making "Mirrors" of them and then providing the 
asynchronous overloads on those and their associated Input/Output classes. And 
then you can use those during replication allowing you to use async patterns 
when copying the files from the disk etc. From there it will require a bit of 
Type checking/casting, but other than that it should be doable... Lets mark 
that as **OPTION-A**.
   > 
   > It COULD possibly also be introduced into Lucene it self by marking the 
involved classes here as partial classes and then implement the Async behaviors 
there, that might allow us to introduce it without breaking with the principle 
of sticking close to the Java side as we could then add it into Lucene.Net 
specific async files that can clearly be marked as "Lucene.Net Specific" - that 
should in theory mean that all we have to do in future ports is again to open 
up these ported classes by marking them as partial. Ofc. Breaking changes can 
occur in the stack that means we have to do major rework on the Async parts, 
but it see it as more manageable. Lets mark that as **OPTION-B**.
   > 
   > While **OPTION-B** could be interesting to investigate as it might have to 
opportunity to introduce async/await into Lucene.NET gradually if anyone wishes 
to give that a shot, to begin with the use will be very limited and probably 
specifically only to scenarios outlined here, it would not affect the core 
itself which would still use the blocking API's and I can't say how far up the 
stack this could be done without hitting a wall where we can't go further 
without actually compromising on the principle of having a port as close to the 
java source as possible to make furture ports easier. I REALLY doubt that we 
could get all the way out to making a `IndexWriter.UpdateDocumentAsync(...)` 
without hitting that problem, but I can't say for sure.
   > 
   > So I am not sure **OPTION-B** is worth the effort considering that 
**OPTION-A** is probably feasible and it's only downside is the requirement of 
casts/type checks. And even IF we would find that worth it, I don't think 
anyone in the team sees it as a priority.
   > 
   > But feel free to share what you guys think.
   
   ## Partial Classes
   
   I am not sure I totally understand the difference between **OPTION-A** and 
**OPTION-B**. Partial classes are a compiler feature that makes it easier to 
organize code, but at the end of the day they get compiled into 1 class. Both 
partial classes have access to all of the private state of the class.
   
   And if we add code to the codebase that is not part of Lucene, it should 
**all** be in partial classes and put into the Support folder. For example, to 
extend `IndexWriter` with async code, the original `IndexWriter` class would be 
marked as a partial class. Then we would add a partial class named 
`IndexWriter` below the Support folder to extend it with non-Java ported code.
   
   ### File Location Lucene.Net/Index/IndexWriter.cs
   
   ```c#
   namespace Lucene.Net.Index
   {
        public partial class IndexWriter : IDisposable, ITwoPhaseCommit
        {
            // Implementation...
        }
   }
   ```
   
   ### File Location Lucene.Net/Support/Index/IndexWriter.cs
   
   ```c#
   namespace Lucene.Net.Index
   {
        public partial class IndexWriter // Leave off all of the 
interfaces/base classes
        {
            // Code that was not part of Lucene 4.8.0 (or our best 
interpretation of it)...
        }
   }
   ```
   
   ## Exceptions
   
   But, that is the easy part. The problem is that everyone involved in this 
conversation is thinking like a .NET developer: we only throw exceptions when 
something exceptional happens. But in the Java world, exceptions are frequently 
used as a signaling mechanism. Indeed, Lucene does this also. We don't have a 
detailed map of what every exception means or when an exception is thrown in 
one piece of code, who (if anybody) will catch it and attempt to handle it. It 
is easy to look at the primary path and lose sight of all of the exceptions 
that are flying over our head that are being handled in specialized ways.
   
   The `ThreadJob` class handles re-throwing exceptions that are thrown in a 
background thread. I never did figure out how that works in Java, but this is 
the behavior that we are seeing in the Lucene tests and clearly what is 
expected by code.
   
   So, if we complete the primary path through to make something async, we have 
only started the job. We need to deal with the exceptions that are thrown. Our 
parallel tasks run, some of them fail, and we end up with an 
`AggregateException` type. What do we do with it?
   
   In every case, the answer will probably be something a bit different. Lucene 
often expects the exception to fly up the stack to a handler to deal with it. 
Or, depending on the exception type, it may just let it fly up to the original 
caller, or perhaps swallow and ignore it. To make things more complicated, 
there may be a different handler in a different part of the call stack 
expecting it depending on which type of exception it is. And many parts are 
pluggable, so the set of handlers and how they are expected to deal with the 
exception may be different in each pluggable piece. Certain questions need to 
be answered for every case:
   
   1. Which upstream handlers are expected to catch the exception from this 
piece of code?
   2. Are there potential exceptions that are thrown from a downstream piece of 
code? If so, what exception types?
   3. Is the exception required to be caught somewhere for the task that 
launched it to successfully complete?
   4. Is something upstream keeping track of which tasks succeeded and which 
failed?
   5. Does an exception handler reschedule the task at a later time?
   6. Is the exception allowed to propagate back to the caller (outside of 
Lucene)?
   7. Is the exception swallowed?
   8. Is this or any upstream piece of code pluggable or virtual, and what 
happens with the exception handlers when alternative implementation is plugged 
in? 
   9. If this or another pluggable piece may be swapped by the end user with a 
custom implementation, how are they expected to handle exceptions in the custom 
code? 
   
   One thing seems obvious to me: The existing exception handlers might not be 
compatible with the async code and may need to either be *duplicated* or 
*redesigned* to function when running async. This is particularly true when we 
have an `AggregateException` type and are expected to fire the code in multiple 
handlers because we have multiple exception types. We can't exactly throw 
multiple exceptions from a synchronous piece of code to get to the existing 
handlers.
   
   It took a month of work just to figure out how to (mostly) [correctly map 
each exception in Java to its counterpart in 
.NET](https://github.com/apache/lucenenet/pull/476), including changing the 
exception handlers to ignore the difference in inheritance between the 
exception types and re-map it to how Java inherits exceptions. That said, we 
[don't really know for certain how some exceptions should be handled if they 
are thrown](https://github.com/apache/lucenenet/pull/476).
   
   TBH - I have no idea how the Lucene team keeps track of the complicated 
dance of exceptions that are thrown and caught in this application. They must 
have a technical document somewhere that details it, but if not, this is a 
document that would need to be created and maintained on our end before we even 
started on such a task - at least for the part of the picture that we would 
need to see to maintain the async code that is added. This document would need 
to detail the entire stack when the application is running to fully understand 
where exceptions of specific types are thrown and where they are handled.
   
   There is a 
[TestIndexWriterExceptions](https://github.com/apache/lucenenet/blob/a654eb1f84e0f1e2bcb21fc8cab78169485cd651/src/Lucene.Net.Tests/Index/TestIndexWriterExceptions.cs)
 test that goes through some of what is expected, but it would need to be 
analyzed in combination with [our exception mapping 
logic](https://github.com/apache/lucenenet/blob/a654eb1f84e0f1e2bcb21fc8cab78169485cd651/src/Lucene.Net/Support/ExceptionHandling/ExceptionExtensions.cs)
 for it to make sense. There are also several tests that [analyze the stack 
trace](https://github.com/apache/lucenenet/blob/a654eb1f84e0f1e2bcb21fc8cab78169485cd651/src/Lucene.Net.TestFramework/Support/StackTraceHelper.cs)
 to ensure a specific caller exists in the call stack. Further evidence that 
the Lucene team is working from the perspective of the call stack at a high 
level when dealing with these.
   
   This is why I maintain my original position: This is a job for the Lucene 
designers to figure out, since they have the inside knowledge of what each 
exception type means and who is supposed to handle it. And this is taken into 
consideration when new features are added to Lucene.
   
   If we add async code on our end and are expected to maintain it across 
releases, it will be a tremendous amount of work. If we just add async 
functionality without creating a plan on how to upgrade it (including any 
exception logic that changes in future versions of Lucene), we may have no 
choice but to remove it since it will be blocking the effort to upgrade. Then 
not only is it a huge amount of time spent adding it to 4.8.0, it is a huge 
amount of time wasted because it will be gone from all future versions. 
Fortunately, we can see into the future versions of Lucene, so we can at least 
mitigate some of this.
   
   But I wouldn't consider a contribution like this without a commitment from 
someone to help to maintain it over the long run (dealing with future bug 
reports, incompatibilities, upgrading, and creating detailed technical 
documentation so others can understand it). Otherwise, it will just be 
relegated to the same fate as 
[TaskMergeScheduler](https://github.com/apache/lucenenet/issues/354): 
fundamentally broken, deprecated and slated for deletion. And a total waste of 
everyone's time.
   
   If Lucene adds async support, then it will always be something that will be 
considered as part of the entire application and something that is more 
straightforward to port and debug. Most importantly, it is not something we 
have to document and maintain ourselves.
   
   Of course, the first step is to [start a conversation with the Lucene 
team](https://lucene.apache.org/core/discussion.html) to determine whether this 
is something they are willing to do. Or at least willing to accept as a 
contribution that they will then maintain.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on issue #763: Support async task

Reply via email to