NightOwl888 commented on issue #763:
URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1323652154

   > @jeme
   > 
   > Looks like `BinaryReaderDataInput` was used only in the [constructor of 
`OfflineSorter.ByteSequencesReader`](https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/util/OfflineSorter.java#L501-L503)
 to replace the 
[`DataInputStream`](https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html)
 for .NET. But I see why you went down this rabbit hole. The constructor wraps 
`FileStream` in a `BinaryReader` when `BinaryReaderDataInput` actually could be 
swapped for `InputStreamDataInput` and use the `FileStream` directly. We can 
then delete the `BinaryReaderDataInput` from the codebase because it is the 
only place it is used.
   > 
   > Same goes for `BinaryWriterDataOutput`.
   > 
   
   
   Okay, I see where we went wrong for `ByteSequencesReader` and 
`ByteSequencesWriter`. Their constructors accept a reference to 
`java.io.DataInput` and `java.io.DataOutput` interfaces. During the port, we 
transposed these with Lucene's `DataInput` and `DataOutput` abstract classes. 
So, we are plugging this together wrong.
   
   Although we have ported over the `DataInputStream` and `DataOutputStream`, 
according to the J2N docs, it is better to use the `BinaryReader` and 
`BinaryWriter` if the modified UTF-8 format used in Java is not required. And 
indeed, we don't use that format here. `BinaryReader` and `BinaryWriter` don't 
have any interfaces, but both of them are not sealed and have virtual members 
(except for a few odd ones). So, I am not seeing too much of an issue with 
changing the constructors to accept `BinaryReader` and `BinaryWriter` instead 
of `DataInput` and `DataOutput`.
   
   
   > `OfflineSorter` also has another problem. It uses 
[FileSupport.CreateTempFile()](https://github.com/apache/lucenenet/blob/4e49612d6867194440694b77db95fd0ed756c9a9/src/Lucene.Net/Util/OfflineSorter.cs#L305)
 to get a `FileInfo` object. Then attempts to open the file afterward in 
`MergePartitions`. There is a non-zero chance on Windows that the operating 
system hasn't closed the handle on the file and we get a "file in use by 
another process" error.
   > 
   > I was considering using some low-level APIs to create a file handle and 
wrap it in a subclass of 
[System.IO.FileSystemInfo](https://learn.microsoft.com/en-us/dotnet/api/system.io.filesysteminfo?view=net-7.0)
 (which `FileInfo` subclasses) and returning that from 
`FileSupport.CreateTempFile()`. But now that I look again, maybe we should just 
change `FileSupport.CreateTempFile()` to allow passing in the flags for 
creating and returning the `FileStream` instance that is used to create the 
file. It looks like every place that is using it immediately opens a stream 
afterwards, anyway. `OfflineSorter` could be changed to accept streams instead 
of `FileInfo` objects in its `Sort` method.
   > 
   > Needs some more analysis. Looks like some places actually need the name of 
the file, which `FileStream` doesn't provide access to.
   
   Looks like I was wrong - `FileStream` does have a `Name` property, so we 
have access to the underlying filename. There are a couple of callers that seem 
to fit better with using the `FileInfo` and they are both in the test 
framework.  All of the other callers use it in conjunction with 
`OfflineSorter`. So, creating overloads named `CreateTempFileAsStream()` seem 
to be the way to go here. The existing overloads can easily call 
`CreateTempFileAsStream()` and return a `FileInfo` so we can reuse logic.
   
   It gets a bit mangled in `OfflineSorter` because it is set up to delete its 
files, but we can specify in the docs when calling `Sort()` that it should be 
passed a `FileStream` with the 
[FileOptions.DeleteOnClose](https://learn.microsoft.com/en-us/dotnet/api/system.io.fileoptions?view=net-7.0#fields)
 option set or else delete it explicitly. It will have to accept `FileStream` 
rather than `Stream`, but since it was only designed to work with files, I 
don't see that as an issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to