NightOwl888 commented on issue #763: URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1323652154
> @jeme > > Looks like `BinaryReaderDataInput` was used only in the [constructor of `OfflineSorter.ByteSequencesReader`](https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/util/OfflineSorter.java#L501-L503) to replace the [`DataInputStream`](https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html) for .NET. But I see why you went down this rabbit hole. The constructor wraps `FileStream` in a `BinaryReader` when `BinaryReaderDataInput` actually could be swapped for `InputStreamDataInput` and use the `FileStream` directly. We can then delete the `BinaryReaderDataInput` from the codebase because it is the only place it is used. > > Same goes for `BinaryWriterDataOutput`. > Okay, I see where we went wrong for `ByteSequencesReader` and `ByteSequencesWriter`. Their constructors accept a reference to `java.io.DataInput` and `java.io.DataOutput` interfaces. During the port, we transposed these with Lucene's `DataInput` and `DataOutput` abstract classes. So, we are plugging this together wrong. Although we have ported over the `DataInputStream` and `DataOutputStream`, according to the J2N docs, it is better to use the `BinaryReader` and `BinaryWriter` if the modified UTF-8 format used in Java is not required. And indeed, we don't use that format here. `BinaryReader` and `BinaryWriter` don't have any interfaces, but both of them are not sealed and have virtual members (except for a few odd ones). So, I am not seeing too much of an issue with changing the constructors to accept `BinaryReader` and `BinaryWriter` instead of `DataInput` and `DataOutput`. > `OfflineSorter` also has another problem. It uses [FileSupport.CreateTempFile()](https://github.com/apache/lucenenet/blob/4e49612d6867194440694b77db95fd0ed756c9a9/src/Lucene.Net/Util/OfflineSorter.cs#L305) to get a `FileInfo` object. Then attempts to open the file afterward in `MergePartitions`. There is a non-zero chance on Windows that the operating system hasn't closed the handle on the file and we get a "file in use by another process" error. > > I was considering using some low-level APIs to create a file handle and wrap it in a subclass of [System.IO.FileSystemInfo](https://learn.microsoft.com/en-us/dotnet/api/system.io.filesysteminfo?view=net-7.0) (which `FileInfo` subclasses) and returning that from `FileSupport.CreateTempFile()`. But now that I look again, maybe we should just change `FileSupport.CreateTempFile()` to allow passing in the flags for creating and returning the `FileStream` instance that is used to create the file. It looks like every place that is using it immediately opens a stream afterwards, anyway. `OfflineSorter` could be changed to accept streams instead of `FileInfo` objects in its `Sort` method. > > Needs some more analysis. Looks like some places actually need the name of the file, which `FileStream` doesn't provide access to. Looks like I was wrong - `FileStream` does have a `Name` property, so we have access to the underlying filename. There are a couple of callers that seem to fit better with using the `FileInfo` and they are both in the test framework. All of the other callers use it in conjunction with `OfflineSorter`. So, creating overloads named `CreateTempFileAsStream()` seem to be the way to go here. The existing overloads can easily call `CreateTempFileAsStream()` and return a `FileInfo` so we can reuse logic. It gets a bit mangled in `OfflineSorter` because it is set up to delete its files, but we can specify in the docs when calling `Sort()` that it should be passed a `FileStream` with the [FileOptions.DeleteOnClose](https://learn.microsoft.com/en-us/dotnet/api/system.io.fileoptions?view=net-7.0#fields) option set or else delete it explicitly. It will have to accept `FileStream` rather than `Stream`, but since it was only designed to work with files, I don't see that as an issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org