Dawid Weiss created LUCENE-7477:
-----------------------------------
Summary: ExternalRefSorter should use OfflineSorter's actual
writer for writing the input file
Key: LUCENE-7477
URL: https://issues.apache.org/jira/browse/LUCENE-7477
Project: Lucene - Core
Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
Fix For: 6.x, master (7.0)
Consider this constructor in ExternalRefSorter:
{code}
public ExternalRefSorter(OfflineSorter sorter) throws IOException {
this.sorter = sorter;
this.input =
sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(),
"RefSorterRaw", IOContext.DEFAULT);
this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
}
{code}
The problem with it is that the writer for the initial input file is written
with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of
{{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to use
something else than the default.
While this works now, it should be cleaned up (I think). It'd be probably ideal
to allow {{OfflineSorter}} to generate its own temporary file and just return
the ByteSequencesWriter it chooses to use, so the above snippet would read:
{code}
public ExternalRefSorter(OfflineSorter sorter) throws IOException {
this.sorter = sorter;
this.writer = sorter.newUnsortedPartition();
}
{code}
This could be also extended so that {{OfflineSorter}} is in charge of managing
its own (sorted and unsorted) partitions. Then {{sort(String file)}} would
simply become {{ByteSequenceIterator sort()}} (or even {{Stream<BytesRef>
sort()}} as Stream is conveniently {{AutoCloseable}}). If we made
{{OfflineSorter}} implement {{Closeable}} it could also take care of cleaning
up any resources it opens in the directory we pass to it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]