[
https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553151#comment-15553151
]
Michael McCandless commented on LUCENE-7477:
--------------------------------------------
bq. Oh, it's really confusing how OfflineSorter is shaped right now, in
particular temporary file creation (the requirement to add checksum manually),
file deletion and cleanup and byte sequence readers/writers. There has to be a
way to do it better.
Maybe open a pre-cursor issue and improve things there, so this change is
easier?
> ExternalRefSorter should use OfflineSorter's actual writer for writing the
> input file
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-7477
> URL: https://issues.apache.org/jira/browse/LUCENE-7477
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 6.x, master (7.0)
>
> Attachments: LUCENE-7477.patch
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
> public ExternalRefSorter(OfflineSorter sorter) throws IOException {
> this.sorter = sorter;
> this.input =
> sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(),
> "RefSorterRaw", IOContext.DEFAULT);
> this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
> }
> {code}
> The problem with it is that the writer for the initial input file is written
> with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of
> {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to
> use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably
> ideal to allow {{OfflineSorter}} to generate its own temporary file and just
> return the ByteSequencesWriter it chooses to use, so the above snippet would
> read:
> {code}
> public ExternalRefSorter(OfflineSorter sorter) throws IOException {
> this.sorter = sorter;
> this.writer = sorter.newUnsortedPartition();
> }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of
> managing its own (sorted and unsorted) partitions. Then {{sort(String file)}}
> would simply become {{ByteSequenceIterator sort()}} (or even
> {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If
> we made {{OfflineSorter}} implement {{Closeable}} it could also take care of
> cleaning up any resources it opens in the directory we pass to it. An
> additional bonus would be the ability to dodge the final internal merge(1) --
> if we manage sorted and unsorted partitions then there are open possibilities
> of returning an iterator that dynamically merges from multiple partitions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]