[ 
https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553151#comment-15553151
 ] 

Michael McCandless commented on LUCENE-7477:
--------------------------------------------

bq. Oh, it's really confusing how OfflineSorter is shaped right now, in 
particular temporary file creation (the requirement to add checksum manually), 
file deletion and cleanup and byte sequence readers/writers. There has to be a 
way to do it better.

Maybe open a pre-cursor issue and improve things there, so this change is 
easier?

> ExternalRefSorter should use OfflineSorter's actual writer for writing the 
> input file
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.x, master (7.0)
>
>         Attachments: LUCENE-7477.patch
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.input = 
> sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
> "RefSorterRaw", IOContext.DEFAULT);
>     this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
>   }
> {code}
> The problem with it is that the writer for the initial input file is written 
> with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
> {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to 
> use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably 
> ideal to allow {{OfflineSorter}} to generate its own temporary file and just 
> return the ByteSequencesWriter it chooses to use, so the above snippet would 
> read:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.writer = sorter.newUnsortedPartition();
>   }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of 
> managing its own (sorted and unsorted) partitions. Then {{sort(String file)}} 
> would simply become {{ByteSequenceIterator sort()}} (or even 
> {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If 
> we made {{OfflineSorter}} implement {{Closeable}} it could also take care of 
> cleaning up any resources it opens in the directory we pass to it. An 
> additional bonus would be the ability to dodge the final internal merge(1) -- 
> if we manage sorted and unsorted partitions then there are open possibilities 
> of returning an iterator that dynamically merges from multiple partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to