[ 
https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554340#comment-15554340
 ] 

Dawid Weiss commented on LUCENE-7477:
-------------------------------------

Yeah, I think I'll start from a local (detached) implementation of 
OfflineSorter first and then see if I can refactor things around. Perhaps I'll 
need help with BKD, if so I'll ping you, Mike.

This issue is not a bug (the only subclass changing input/output encoding of 
byte refs is in BKD), but it's an improvement to make the code shorter and more 
legible.

> ExternalRefSorter should use OfflineSorter's actual writer for writing the 
> input file
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.x, master (7.0)
>
>         Attachments: LUCENE-7477.patch
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.input = 
> sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
> "RefSorterRaw", IOContext.DEFAULT);
>     this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
>   }
> {code}
> The problem with it is that the writer for the initial input file is written 
> with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
> {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to 
> use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably 
> ideal to allow {{OfflineSorter}} to generate its own temporary file and just 
> return the ByteSequencesWriter it chooses to use, so the above snippet would 
> read:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.writer = sorter.newUnsortedPartition();
>   }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of 
> managing its own (sorted and unsorted) partitions. Then {{sort(String file)}} 
> would simply become {{ByteSequenceIterator sort()}} (or even 
> {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If 
> we made {{OfflineSorter}} implement {{Closeable}} it could also take care of 
> cleaning up any resources it opens in the directory we pass to it. An 
> additional bonus would be the ability to dodge the final internal merge(1) -- 
> if we manage sorted and unsorted partitions then there are open possibilities 
> of returning an iterator that dynamically merges from multiple partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to