[ 
https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-7477:
--------------------------------
    Attachment: LUCENE-7477.patch

Oh, it's really confusing how OfflineSorter is shaped right now, in particular 
temporary file creation (the requirement to add checksum manually), file 
deletion and cleanup and byte sequence readers/writers. There has to be a way 
to do it better.

I attach an overview of what I was thinking about, but I stopped half-way. This 
is very simple and not complete because there are several hacks that currently 
make use of OfflineSorter's capability to read-write arbitrary files 
(BKDWriter).


> ExternalRefSorter should use OfflineSorter's actual writer for writing the 
> input file
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.x, master (7.0)
>
>         Attachments: LUCENE-7477.patch
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.input = 
> sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
> "RefSorterRaw", IOContext.DEFAULT);
>     this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
>   }
> {code}
> The problem with it is that the writer for the initial input file is written 
> with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
> {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to 
> use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably 
> ideal to allow {{OfflineSorter}} to generate its own temporary file and just 
> return the ByteSequencesWriter it chooses to use, so the above snippet would 
> read:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.writer = sorter.newUnsortedPartition();
>   }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of 
> managing its own (sorted and unsorted) partitions. Then {{sort(String file)}} 
> would simply become {{ByteSequenceIterator sort()}} (or even 
> {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If 
> we made {{OfflineSorter}} implement {{Closeable}} it could also take care of 
> cleaning up any resources it opens in the directory we pass to it. An 
> additional bonus would be the ability to dodge the final internal merge(1) -- 
> if we manage sorted and unsorted partitions then there are open possibilities 
> of returning an iterator that dynamically merges from multiple partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to