[ 
https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-7477:
--------------------------------
    Description: 
Consider this constructor in ExternalRefSorter:
{code}
  public ExternalRefSorter(OfflineSorter sorter) throws IOException {
    this.sorter = sorter;
    this.input = 
sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
"RefSorterRaw", IOContext.DEFAULT);
    this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
  }
{code}

The problem with it is that the writer for the initial input file is written 
with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
{{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to use 
something else than the default.

While this works now, it should be cleaned up (I think). It'd be probably ideal 
to allow {{OfflineSorter}} to generate its own temporary file and just return 
the ByteSequencesWriter it chooses to use, so the above snippet would read:

{code}
  public ExternalRefSorter(OfflineSorter sorter) throws IOException {
    this.sorter = sorter;
    this.writer = sorter.newUnsortedPartition();
  }
{code}

This could be also extended so that {{OfflineSorter}} is in charge of managing 
its own (sorted and unsorted) partitions. Then {{sort(String file)}} would 
simply become {{ByteSequenceIterator sort()}} (or even {{Stream<BytesRef> 
sort()}} as Stream is conveniently {{AutoCloseable}}). If we made 
{{OfflineSorter}} implement {{Closeable}} it could also take care of cleaning 
up any resources it opens in the directory we pass to it. An additional bonus 
would be the ability to dodge the final internal merge(1) -- if we manage 
sorted and unsorted partitions then there are open possibilities of returning 
an iterator that dynamically merges from multiple partitions.


  was:
Consider this constructor in ExternalRefSorter:
{code}
  public ExternalRefSorter(OfflineSorter sorter) throws IOException {
    this.sorter = sorter;
    this.input = 
sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
"RefSorterRaw", IOContext.DEFAULT);
    this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
  }
{code}

The problem with it is that the writer for the initial input file is written 
with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
{{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to use 
something else than the default.

While this works now, it should be cleaned up (I think). It'd be probably ideal 
to allow {{OfflineSorter}} to generate its own temporary file and just return 
the ByteSequencesWriter it chooses to use, so the above snippet would read:

{code}
  public ExternalRefSorter(OfflineSorter sorter) throws IOException {
    this.sorter = sorter;
    this.writer = sorter.newUnsortedPartition();
  }
{code}

This could be also extended so that {{OfflineSorter}} is in charge of managing 
its own (sorted and unsorted) partitions. Then {{sort(String file)}} would 
simply become {{ByteSequenceIterator sort()}} (or even {{Stream<BytesRef> 
sort()}} as Stream is conveniently {{AutoCloseable}}). If we made 
{{OfflineSorter}} implement {{Closeable}} it could also take care of cleaning 
up any resources it opens in the directory we pass to it.



> ExternalRefSorter should use OfflineSorter's actual writer for writing the 
> input file
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 6.x, master (7.0)
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.input = 
> sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), 
> "RefSorterRaw", IOContext.DEFAULT);
>     this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
>   }
> {code}
> The problem with it is that the writer for the initial input file is written 
> with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of 
> {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to 
> use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably 
> ideal to allow {{OfflineSorter}} to generate its own temporary file and just 
> return the ByteSequencesWriter it chooses to use, so the above snippet would 
> read:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.writer = sorter.newUnsortedPartition();
>   }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of 
> managing its own (sorted and unsorted) partitions. Then {{sort(String file)}} 
> would simply become {{ByteSequenceIterator sort()}} (or even 
> {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If 
> we made {{OfflineSorter}} implement {{Closeable}} it could also take care of 
> cleaning up any resources it opens in the directory we pass to it. An 
> additional bonus would be the ability to dodge the final internal merge(1) -- 
> if we manage sorted and unsorted partitions then there are open possibilities 
> of returning an iterator that dynamically merges from multiple partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to