[ 
https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894926#action_12894926
 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Weird. I've found that if I disable this call:

      // flush any bytes in the input's buffer.
      numBytes -= fsInput.flushBuffer(this, numBytes);

from FSIndexOutput.copyBytes, then the test passes.

But I'm not yet sure why. I've disabled everything else 
(IndexInput.copyBytes). It's as if this call copies bytes that should not have 
been copied. I added the call because I thought there was a bug in the 
current copyBytes impl: it uses FileChannel to do the optimized copy, but since 
SimpleFSIndexInput extends BufferedIndexInput, there might be bytes already 
read into its buffer and not yet written to the output.
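
To illustrate the concern outside of Lucene, here is a minimal sketch using only 
java.io and java.nio. The class BufferedChannelCopyDemo and its copyRest method 
are hypothetical names made up for this example; they are not Lucene APIs. The 
sketch reads ahead into a buffer (as a buffered input would), assumes the caller 
has logically consumed only part of it, and then copies the rest of the file 
through a FileChannel starting either at the logical position or at the 
channel's own position.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Lucene.
class BufferedChannelCopyDemo {

  /**
   * Writes content to a temp file, reads readAhead bytes into a buffer
   * (simulating a buffered input), and assumes only the first `consumed`
   * bytes were logically consumed. Then copies the rest of the file via
   * FileChannel.transferTo and returns what ended up in the destination.
   */
  static byte[] copyRest(byte[] content, int readAhead,
                         boolean useLogicalPosition, int consumed) {
    try {
      Path src = Files.createTempFile("src", ".bin");
      Path dst = Files.createTempFile("dst", ".bin");
      try {
        Files.write(src, content);
        try (RandomAccessFile in = new RandomAccessFile(src.toFile(), "r");
             RandomAccessFile out = new RandomAccessFile(dst.toFile(), "rw")) {
          byte[] buffer = new byte[readAhead];
          in.readFully(buffer); // the channel position is now readAhead
          FileChannel inCh = in.getChannel();
          // Start at the logical read position, or at the channel position.
          long start = useLogicalPosition ? consumed : inCh.position();
          long remaining = content.length - start;
          while (remaining > 0) {
            long n = inCh.transferTo(start, remaining, out.getChannel());
            if (n <= 0) {
              break;
            }
            start += n;
            remaining -= n;
          }
        }
        return Files.readAllBytes(dst);
      } finally {
        Files.deleteIfExists(src);
        Files.deleteIfExists(dst);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a 10-byte file, 8 bytes read ahead and 3 consumed, starting at the channel 
position copies only the last 2 bytes, while starting at the logical position 
copies the remaining 7. Whether the real copyBytes hits this case depends on how 
it derives the start position, which the snippet above does not show.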

I've made the following changes:
FSIndexOutput.copyBytes:
{code}
      SimpleFSIndexInput fsInput = (SimpleFSIndexInput) input;

// change start
//      // flush any bytes in the input's buffer.
//      numBytes -= fsInput.flushBuffer(this, numBytes);
      
      // flush any bytes in our buffer
      flush();
// change end

      // do the optimized copy
      FileChannel in = fsInput.file.getChannel();
{code}

and rewrote BufferedIndexInput.flushBuffer:

{code}
    int toCopy = bufferLength - bufferPosition;
    if (toCopy < numBytes) {
      // We're copying the entire content of the buffer, so update accordingly.
      out.writeBytes(buffer, bufferPosition, toCopy);
      bufferPosition = bufferLength = 0;
      bufferStart += toCopy;
    } else {
      // We are asked to copy fewer bytes than are available in the buffer.
      toCopy = (int) numBytes;
      out.writeBytes(buffer, bufferPosition, toCopy);
      bufferPosition += toCopy;
    }
    return toCopy;
{code}

The test now fails with a different exception (not the assert in 
addRawDocuments). If I remove the call to flushBuffer, it passes. I still need 
to understand this.
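
One thing that may be worth checking in the rewritten flushBuffer is the 
bookkeeping after the buffer is drained. The sketch below is a standalone model, 
not Lucene code: BufferStateDemo and its method names are hypothetical, and it 
assumes the file pointer is computed as bufferStart + bufferPosition. Under that 
assumption, advancing bufferStart by the number of bytes copied (rather than by 
the full buffer length) leaves the pointer short whenever bufferPosition was 
greater than zero. This may or may not be related to the failure described.

```java
// Hypothetical standalone model of the three buffer fields; not part of Lucene.
class BufferStateDemo {
  long bufferStart;   // file offset of the first byte in the buffer
  int bufferPosition; // next byte to read, relative to bufferStart
  int bufferLength;   // number of valid bytes in the buffer

  BufferStateDemo(long bufferStart, int bufferPosition, int bufferLength) {
    this.bufferStart = bufferStart;
    this.bufferPosition = bufferPosition;
    this.bufferLength = bufferLength;
  }

  // Assumption: the logical file pointer is bufferStart + bufferPosition.
  long filePointer() {
    return bufferStart + bufferPosition;
  }

  // Drains the buffer and advances bufferStart by the bytes copied,
  // mirroring the first branch of the snippet in the comment.
  void drainAdvancingByCopied() {
    int toCopy = bufferLength - bufferPosition;
    bufferPosition = bufferLength = 0;
    bufferStart += toCopy;
  }

  // Drains the buffer and advances bufferStart by the whole buffer length,
  // so the pointer lands just past the last buffered byte.
  void drainAdvancingByLength() {
    bufferStart += bufferLength;
    bufferPosition = bufferLength = 0;
  }

  public static void main(String[] args) {
    BufferStateDemo a = new BufferStateDemo(100, 4, 10);
    a.drainAdvancingByCopied();
    System.out.println(a.filePointer()); // prints 106

    BufferStateDemo b = new BufferStateDemo(100, 4, 10);
    b.drainAdvancingByLength();
    System.out.println(b.filePointer()); // prints 110
  }
}
```

In this model the buffer covers file offsets 100 to 109 and 4 bytes were already 
read, so after draining the remaining 6 bytes the pointer should be 110. The 
first variant reports 106.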

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've 
> also optimized copyBytes recently. However, we're missing the opposite side 
> of the copy - from IndexInput to Output. I'd like to mimic the FileChannel 
> API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, 
> both sides can optimize the copy process, depending on the type of the 
> IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. 
> RAMInput/OutputStream can copy to/from the buffers directly, w/o going 
> through intermediate ones. Actually, for RAMIn/Out this might be a big win, 
> because it doesn't care about the type of IndexInput/Output given - it just 
> needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one 
> (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, 
> it will enable someone to do optimized copies between In/Out outside the 
> scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd 
> like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate 
> buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

