[ 
https://issues.apache.org/jira/browse/VFS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354474#comment-17354474
 ] 

Claus Stadler edited comment on VFS-805 at 5/31/21, 2:32 PM:
-------------------------------------------------------------

Overriding the close method in the DataInputStream returned by 
Http(4)RandomAccessContent so that the HTTP response itself is closed *directly* 
(rather than its content stream) seems to fix the issue. It's based on 
[this Stack Overflow answer|https://stackoverflow.com/questions/40947622/java-interrupt-inputstream-without-close-method]:

>> Solution: Instead of calling res.getEntity().getContent().close(), try 
>> res.close() or req.abort()

 
The DataInputStream created in 
[Http4RandomAccessContent|https://github.com/apache/commons-vfs/blob/ca5a27dab0aaef84f9cf5e10debfa5827f2a873f/commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/http4/Http4RandomAccessContent.java#L84] 
needs to be extended with this snippet:
{code:java}
@Override
public void close() throws IOException {
    ((Closeable) httpResponse).close();
}
{code}
The slightly ugly part is the cast: the declared type of httpResponse provides 
no close method, but the returned instance implements CloseableHttpResponse, 
which extends Closeable.
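
To illustrate why closing the response instead of the content avoids the download, here is a minimal sketch that uses only the JDK. DrainingStream and FakeResponse are hypothetical stand-ins, not HttpClient or VFS classes: the first mimics a body stream whose close() reads the rest of the body, the second mimics a response whose close() simply drops the connection. The real classes may behave differently in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseResponseSketch {

    /** Hypothetical body stream: close() drains whatever is left, like the close() quoted in the issue. */
    static final class DrainingStream extends FilterInputStream {
        long drained = 0;
        boolean closed = false;

        DrainingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                return;
            }
            closed = true;
            while (in.read() != -1) {
                drained++; // every remaining byte is read before the stream is released
            }
            in.close();
        }
    }

    /** Hypothetical response: closing it abandons the body without reading it. */
    static final class FakeResponse implements Closeable {
        final DrainingStream body;
        boolean aborted = false;

        FakeResponse(byte[] payload) {
            body = new DrainingStream(new ByteArrayInputStream(payload));
        }

        @Override
        public void close() {
            aborted = true;
        }
    }

    /** Wraps the body so that close() goes to the response, not to the content stream. */
    static DataInputStream open(FakeResponse response) {
        return new DataInputStream(response.body) {
            @Override
            public void close() throws IOException {
                response.close();
            }
        };
    }

    /** Reads 100 of 1000 bytes, closes, and reports how many bytes close() had to drain. */
    public static long drainedAfterClose(boolean closeResponseDirectly) throws IOException {
        FakeResponse response = new FakeResponse(new byte[1000]);
        DataInputStream in = closeResponseDirectly
                ? open(response)
                : new DataInputStream(response.body);
        in.readFully(new byte[100]);
        in.close();
        return response.body.drained;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("close content:  drained " + drainedAfterClose(false) + " bytes");
        System.out.println("close response: drained " + drainedAfterClose(true) + " bytes");
    }
}
```

In this simulation the plain DataInputStream drains the remaining 900 bytes on close, while the overridden close() drains none.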

Updated test example:
{code:java}
// Seek randomly to arbitrary positions within a given range and fill a byte buffer
public static void mainVfsHttpTest(String[] args) throws Exception {
  String url = "http4://localhost/webdav/testfile-2gb.txt";
  FileSystemManager fsManager = VFS.getManager();

  Random rand = new Random();
  try (FileObject file = fsManager.resolveFile(url)) {
    try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {

      for (int i = 0; i < 1000; ++i) {
        long pos = rand.nextInt(1000000000);
        StopWatch sw = StopWatch.createStarted(); 
        r.seek(pos);
        byte[] bytes = new byte[100];
        r.readFully(bytes);
        System.out.println("Read at " + pos + " took " + sw.getTime(TimeUnit.MILLISECONDS) + " ms");
        // System.out.println(new String(bytes));
      }
    }
  }
  System.out.println("Done");
}
{code}
 
The output shows that seeking and reading now complete almost instantly:
{code:java}
 Read at 447632760 took 1 ms
 Read at 18244737 took 1 ms
 Read at 147992025 took 0 ms
 Read at 751592604 took 0 ms
...
{code}
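
With seeks this cheap, the binary search use case from the issue description becomes practical. The following is a rough sketch of such a search over sorted, newline-terminated ASCII lines. It uses java.io.RandomAccessFile as a local stand-in for RandomAccessContent (both offer seek, getFilePointer, length and readLine). The class name, method names and sample data are made up for illustration, and the data is assumed to be sorted in the order String.compareTo defines.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SortedLineSearch {

    /** Returns the offset of the first line that starts at or after pos. */
    static long lineStartAtOrAfter(RandomAccessFile raf, long pos) throws IOException {
        if (pos == 0) {
            return 0;
        }
        // Start one byte early: if that byte is '\n', pos itself is a line start.
        raf.seek(pos - 1);
        raf.readLine();
        return raf.getFilePointer();
    }

    /** Returns the byte offset of the line equal to key, or -1 if there is none. */
    public static long findLine(RandomAccessFile raf, String key) throws IOException {
        long lo = 0;
        long hi = raf.length();
        // Invariant: if the key is present, its line starts at an offset in [lo, hi).
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            long start = lineStartAtOrAfter(raf, mid);
            if (start >= hi) {
                hi = mid; // no line starts in [mid, hi)
                continue;
            }
            raf.seek(start);
            String line = raf.readLine();
            int cmp = line.compareTo(key);
            if (cmp == 0) {
                return start;
            }
            if (cmp < 0) {
                lo = raf.getFilePointer(); // continue after the line just read
            } else {
                hi = mid; // the key, if present, starts before mid
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sorted", ".txt");
        Files.write(path, "apple\nbanana\ncherry\ndate\nfig\n".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            System.out.println("cherry at offset " + findLine(raf, "cherry"));
            System.out.println("coconut at offset " + findLine(raf, "coconut"));
        } finally {
            Files.delete(path);
        }
    }
}
```

Each iteration costs one or two seeks and short reads, so the number of seeks grows only logarithmically with the file size.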


> HTTP seek always exhausts response
> ----------------------------------
>
>                 Key: VFS-805
>                 URL: https://issues.apache.org/jira/browse/VFS-805
>             Project: Commons VFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Seeking on an HTTP resource always downloads ALL content if a Content-Length 
> header is present. The problem is that seeking closes the current input 
> stream which eventually ends up in ContentLengthInputStream.close() of the 
> (ancient) http client library.
>  
> To be clear, the problem is actually not with the seek itself, but with the 
> underlying close implementation that always exhausts the HTTP response body. 
> See the example below.
>  
> My use case is to perform binary search on sorted datasets on the Web (RDF 
> data in sorted ntriple syntax) - the binary search works locally and *in 
> principle* works on HTTP resources abstracted with VFS2, but the seek 
> implementation that downloads *ALL* data (in my case several GBs) 
> unfortunately defeats the purpose :(
>  
> From org.apache.commons.httpclient.ContentLengthInputStream 
> (commons-httpclient-3.1):
> {code:java}
>     public void close() throws IOException {
>         if (!closed) {
>             try {
>                 ChunkedInputStream.exhaustInputStream(this);
>             } finally {
>                 // close after above so that we don't throw an exception trying
>                 // to read after closed!
>                 closed = true;
>             }
>         }
>     }
> {code}
> Example:
> {code:java}
> public static void main(String[] args) throws Exception {
>     String url = "http://localhost/large-file-2gb.txt";
>     FileSystemManager fsManager = VFS.getManager();
>
>     try (FileObject file = fsManager.resolveFile(url)) {
>         try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {
>
>             StopWatch sw1 = StopWatch.createStarted();
>             r.seek(20);
>             System.out.println("Initial seek: " + sw1.getTime(TimeUnit.MILLISECONDS));
>
>             StopWatch sw2 = StopWatch.createStarted();
>             byte[] bytes = new byte[100];
>             r.readFully(bytes);
>             System.out.println("Read: " + sw2.getTime(TimeUnit.MILLISECONDS));
>
>             StopWatch sw3 = StopWatch.createStarted();
>             r.seek(100);
>             System.out.println("Subsequent seek: " + sw3.getTime(TimeUnit.MILLISECONDS));
>         }
>     }
>     System.out.println("Done");
> }
> {code}
> Output (times in milliseconds):
> {code}
> Initial seek: 0
> Read: 4
> Subsequent seek: 2538
> Done
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
