[ https://issues.apache.org/jira/browse/VFS-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354474#comment-17354474 ]
Claus Stadler edited comment on VFS-805 at 5/31/21, 2:32 PM:
-------------------------------------------------------------

Overriding the close method in the DataInputStream returned by Http(4)RandomAccessContent such that the HTTP response is closed *directly* (rather than the content) seems to fix the issue. It's based on [this stackoverflow answer|https://stackoverflow.com/questions/40947622/java-interrupt-inputstream-without-close-method]:

>> Solution: Instead of calling res.getEntity().getContent().close(), try
>> res.close() or req.abort()

The DataInputStream created in [Http4RandomAccessContent|https://github.com/apache/commons-vfs/blob/ca5a27dab0aaef84f9cf5e10debfa5827f2a873f/commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/http4/Http4RandomAccessContent.java#L84] needs to be extended with this snippet:
{code:java}
@Override
public void close() throws IOException {
    // Close the response itself instead of the entity content
    ((Closeable) httpResponse).close();
}
{code}
The slightly ugly part is that the declared type of httpResponse provides no close method - but the returned instance implements CloseableHttpResponse, which extends Closeable, so the cast works.
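To make the idea easier to try out in isolation, here is a minimal, self-contained sketch of the same pattern using only java.io. It is not the VFS or HttpClient code: FakeResponse, TrackingContent and wrap are hypothetical stand-ins invented for this illustration. It only shows that a DataInputStream whose close() is overridden this way closes the "response" and never calls close() on the wrapped content stream.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseResponseSketch {

    /** Hypothetical stand-in for the HTTP response; records whether it was closed. */
    static final class FakeResponse implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for the entity content; records whether close() was called on it. */
    static final class TrackingContent extends ByteArrayInputStream {
        boolean closeCalled;

        TrackingContent(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closeCalled = true;
            super.close();
        }
    }

    /** Wraps the content so that close() closes the response and skips the content stream. */
    static DataInputStream wrap(InputStream content, Closeable response) {
        return new DataInputStream(content) {
            @Override
            public void close() throws IOException {
                // Deliberately do not call super.close(): that would close the content stream.
                response.close();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        FakeResponse response = new FakeResponse();
        TrackingContent content = new TrackingContent(new byte[1024]);

        DataInputStream in = wrap(content, response);
        in.readFully(new byte[100]); // read a small chunk, as after a seek
        in.close();

        System.out.println("response closed: " + response.closed);           // true
        System.out.println("content.close() called: " + content.closeCalled); // false
    }
}
```

In the real provider the response would come from the HTTP client rather than a stand-in, and whether closing it leaves the connection reusable depends on the client library, which this sketch does not model.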
Updated test example:
{code:java}
// Seek randomly to arbitrary positions within a given range and fill a byte buffer
public static void mainVfsHttpTest(String[] args) throws Exception {
    String url = "http4://localhost/webdav/testfile-2gb.txt";
    FileSystemManager fsManager = VFS.getManager();
    Random rand = new Random();
    try (FileObject file = fsManager.resolveFile(url)) {
        try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {
            for (int i = 0; i < 1000; ++i) {
                long pos = rand.nextInt(1000000000);
                StopWatch sw = StopWatch.createStarted();
                r.seek(pos);
                byte[] bytes = new byte[100];
                r.readFully(bytes);
                System.out.println("Read at " + pos + " took " + sw.getTime(TimeUnit.MILLISECONDS) + " ms");
                // System.out.println(new String(bytes));
            }
        }
    }
    System.out.println("Done");
}
{code}
Output shows that seeking and reading now complete almost instantly:
{code}
Read at 447632760 took 1 ms
Read at 18244737 took 1 ms
Read at 147992025 took 0 ms
Read at 751592604 took 0 ms
...
{code}

> HTTP seek always exhausts response
> ----------------------------------
>
>                 Key: VFS-805
>                 URL: https://issues.apache.org/jira/browse/VFS-805
>             Project: Commons VFS
>          Issue Type: Bug
>   Affects Versions: 2.8.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Seeking on an HTTP resource always downloads ALL content if a Content-Length
> header is present. The problem is that seeking closes the current input
> stream, which eventually ends up in ContentLengthInputStream.close() of the
> (ancient) http client library.
>
> To be clear, the problem is actually not with the seek itself, but with the
> underlying close implementation that always exhausts the HTTP response body.
> See the example below.
>
> My use case is to perform binary search on sorted datasets on the Web (RDF
> data in sorted ntriple syntax) - the binary search works locally and *in
> principle* works on HTTP resources abstracted with VFS2, but the seek
> implementation that downloads *ALL* data (in my case several GBs)
> unfortunately defeats the purpose :(
>
> From org.apache.commons.httpclient.ContentLengthInputStream
> (commons-httpclient-3.1):
> {code:java}
> public void close() throws IOException {
>     if (!closed) {
>         try {
>             ChunkedInputStream.exhaustInputStream(this);
>         } finally {
>             // close after above so that we don't throw an exception trying
>             // to read after closed!
>             closed = true;
>         }
>     }
> }
> {code}
> Example:
> {code:java}
> public static void main(String[] args) throws Exception {
>     String url = "http://localhost/large-file-2gb.txt";
>     FileSystemManager fsManager = VFS.getManager();
>
>     try (FileObject file = fsManager.resolveFile(url)) {
>         try (RandomAccessContent r = file.getContent().getRandomAccessContent(RandomAccessMode.READ)) {
>
>             StopWatch sw1 = StopWatch.createStarted();
>             r.seek(20);
>             System.out.println("Initial seek: " + sw1.getTime(TimeUnit.MILLISECONDS));
>
>             StopWatch sw2 = StopWatch.createStarted();
>             byte[] bytes = new byte[100];
>             r.readFully(bytes);
>             System.out.println("Read: " + sw2.getTime(TimeUnit.MILLISECONDS));
>
>             StopWatch sw3 = StopWatch.createStarted();
>             r.seek(100);
>             System.out.println("Subsequent seek: " + sw3.getTime(TimeUnit.MILLISECONDS));
>         }
>     }
>     System.out.println("Done");
> }
> {code}
> Output (times in milliseconds):
> {code}
> Initial seek: 0
> Read: 4
> Subsequent seek: 2538
> Done
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)