keith-turner commented on a change in pull request #492: fixes #467 cache file lengths
URL: https://github.com/apache/accumulo/pull/492#discussion_r189401839
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java
 ##########
 @@ -281,17 +288,47 @@ public Reader(FileSystem fs, Path dataFile, Configuration conf, BlockCache data,
       this._bc = new BCFile.Reader(this, fsin, len, conf, accumuloConfiguration);
     }
 
+    private static long getFileLen(Cache<String,Long> fileLenCache, final FileSystem fs,
+        final Path path) throws IOException {
+      try {
+        return fileLenCache.get(path.getName(), new Callable<Long>() {
+          @Override
+          public Long call() throws Exception {
+            return fs.getFileStatus(path).getLen();
+          }
+        });
+      } catch (ExecutionException e) {
+        throw new IOException("Failed to get " + path + " len from cache ", e);
+      }
+    }
+
     private synchronized BCFile.Reader getBCFile(AccumuloConfiguration accumuloConfiguration)
         throws IOException {
       if (closed)
         throw new IllegalStateException("File " + fileName + " is closed");
 
       if (_bc == null) {
         // lazily open file if needed
-        Path path = new Path(fileName);
+        final Path path = new Path(fileName);
+
         RateLimitedInputStream fsIn = new RateLimitedInputStream(fs.open(path), this.readLimiter);
         fin = fsIn;
-        init(fsIn, fs.getFileStatus(path).getLen(), conf, accumuloConfiguration);
+
+        if (fileLenCache != null) {
+          try {
 
 Review comment:
   I tested replacing a larger file with a smaller file and vice versa. Both 
worked. To test this, I had to create lots of tablets (more than max open 
files, because open files are cached) and turn off index caching. With index 
caching or open file caching, it would fail for reasons unrelated to this 
change. So there are other pre-existing problems with replacing a file, but 
this new code works. Below are the exceptions I saw in the log.
   
   When replacing a smaller file with a larger file, I saw the following.
   
   ```
   2018-05-18 17:30:47,023 [impl.CachableBlockFile] DEBUG: Failed to open hdfs://localhost:8020/accumulo/tables/2/t-00000br/A00001r5.rf, clearing file length cache and retrying
   java.io.IOException: Not a valid BCFile.
           at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Magic.readAndVerify(BCFile.java:1308)
           at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:911)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:288)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:319)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:155)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:236)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:388)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:445)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:155)
           at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1149)
           at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:50)
           at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:65)
           at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:95)
           at org.apache.accumulo.core.file.FileOperations$OpenReaderOperation.build(FileOperations.java:551)
           at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:329)
           at org.apache.accumulo.tserver.FileManager.access$600(FileManager.java:63)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:516)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:501)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:526)
           at org.apache.accumulo.tserver.tablet.ScanDataSource.createIterator(ScanDataSource.java:176)
           at org.apache.accumulo.tserver.tablet.ScanDataSource.iterator(ScanDataSource.java:134)
           at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:231)
           at org.apache.accumulo.tserver.tablet.Tablet.nextBatch(Tablet.java:798)
           at org.apache.accumulo.tserver.tablet.Scanner.read(Scanner.java:100)
           at org.apache.accumulo.tserver.scan.NextBatchTask.run(NextBatchTask.java:73)
           at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
           at java.lang.Thread.run(Thread.java:748)
   ```
   
   When replacing a larger file with a smaller file, I saw the following.
   
   ```
   2018-05-18 17:50:35,175 [impl.CachableBlockFile] DEBUG: Failed to open hdfs://localhost:8020/accumulo/tables/2/t-00000br/A00001r5.rf, clearing file length cache and retrying
   java.io.EOFException: Cannot seek after EOF
           at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1516)
           at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
           at org.apache.accumulo.core.file.streams.RateLimitedInputStream.seek(RateLimitedInputStream.java:59)
           at org.apache.accumulo.core.file.streams.SeekableDataInputStream.seek(SeekableDataInputStream.java:36)
           at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:909)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:288)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:319)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:155)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:236)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:388)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:445)
           at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:155)
           at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:1149)
           at org.apache.accumulo.core.file.rfile.RFileOperations.getReader(RFileOperations.java:50)
           at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:65)
           at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:95)
           at org.apache.accumulo.core.file.FileOperations$OpenReaderOperation.build(FileOperations.java:551)
           at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:329)
           at org.apache.accumulo.tserver.FileManager.access$600(FileManager.java:63)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:516)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:501)
           at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:526)
           at org.apache.accumulo.tserver.tablet.ScanDataSource.createIterator(ScanDataSource.java:176)
           at org.apache.accumulo.tserver.tablet.ScanDataSource.iterator(ScanDataSource.java:134)
           at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:231)
           at org.apache.accumulo.tserver.tablet.Tablet.nextBatch(Tablet.java:798)
           at org.apache.accumulo.tserver.tablet.Scanner.read(Scanner.java:100)
           at org.apache.accumulo.tserver.scan.NextBatchTask.run(NextBatchTask.java:73)
           at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
           at java.lang.Thread.run(Thread.java:748)
   ```
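
For readers following along, the behavior these two log messages point to (open with a cached length, and on failure clear the cache and retry) could be sketched roughly as below. This is only an illustration, not the code in the PR: the diff above is truncated before the retry logic, so `FileLenRetrySketch`, `LengthSource`, `Opener`, and `openWithRetry` are hypothetical names, and a plain `ConcurrentHashMap` stands in for the real cache.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Accumulo.
class FileLenRetrySketch {

  // Assumed stand-in for asking the file system for a file's current length.
  interface LengthSource {
    long currentLength(String path) throws IOException;
  }

  // Assumed stand-in for opening a file given its expected length.
  // It should throw IOException when the length does not match the file.
  interface Opener<T> {
    T open(String path, long length) throws IOException;
  }

  // A plain map stands in for the file length cache.
  private final Map<String,Long> fileLenCache = new ConcurrentHashMap<>();
  private final LengthSource lengths;

  FileLenRetrySketch(LengthSource lengths) {
    this.lengths = lengths;
  }

  // Return the cached length, loading it from the source on a miss.
  private long cachedLength(String path) throws IOException {
    Long len = fileLenCache.get(path);
    if (len == null) {
      len = lengths.currentLength(path);
      fileLenCache.put(path, len);
    }
    return len;
  }

  // Try the cached length first. If the open fails, assume the cached
  // length may be stale, clear the cache, and retry once with a fresh length.
  <T> T openWithRetry(String path, Opener<T> opener) throws IOException {
    try {
      return opener.open(path, cachedLength(path));
    } catch (IOException e) {
      // Assumption: the whole cache is cleared, following the log wording.
      fileLenCache.clear();
      return opener.open(path, cachedLength(path));
    }
  }
}
```

In this sketch a second failure propagates to the caller, so a file that is genuinely corrupt still produces an error instead of looping.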

