[ https://issues.apache.org/jira/browse/ACCUMULO-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ACCUMULO-4744:
-------------------------------------
    Labels: pull-request-available  (was: )

> Using RFile API with cache and multiple files hides data
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4744
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4744
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Keith Turner
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.2
>
>
> Noticed this bug in source code while working on ACCUMULO-4641. When using
> the RFile API introduced in 1.8 to read from multiple files with cache
> enabled, not all data may be seen. This happens because internally the code
> gives all input sources the same cache id. Therefore index and data blocks
> from multiple files collide in the cache.
> This bug does not happen when reading data through tserver, only the RFile
> API.
> {code:java}
>   Scanner scanner = RFile.newScanner()
>       .from(file1, file2, file3)   //multiple input files
>       .withFileSystem(localFs)
>       .withIndexCache(1000000)     //enabled cache
>       .withDataCache(10000000)     //enabled cache
>       .build();
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
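
The collision described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not Accumulo's actual cache code: the class CacheIdCollisionSketch, the nested ToyBlockCache, and the method readIndexBlocks are hypothetical names invented for illustration. It assumes a block cache keyed by a per-source cache id plus a block name, which is the situation the issue describes. When two input sources share one cache id, the second source is served the first source's cached block, so its data is hidden.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class CacheIdCollisionSketch {

  /** Hypothetical toy cache: the key combines a per-source cache id with a block name. */
  static class ToyBlockCache {
    private final Map<String, String> blocks = new HashMap<>();

    String getOrLoad(String cacheId, String blockName, Supplier<String> loader) {
      // The loader runs only on a cache miss; a hit returns whatever was stored first.
      return blocks.computeIfAbsent(cacheId + ":" + blockName, k -> loader.get());
    }
  }

  /** Reads the "index" block of every file through one shared cache. */
  public static String[] readIndexBlocks(String[] cacheIds, String[] fileContents) {
    ToyBlockCache cache = new ToyBlockCache();
    String[] seen = new String[fileContents.length];
    for (int i = 0; i < fileContents.length; i++) {
      final String content = fileContents[i];
      seen[i] = cache.getOrLoad(cacheIds[i], "index", () -> content);
    }
    return seen;
  }

  public static void main(String[] args) {
    String[] contents = {"rows-from-file1", "rows-from-file2"};

    // Same cache id for both sources: the second read hits the first file's block.
    String[] shared = readIndexBlocks(new String[] {"source", "source"}, contents);
    System.out.println(String.join(", ", shared)); // rows-from-file1, rows-from-file1

    // Distinct cache id per source: each file's block is cached separately.
    String[] unique = readIndexBlocks(new String[] {"source1", "source2"}, contents);
    System.out.println(String.join(", ", unique)); // rows-from-file1, rows-from-file2
  }
}
```

In this model, giving each input source its own cache id removes the collision. Whether the actual fix for ACCUMULO-4744 takes that form is not stated in this message.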