[ https://issues.apache.org/jira/browse/ACCUMULO-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ACCUMULO-4744:
-------------------------------------
    Labels: pull-request-available  (was: )

> Using RFile API with cache and multiple files hides data
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4744
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4744
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Keith Turner
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.2
>
>
> I noticed this bug in the source code while working on ACCUMULO-4641.  When 
> the RFile API introduced in 1.8 is used to read from multiple files with the 
> cache enabled, some data may not be seen.  This happens because the code 
> internally gives all input sources the same cache id, so index and data 
> blocks from different files collide in the cache.
> This bug does not occur when reading data through a tserver, only when using 
> the RFile API.
> {code:java}
>   Scanner scanner =
>        RFile.newScanner()
>            .from(file1, file2, file3)   // multiple input files
>            .withFileSystem(localFs)
>            .withIndexCache(1000000)     // index cache enabled
>            .withDataCache(10000000)     // data cache enabled
>            .build();
> {code}
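
The collision described above can be pictured with a small stand-alone sketch. It is not Accumulo code: the class `CacheCollisionSketch`, the toy `BlockCache`, the "cacheId:blockNumber" key format, and the id strings are all hypothetical and only illustrate why a cache id shared by several files can return one file's block in place of another's. The unique-id case is shown for contrast and does not claim to reflect how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.Map;

class CacheCollisionSketch {

    /** Toy block cache keyed by "cacheId:blockNumber" (hypothetical, not Accumulo's cache). */
    static final class BlockCache {
        private final Map<String, String> blocks = new HashMap<>();

        String getOrLoad(String cacheId, int blockNumber, String blockFromDisk) {
            // On a hit, the cached block is returned no matter which file stored it.
            return blocks.computeIfAbsent(cacheId + ":" + blockNumber, k -> blockFromDisk);
        }
    }

    /** Reads block 0 of each file through one shared cache, using the given cache id per file. */
    static String[] readFirstBlocks(String[] cacheIds, String[] firstBlocks) {
        BlockCache cache = new BlockCache();
        String[] seen = new String[firstBlocks.length];
        for (int i = 0; i < firstBlocks.length; i++) {
            seen[i] = cache.getOrLoad(cacheIds[i], 0, firstBlocks[i]);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] blocks = {"rows from file1", "rows from file2"};

        // Same id for both files: the second read hits file1's cached block,
        // so the contents of file2 are never seen.
        String[] shared = readFirstBlocks(new String[] {"source", "source"}, blocks);
        System.out.println(String.join(" | ", shared));

        // Distinct ids (assumed here purely for contrast): each file gets its own entry.
        String[] unique = readFirstBlocks(new String[] {"source1", "source2"}, blocks);
        System.out.println(String.join(" | ", unique));
    }
}
```

With the shared id, both reads return "rows from file1"; with distinct ids, the second read returns "rows from file2". This mirrors the symptom in the report, where data from some of the input files is hidden.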



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)