[
https://issues.apache.org/jira/browse/ACCUMULO-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633571#comment-13633571
]
Eric Newton commented on ACCUMULO-1281:
---------------------------------------
The gc runs only every 15 minutes by default. We've had users flush their
!METADATA table as often as every 5 minutes. I have seen hundreds of thousands
of files removed during a single GC cycle. Unless the in-memory data is
compacted, those entries just keep building up in memory.
If the cluster is using live ingest, the !METADATA table tends to flush on its
own because of the number of WAL entries it has.
The use of in-memory compactions (ACCUMULO-519) using a ratio of delete or
update records to trigger a flush would be a more intelligent approach, but
that takes more than 4 lines of code.
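
To make the ratio idea concrete, here is a minimal sketch of how such a
trigger might be tracked. It is not Accumulo code and not what ACCUMULO-519
implements: the class name, the method names, the threshold, and the minimum
entry count are all hypothetical, and it treats any write or delete to a key
that is already buffered in memory as making the older entry obsolete.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: decides when an in-memory buffer holds enough
 * obsolete entries (updates or deletes of already-buffered keys) to be
 * worth flushing. Not part of the Accumulo API.
 */
final class FlushRatioTracker {
    private final double threshold;   // assumed tunable, e.g. 0.5
    private final long minEntries;    // avoid flushing tiny buffers
    private final Set<String> keysInMemory = new HashSet<>();
    private long totalEntries = 0;
    private long supersededEntries = 0;

    FlushRatioTracker(double threshold, long minEntries) {
        this.threshold = threshold;
        this.minEntries = minEntries;
    }

    /** Record an insert, update, or delete written for the given key. */
    void recordWrite(String key) {
        totalEntries++;
        // Set.add returns false when the key is already buffered, which
        // means this entry supersedes an older in-memory entry.
        if (!keysInMemory.add(key)) {
            supersededEntries++;
        }
    }

    /** Fraction of buffered entries that superseded an earlier entry. */
    double supersededRatio() {
        return totalEntries == 0 ? 0.0 : (double) supersededEntries / totalEntries;
    }

    /** True once the buffer is large enough and mostly obsolete. */
    boolean shouldFlush() {
        return totalEntries >= minEntries && supersededRatio() >= threshold;
    }

    /** Call after a flush to start counting from an empty buffer. */
    void reset() {
        keysInMemory.clear();
        totalEntries = 0;
        supersededEntries = 0;
    }
}
```

A caller would invoke recordWrite for every mutation, check shouldFlush
afterwards, and call reset once the flush completes. The minimum entry count
is there so that a handful of repeated writes to a nearly empty buffer does
not trigger a flush.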
> flush the METADATA table after GC
> ---------------------------------
>
> Key: ACCUMULO-1281
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1281
> Project: Accumulo
> Issue Type: Improvement
> Components: gc
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Trivial
> Fix For: 1.5.0
>
>
> The METADATA table is often small, with many in-memory writes. Because it is
> small, it is not normally flushed, and a flush is what prunes old data via the
> versioning/delete iterators. Over time, the many in-memory versions can
> cause poor performance.
> The file garbage collector (gc) will make lots of updates as it runs. That
> would be a perfect time to flush the table and prune the versions.