Fabio Nascimento Brandão created ARTEMIS-3868:
-------------------------------------------------
Summary: Compact process filling all the disk
Key: ARTEMIS-3868
URL: https://issues.apache.org/jira/browse/ARTEMIS-3868
Project: ActiveMQ Artemis
Issue Type: Bug
Components: ActiveMQ-Artemis-Native
Affects Versions: 2.23.1, 2.23.0, 2.22.0, 2.21.0, 2.20.0, 2.19.0
Reporter: Fabio Nascimento Brandão
Assignee: Clebert Suconic
Attachments: CompactDiskUsage.java, image-2022-06-21-19-33-59-276.png
We are having problems with disk space usage. We have a lot of queues, and the
messages in one of them failed to be consumed (a problem in a microservice that
lasted about 10 minutes).
We then use the message redelivery feature to process these messages again
after some delay.
The problem occurs during journal compaction: the process keeps creating files
until it fills the disk.
I took some of the files created on the production server and tried to compact
them. Disk usage grew a lot, and only after a second compaction did it settle
at an acceptable level.
I attached a "test" to reproduce the problem. The test creates one journal file
with 2048 records that uses only 2.1MB. Each record carries a field called the
"compact count". I set this field to 1 and 2.
What I found is that when the compactor processes a record whose compact count
is less than 2 (after some earlier record had a value greater than or equal to
2), the JournalCompactor creates a new file.
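
The rule described above can be illustrated with a small standalone simulation.
This is only a sketch of the behaviour as reported here, not the actual
JournalCompactor code: the class name, the method name, the SPLIT_LINE constant
and the alternating 2, 1, 2, 1, ... pattern are all assumptions made for
illustration.

```java
public class CompactSplitSketch {
    // Assumed threshold, taken from the "less than 2" wording in this report.
    static final int SPLIT_LINE = 2;

    /**
     * Counts how many new files would be opened if a file is opened every time
     * a record below SPLIT_LINE follows a record at or above SPLIT_LINE.
     */
    static int countFilesOpened(int[] compactCounts) {
        boolean sawHighCount = false;
        int filesOpened = 0;
        for (int count : compactCounts) {
            if (count >= SPLIT_LINE) {
                sawHighCount = true;
            } else if (sawHighCount) {
                // Low count after a high one: the reported rule opens a new file.
                sawHighCount = false;
                filesOpened++;
            }
        }
        return filesOpened;
    }

    public static void main(String[] args) {
        // Hypothetical input: 2048 records alternating between 2 and 1.
        int[] counts = new int[2048];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = (i % 2 == 0) ? 2 : 1;
        }
        System.out.println(countFilesOpened(counts)); // prints 1024
    }
}
```

Under these assumptions every second record triggers a new file, regardless of
how little data each file ends up holding.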
In this test, when I start the journal it creates a new 10MB file (default
configuration) and keeps the 2.1MB file.
When I run the compact process, it creates 1024 files and keeps the old 2. 1024
* 10MB = 10GB!!!
If I run the compact process again, it shrinks to only 2 files of 10MB!!!
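
For reference, the arithmetic behind these numbers can be written out as a tiny
sketch. The class and method names are hypothetical, and it assumes every new
journal file is preallocated at the full 10MB regardless of how much data it
holds; the two pre-existing files are left out of the totals.

```java
public class DiskUsageSketch {
    /** Total space in MB taken by fileCount files of fileSizeMb each. */
    static long totalMb(long fileCount, long fileSizeMb) {
        return fileCount * fileSizeMb;
    }

    public static void main(String[] args) {
        long fileSizeMb = 10; // journal file size reported as the default
        // Files created by the first compaction in the test.
        System.out.println(totalMb(1024, fileSizeMb) + " MB"); // prints 10240 MB
        // Files left after the second compaction.
        System.out.println(totalMb(2, fileSizeMb) + " MB");    // prints 20 MB
    }
}
```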
The problem seems to be in the checkCompact method of JournalCompactor. I think
we can remove this method and let the flow proceed as if it had returned false:
[https://github.com/apache/activemq-artemis/blob/main/artemis-journal/src/main/java/org/apache/activemq/artemis/core/journal/impl/JournalCompactor.java#L190-L203]
This is old code that was introduced in HornetQ:
[https://github.com/hornetq/hornetq/commit/93af1cb92f4050e54e83c8daa9c67ce43dbcfead]
I also attached an image showing the disk usage on my production server while
the problem was occurring.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)