ZhengHanyang created FLUME-3334:
-----------------------------------
Summary: TaildirSource tailFiles Map causing OOM when huge amount of files
Key: FLUME-3334
URL: https://issues.apache.org/jira/browse/FLUME-3334
Project: Flume
Issue Type: Bug
Components: Sinks+Sources
Affects Versions: 1.9.0, 1.8.0, 1.7.0
Reporter: ZhengHanyang
Attachments: 20190511173448.png, 20190511173521.png
I am using the Taildir source to monitor a log directory that receives about
100 new files per second. I set -Xmx2048m for Flume, and after running for
2 hours I get an OOM error with "Failed writing positionFile".
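Because nothing is ever removed from tailFiles, its size grows linearly with
the number of matched files. Here is a back-of-the-envelope estimate from the
rates reported above. The class and method names are only illustrative:

```java
// Rough estimate of how many entries an unpruned tailFiles map accumulates,
// using the reported rate of about 100 new files per second.
public class TailFilesGrowth {

    // Every matched file stays in the map, so the entry count is simply
    // (new files per second) * (elapsed seconds).
    static long trackedFiles(long filesPerSecond, long elapsedSeconds) {
        return filesPerSecond * elapsedSeconds;
    }

    public static void main(String[] args) {
        long twoHours = 2L * 60 * 60; // 7,200 seconds
        System.out.println(trackedFiles(100, twoHours)); // prints 720000
    }
}
```

So by the time the OOM appears, the map holds on the order of 720,000 entries.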
After a deep dive into the heap dump file, I can see that tailFiles occupies
1.7 GB of memory. Looking into the source code, I found that Flume remembers
every file that matches the file pattern in tailFiles. Could you add a property
that filters files by last modification time? The default could be infinity.
With a value of, for example, 30 min, any file last modified more than 30
minutes ago would be removed from tailFiles and no longer monitored.
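The following is a minimal sketch of the requested behaviour. It assumes a
hypothetical IdleFileTracker that maps each file path to its last modification
time. This is not Flume's actual tailFiles structure. The maxIdleMillis knob
stands in for the proposed property, with Long.MAX_VALUE playing the role of the
"infinity" default:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: forget files whose last modification is older than
// maxIdleMillis. Names are illustrative, not Flume's API.
public class IdleFileTracker {

    // Long.MAX_VALUE means "infinity": no entry is ever evicted.
    private final long maxIdleMillis;

    // path -> last modification time in epoch milliseconds
    private final Map<String, Long> lastModified = new HashMap<>();

    public IdleFileTracker(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    public void track(String path, long lastModifiedMillis) {
        lastModified.put(path, lastModifiedMillis);
    }

    // Drops every entry idle for longer than the window; returns how many were removed.
    public int evictIdle(long nowMillis) {
        int before = lastModified.size();
        lastModified.values().removeIf(t -> nowMillis - t > maxIdleMillis);
        return before - lastModified.size();
    }

    public boolean isTracked(String path) {
        return lastModified.containsKey(path);
    }

    public static void main(String[] args) {
        IdleFileTracker tracker = new IdleFileTracker(30L * 60 * 1000); // 30 min
        long now = 10_000_000L;
        tracker.track("trace-001.log", now - 45L * 60 * 1000); // idle for 45 min
        tracker.track("trace-002.log", now - 5_000L);          // idle for 5 s
        System.out.println(tracker.evictIdle(now));             // prints 1
        System.out.println(tracker.isTracked("trace-002.log")); // prints true
    }
}
```

With a 30-minute window and about 100 new files per second, the map would level
off near 100 x 1,800 = 180,000 entries instead of growing without bound. A
shorter window shrinks it further.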
My logs come from a real-time transaction system that writes one file per
transaction, named after its trace number. A transaction usually completes
within a few seconds, so most of the time a file is never updated again. In the
exceptional cases, Flume can just read the whole file, and we can handle that
specially as well.
Please consider this scenario, thanks
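The fallback in the scenario above can be sketched as follows. It assumes, as
the reporter expects, that a file with no remembered position is treated as new
and read from offset 0. ResumeOffsets and its methods are hypothetical names, not
Flume API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: once a file's entry is evicted, its read position is
// forgotten, so a later update causes the file to be read from the start.
public class ResumeOffsets {

    // path -> byte offset already consumed
    private final Map<String, Long> positions = new HashMap<>();

    public void commit(String path, long offset) {
        positions.put(path, offset);
    }

    public void evict(String path) {
        positions.remove(path);
    }

    // Known files resume where they left off; unknown files start at 0.
    public long startOffset(String path) {
        return positions.getOrDefault(path, 0L);
    }

    public static void main(String[] args) {
        ResumeOffsets offsets = new ResumeOffsets();
        offsets.commit("trace-42.log", 512L);
        System.out.println(offsets.startOffset("trace-42.log")); // prints 512
        offsets.evict("trace-42.log");
        // A late update to trace-42.log is now read from the beginning.
        System.out.println(offsets.startOffset("trace-42.log")); // prints 0
    }
}
```

The trade-off is that bytes already shipped before eviction are sent again. This
is presumably the case the reporter says can be dealt with specially.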
!20190511173448.png!
!20190511173521.png!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)