Dio Jin created FLUME-3377:
------------------------------
Summary: file channel data file corrupted, file channel can't be
started
Key: FLUME-3377
URL: https://issues.apache.org/jira/browse/FLUME-3377
Project: Flume
Issue Type: Bug
Components: File Channel
Affects Versions: 1.9.0
Environment: Runs in a Kubernetes cluster with 4 replicas, each of which
has its own separate persistent volume for the file channel. Almost 95% of
the disk space was free when the issue occurred.
Reporter: Dio Jin
Attachments: flume.conf, flume_exception.log
Hi, we use Flume 1.9.0 to ingest data from Kafka into HDFS; our config file is
attached. It ran smoothly for some time, but it has now stopped ingesting data
and keeps logging errors; the most important log lines are attached. According
to the log, the file channel fails to start because of a corrupted data file,
and it retries continuously but always fails. The Flume instance is hosted in
Kubernetes and has 4 replicas, each of which has its own separate persistent
volume for the file channel, and almost 95% of the disk space was free when
the issue occurred.
So we have two questions:
# What causes the corrupted data files? This is one of our production
applications and we rely on Flume's robustness, so we did not expect to see a
corrupted data file. Moreover, how can we avoid such corruption in the future?
# How do we recover from this situation without losing any data in the channel?
Removing checkpointDir and dataDirs isn't acceptable.
Thanks very much.
Here are the key log lines; the full logs are in the attached file.
org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)] Failed
to start the file channel [channel=channel2HDFS1]
2020-07-29T07:15:31.640949847Z java.lang.RuntimeException:
org.apache.flume.channel.file.CorruptEventException: Could not parse event from
data file.
2020-07-29T07:15:31.638860323Z at
org.apache.flume.channel.file.TransactionEventRecord.fromByteArray(TransactionEventRecord.java:212)
...
2020-07-29T07:15:31.64750767Z 2020-07-29 00:15:31,646
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:158)] Unable to
deliver event. Exception follows.
2020-07-29T07:15:31.647539686Z java.lang.IllegalStateException: Channel closed
[channel=channel2HDFS1]. Due to java.lang.RuntimeException:
org.apache.flume.channel.file.CorruptEventException: Could not parse event from
data file.
2020-07-29T07:15:31.647552984Z at
org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:358)
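
To check whether other channels are affected, the full log can be scanned for
channel names that appear shortly before a CorruptEventException. This is a
minimal Python sketch, assuming the log uses the same `[channel=...]` format as
the excerpt above. The function name `corrupted_channels` and the 200-character
window are illustrative choices, not part of Flume, and the match is only a
heuristic.

```python
import re
from collections import Counter

# Heuristic: a "[channel=NAME]" tag followed within 200 characters by
# "CorruptEventException". DOTALL lets the window span wrapped log lines.
# The 200-character window is an assumption, not a Flume constant.
_PATTERN = re.compile(
    r"\[channel=([^\]\s]+)\].{0,200}?CorruptEventException",
    re.DOTALL,
)


def corrupted_channels(log_text: str) -> Counter:
    """Count, per channel name, how often it is reported next to a
    CorruptEventException in the given log text."""
    return Counter(_PATTERN.findall(log_text))


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py flume_exception.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, count in corrupted_channels(fh.read()).most_common():
            print(f"{name}: {count}")
```

If only one channel name is reported, the corruption is probably limited to
that channel's data directories on the affected replica.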