Endre Major created FLUME-3297:
----------------------------------
Summary: HDFS sink name collision
Key: FLUME-3297
URL: https://issues.apache.org/jira/browse/FLUME-3297
Project: Flume
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Endre Major
The algorithm that BucketWriter uses to generate file names can lead to name
collisions between concurrent writers. It seeds the file name counter with the
current time in milliseconds:
{code}
fileExtensionCounter = new AtomicLong(clock.currentTimeMillis());
{code}
If many Flume agents on a cluster start at the same time with the same HDFS
sink configuration, there is a chance that they try to write the same file.
This is an even bigger problem when there are multiple HDFS sinks on the same
node, e.g. in a load-balancing sink setup.
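To illustrate the collision, here is a minimal sketch. It is a simplified model,
not the Flume source: the class NameCollisionSketch, the method nextFileName(),
and the "<prefix>.<counter>" name layout are assumptions made for this example.
Only the counter seeding follows the snippet quoted above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a writer that seeds its counter from the clock.
class NameCollisionSketch {
    private final String filePrefix;
    private final AtomicLong fileExtensionCounter;

    NameCollisionSketch(String filePrefix, long startMillis) {
        this.filePrefix = filePrefix;
        // Same idea as the quoted line: the counter starts at the clock value.
        this.fileExtensionCounter = new AtomicLong(startMillis);
    }

    // Assumed name layout: <prefix>.<counter>
    String nextFileName() {
        return filePrefix + "." + fileExtensionCounter.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two sinks with the same prefix that start in the same millisecond.
        long sameMillis = 1_500_000_000_000L;
        NameCollisionSketch sinkA = new NameCollisionSketch("FlumeData", sameMillis);
        NameCollisionSketch sinkB = new NameCollisionSketch("FlumeData", sameMillis);

        // Both produce FlumeData.1500000000001, so this prints "true".
        System.out.println(sinkA.nextFileName().equals(sinkB.nextFileName()));
    }
}
```

Because the counters are independent per writer, the atomic increment does not
help here: it only prevents duplicates within a single writer.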
Workaround: Giving each sink a unique "hdfs.filePrefix" solves this problem.
There are escape sequences, such as %[localhost], that make this easy across
different nodes:
...hdfs.filePrefix=FlumeData1%[localhost]
...hdfs.filePrefix=FlumeData2%[localhost]
On a single node it is easy to make "hdfs.filePrefix" unique.
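A rough sketch of why the workaround helps, under the same assumed
"<prefix>.<counter>" layout (the class PrefixWorkaroundSketch, the method
firstFileName(), and the host name "node-a" are hypothetical and stand in for
whatever Flume actually expands the prefix to):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: distinct prefixes keep names apart even with equal seeds.
class PrefixWorkaroundSketch {
    // Assumed name layout: <prefix>.<counter>, counter seeded from the start time.
    static String firstFileName(String filePrefix, long startMillis) {
        AtomicLong counter = new AtomicLong(startMillis);
        return filePrefix + "." + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        long sameMillis = 1_500_000_000_000L; // both sinks start in the same millisecond

        // "node-a" is a made-up value standing in for the expanded host name.
        String first = firstFileName("FlumeData1" + "node-a", sameMillis);
        String second = firstFileName("FlumeData2" + "node-a", sameMillis);

        // The counters are equal, but the prefixes differ, so this prints "false".
        System.out.println(first.equals(second));
    }
}
```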