Endre Major created FLUME-3297:
----------------------------------

             Summary: HDFS sink name collision
                 Key: FLUME-3297
                 URL: https://issues.apache.org/jira/browse/FLUME-3297
             Project: Flume
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Endre Major


The algorithm that BucketWriter uses to generate file names can produce 
name collisions between concurrent writers. It seeds the counter used as 
the file name extension with the current time:
{code}
 fileExtensionCounter = new AtomicLong(clock.currentTimeMillis());
{code}
If many Flume agents on a cluster start at the same time with the same 
HDFS sink configuration, there is a chance that they will try to write the same 
file. The problem is even more likely when there are multiple HDFS sinks on the 
same node, e.g. in a load-balancing sink setup.
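
To make the collision concrete, here is a minimal sketch, not Flume's actual 
code. The Namer class, its "<prefix>.<counter>" name format, and the fixed 
start time are all assumptions made for illustration. It only models two sinks 
that seed an AtomicLong from the same millisecond and share a prefix.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NameCollisionSketch {
    // Hypothetical stand-in for one HDFS sink's file name generator.
    static final class Namer {
        private final String prefix;
        private final AtomicLong counter;

        Namer(String prefix, long startMillis) {
            this.prefix = prefix;
            // Same idea as the reported line: the counter is seeded with the clock.
            this.counter = new AtomicLong(startMillis);
        }

        // Assumed name format for this sketch: "<prefix>.<counter>".
        String nextFileName() {
            return prefix + "." + counter.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        // Assumption: both sinks are created within the same millisecond.
        long startMillis = 1_530_000_000_000L;
        Namer sinkA = new Namer("FlumeData", startMillis);
        Namer sinkB = new Namer("FlumeData", startMillis);

        String a = sinkA.nextFileName();
        String b = sinkB.nextFileName();
        System.out.println(a + " vs " + b + " -> same name: " + a.equals(b));
    }
}
```

Because each sink owns its own counter, the AtomicLong only protects against 
duplicates inside one writer; nothing coordinates the values across sinks.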

Workaround: Giving each sink a unique "hdfs.filePrefix" solves this problem. 
Escape sequences such as %[localhost] make this easy across different nodes: 
...hdfs.filePrefix=FlumeData1%[localhost]
...hdfs.filePrefix=FlumeData2%[localhost]
On a single node it is easy to make "hdfs.filePrefix" unique by hand, for 
example with a different literal prefix per sink.
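
The effect of the workaround can be sketched the same way. This is an 
illustration under assumptions, not Flume code: the generate helper, the 
"<prefix>.<counter>" format, and the host name "node1" (standing in for the 
expanded %[localhost]) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class UniquePrefixSketch {
    // Hypothetical helper: every prefix models one sink whose counter starts
    // at the same millisecond; returns the set of distinct names produced.
    static Set<String> generate(String[] prefixes, long startMillis, int filesPerSink) {
        Set<String> names = new HashSet<>();
        for (String prefix : prefixes) {
            AtomicLong counter = new AtomicLong(startMillis);
            for (int i = 0; i < filesPerSink; i++) {
                names.add(prefix + "." + counter.incrementAndGet());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        long startMillis = 1_530_000_000_000L;
        String host = "node1"; // assumed expansion of %[localhost]

        String[] shared = {"FlumeData" + host, "FlumeData" + host};
        String[] unique = {"FlumeData1" + host, "FlumeData2" + host};

        // Two sinks writing 3 files each should yield 6 distinct names.
        System.out.println("shared prefix: " + generate(shared, startMillis, 3).size() + " distinct");
        System.out.println("unique prefix: " + generate(unique, startMillis, 3).size() + " distinct");
    }
}
```

With a shared prefix only 3 of the 6 names are distinct; with per-sink 
prefixes all 6 are, even though the counters still start at the same value.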



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
