Hi All,

I am using a fanout sink to distribute data to two different sinks:
1. a local file system, rolled every minute
2. HDFS, rolled every hour
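To make the setup concrete, here is a minimal Java sketch of a fanout in which each child sink keeps its own roll interval. The `Sink`, `RollingSink` and `FanoutSink` names are hypothetical and are not Flume's classes. "Rolling" is modelled as closing an in-memory batch once the interval has elapsed, with time passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these classes are not Flume's API.
public class FanoutSketch {
    interface Sink {
        void append(String event, long nowMillis);
    }

    // Collects events into a "file" (an in-memory list) and rolls to a new one
    // once rollMillis have passed since the current file was opened.
    static class RollingSink implements Sink {
        final long rollMillis;
        final List<List<String>> rolledFiles = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long openedAt = -1;

        RollingSink(long rollMillis) {
            this.rollMillis = rollMillis;
        }

        @Override
        public void append(String event, long nowMillis) {
            if (openedAt < 0) {
                openedAt = nowMillis;
            } else if (nowMillis - openedAt >= rollMillis) {
                rolledFiles.add(current); // close the current file
                current = new ArrayList<>();
                openedAt = nowMillis;
            }
            current.add(event);
        }
    }

    // Hands every event to each child; the children roll independently.
    static class FanoutSink implements Sink {
        final List<Sink> children;

        FanoutSink(List<Sink> children) {
            this.children = children;
        }

        @Override
        public void append(String event, long nowMillis) {
            for (Sink child : children) {
                child.append(event, nowMillis);
            }
        }
    }

    public static void main(String[] args) {
        RollingSink local = new RollingSink(60_000L);   // roll every minute
        RollingSink hdfs = new RollingSink(3_600_000L); // roll every hour
        Sink fanout = new FanoutSink(List.of(local, hdfs));

        // One event every 30 seconds for 5 minutes of simulated time.
        for (long t = 0; t < 300_000L; t += 30_000L) {
            fanout.append("event@" + t, t);
        }
        System.out.println(local.rolledFiles.size() + " " + hdfs.rolledFiles.size());
    }
}
```

With one event every 30 seconds for five minutes, the per-minute sink has closed four files while the hourly sink has closed none, so `main` prints `4 0`. The point of the design is that the roll interval belongs to each child, not to the fanout.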

I am also using the regexAll decorator to capture three attributes:
provider, datatype and apitype. I arrange the data in specific directories in
HDFS based on these attributes. Having these attributes is critical to
me, because there are Hive tables set up to read from these directories.
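The extraction-and-routing step can be sketched in plain Java with `java.util.regex`. The line format, the regular expression and the `/data/<provider>/<datatype>/<apitype>` layout below are assumptions for illustration only, not my actual configuration and not how regexAll is implemented internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the line format, regex and directory layout are assumptions.
public class RegexRouteSketch {
    // Assumed body format: "... provider=<p> datatype=<d> apitype=<a> ..."
    static final Pattern FIELDS =
            Pattern.compile("provider=(\\S+) datatype=(\\S+) apitype=(\\S+)");
    static final String[] NAMES = {"provider", "datatype", "apitype"};

    // Maps capture group i+1 to NAMES[i]; returns an empty map if nothing matches.
    static Map<String, String> extract(String body) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = FIELDS.matcher(body);
        if (m.find()) {
            for (int i = 0; i < NAMES.length; i++) {
                attrs.put(NAMES[i], m.group(i + 1));
            }
        }
        return attrs;
    }

    // Builds a directory such as /data/<provider>/<datatype>/<apitype>.
    static String hdfsDir(Map<String, String> attrs) {
        if (attrs.size() < NAMES.length) {
            return "/data/unmatched"; // keeps unparseable events out of the Hive directories
        }
        return "/data/" + attrs.get("provider") + "/" + attrs.get("datatype")
                + "/" + attrs.get("apitype");
    }

    public static void main(String[] args) {
        String body = "ts=1333671869 provider=acme datatype=clicks apitype=rest";
        System.out.println(hdfsDir(extract(body))); // /data/acme/clicks/rest
    }
}
```

Sending events without all three attributes to a separate directory is one way to keep the Hive-backed directories clean; it is a design choice for the sketch, not something Flume does by default.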

Two or three times a day, the Flume collector goes into an error state
with the following message:

2012-04-06 00:24:29,903 [logicalNode collector174-4142] INFO connector.DirectDriver: Connector logicalNode collector174-4142 exited with error: Event already had an event with attribute provider
java.lang.IllegalArgumentException: Event already had an event with attribute provider
    at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62)
    at com.cloudera.flume.core.Attributes.setString(Attributes.java:112)
    at com.cloudera.flume.core.extractors.RegexAllExtractor.append(RegexAllExtractor.java:95)
    at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:150)
2012-04-06 00:24:29,905 [logicalNode collector174-4142] INFO collector.CollectorSource: closed
2012-04-06 00:24:30,918 [logicalNode collector174-4142] INFO thrift.ThriftEventSource: Closed server on port 35853...
2012-04-06 00:24:30,919 [logicalNode collector174-4142] INFO thrift.ThriftEventSource: Queue still has 720 elements ...
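The stack trace shows the exception being thrown from `EventBaseImpl.set`, reached via `Attributes.setString` from `RegexAllExtractor.append`, which suggests an event attribute can only be set once. The Java sketch below models that behaviour with a hypothetical `Event` class (not Flume's) and contrasts an unconditional extraction with a guarded one that leaves an existing attribute alone. It does not explain where the earlier `provider` value came from; the log does not show that.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: mirrors the write-once behaviour implied by the stack trace;
// it is not Flume's EventBaseImpl.
public class WriteOnceSketch {
    static class Event {
        private final Map<String, byte[]> attrs = new HashMap<>();

        // Refuses to overwrite, like the failing EventBaseImpl.set call.
        void set(String attr, byte[] value) {
            if (attrs.containsKey(attr)) {
                throw new IllegalArgumentException(
                        "Event already had an event with attribute " + attr);
            }
            attrs.put(attr, value);
        }

        String getString(String attr) {
            byte[] v = attrs.get(attr);
            return v == null ? null : new String(v, StandardCharsets.UTF_8);
        }
    }

    // Unconditional extraction: fails if the event already carries the attribute.
    static void extract(Event e, String provider) {
        e.set("provider", provider.getBytes(StandardCharsets.UTF_8));
    }

    // Guarded extraction: keeps the existing value instead of throwing.
    static void extractIfAbsent(Event e, String provider) {
        if (e.getString("provider") == null) {
            extract(e, provider);
        }
    }

    public static void main(String[] args) {
        Event e = new Event();
        extract(e, "acme");
        try {
            extract(e, "acme"); // second extraction on the same event
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
        extractIfAbsent(e, "acme"); // no exception: attribute left as-is
        System.out.println(e.getString("provider"));
    }
}
```

Running it prints the same message as the log, followed by `acme`. Whether a guard like `extractIfAbsent` is appropriate depends on whether the earlier value should win, which is a judgment call rather than a fix for the underlying cause.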

Googling for this error, I came across the following JIRA issue:

https://issues.cloudera.org/browse/FLUME-265

However, the new syntax assumes that both sinks are rolled at the
same interval, whereas having two different roll intervals is central to my
architecture.

I am using flume-0.9.4-cdh3u3.


Regards,
Abhishek
