Hi All,
I am using a fanout sink to distribute data to two different destinations:
1. a local file system, rolled every minute
2. HDFS, rolled every hour
I am also using the regexAll decorator to capture three attributes:
provider, datatype and apitype. I arrange the data into specific directories
in HDFS based on these attributes. Having these attributes is critical to
me, as there are Hive tables set up to read from these directories.
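To make the layout concrete, here is a rough Python sketch of the idea: pull the three attributes out of an event body and derive the target directory from them. This is only an illustration, not my Flume configuration. The key=value pattern, the base path and the function names are all hypothetical.

```python
import re

# Hypothetical pattern: assumes the event body carries key=value pairs.
# The real regexAll expressions are not shown in this post.
PATTERN = re.compile(
    r"provider=(?P<provider>\w+)\s+"
    r"datatype=(?P<datatype>\w+)\s+"
    r"apitype=(?P<apitype>\w+)"
)


def extract_attributes(body):
    """Return the three attributes as a dict, or None if the body does not match."""
    match = PATTERN.search(body)
    return match.groupdict() if match else None


def hdfs_directory(base, attrs):
    """Build the directory a Hive table would point at (layout is an assumption)."""
    return "{}/{}/{}/{}".format(
        base.rstrip("/"), attrs["provider"], attrs["datatype"], attrs["apitype"]
    )


if __name__ == "__main__":
    attrs = extract_attributes("provider=acme datatype=clicks apitype=rest payload")
    print(hdfs_directory("/data", attrs))
```

If any one of the three attributes is missing, no directory can be derived, which is why losing them breaks the Hive side.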
Two or three times a day, the Flume collector goes into an error state
with the following message:
2012-04-06 00:24:29,903 [logicalNode collector174-4142] INFO connector.DirectDriver: Connector logicalNode collector174-4142 exited with error: Event already had an event with attribute provider
java.lang.IllegalArgumentException: Event already had an event with attribute provider
at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62)
at com.cloudera.flume.core.Attributes.setString(Attributes.java:112)
at com.cloudera.flume.core.extractors.RegexAllExtractor.append(RegexAllExtractor.java:95)
at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:150)
2012-04-06 00:24:29,905 [logicalNode collector174-4142] INFO collector.CollectorSource: closed
2012-04-06 00:24:30,918 [logicalNode collector174-4142] INFO thrift.ThriftEventSource: Closed server on port 35853...
2012-04-06 00:24:30,919 [logicalNode collector174-4142] INFO thrift.ThriftEventSource: Queue still has 720 elements ...
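As far as I can tell from the trace, RegexAllExtractor.append calls Attributes.setString, which ends up in EventBaseImpl.set, and that call refuses to set an attribute the event already carries. Here is a small Python sketch of that behaviour as I understand it. It is a toy model, not Flume's code: the Event class, extract_provider and fanout are hypothetical names, and the idea that both fanout branches receive the same event object is an assumption on my part.

```python
class Event:
    """Toy stand-in for an event whose attributes are write-once."""

    def __init__(self, body):
        self.body = body
        self.attrs = {}

    def set(self, name, value):
        # Mirrors the message in the log: a second write to the same key fails.
        if name in self.attrs:
            raise ValueError("Event already had an event with attribute " + name)
        self.attrs[name] = value


def extract_provider(event):
    # Hypothetical extractor: treats the first token of the body as the provider.
    event.set("provider", event.body.split()[0])


def fanout(event, branches):
    # Assumption: every branch is handed the same event object, not a copy.
    for branch in branches:
        branch(event)


if __name__ == "__main__":
    event = Event("acme some payload")
    try:
        # The same extractor runs once per branch, so the second run fails.
        fanout(event, [extract_provider, extract_provider])
    except ValueError as err:
        print(err)
```

If this model is right, any event that reaches the extractor with provider already set would stop the driver in the same way.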
Googling for this error, I came across the following JIRA:
https://issues.cloudera.org/browse/FLUME-265
However, the new syntax described there assumes that the two files will be
rolled at the same interval, while having two different roll times is
central to my architecture.
I am using flume-0.9.4-cdh3u3.
Regards,
Abhishek