[
https://issues.apache.org/jira/browse/FLUME-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827896#comment-13827896
]
Viktor Trako commented on FLUME-2241:
-------------------------------------
I'd like to add that I have ruled out the sink I'm using: the limit seems to
be on the data being read by the Spooling Directory Source.
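
One way to check this independently of any particular sink is to feed the
source a file of uniformly sized JSON lines and then look for the first line
in the output that no longer parses. The following is a minimal Python sketch,
not part of my actual setup: the field values, the helper names
`write_spool_file` and `first_bad_line`, and the payload size are all
assumptions chosen for illustration.

```python
import json
import uuid


def make_record(index, payload_bytes):
    """Build one single-line JSON record with a payload of payload_bytes characters."""
    record = {
        "service": "orion-test",               # hypothetical value
        "uuid": str(uuid.uuid4()),
        "timestamp": str(1384900000 + index),  # hypothetical value
        "seq": index,                          # lets you see which request was lost
        "payload": "x" * payload_bytes,        # filler to reach the desired size
    }
    return json.dumps(record)


def write_spool_file(path, count, payload_bytes):
    """Write `count` JSON records, one per line, to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        for i in range(count):
            fh.write(make_record(i, payload_bytes) + "\n")


def first_bad_line(path):
    """Return the 1-based number of the first line that is not valid JSON, or None."""
    with open(path, encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return number
    return None
```

For instance, `write_spool_file("test.log", 3000, 19000)` would produce a file
to move into the spooling directory, and `first_bad_line` could then be run on
whatever file the file-roll sink writes, assuming that sink emits one event
body per line.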
> Spooling Directory Source doesn't handle files with large-ish event data
> ------------------------------------------------------------------------
>
> Key: FLUME-2241
> URL: https://issues.apache.org/jira/browse/FLUME-2241
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.4.0
> Environment: Debian 6.0.5
> Reporter: Viktor Trako
>
> I have a Flume agent set up with a spooling directory source, sinking data to
> Cassandra.
> I'm collecting web data, writing one line to the log file for each request.
> Once the log file has been rotated, it is dropped into the spooling directory,
> ready for Flume to start processing it. All data is valid JSON, as it is
> validated before being written to the log file.
> Sending a mixture of different-sized requests from 9-15k seems fine.
> I generated a log file of over 400 MB and it was all delivered to the sink
> correctly.
> I'm currently logging a 19k request and this is when things start to break.
> It only gets as far as the 1800th request in the file and the next one is
> truncated.
> I changed the sink to a file-roll sink and it only gets as far as 29 MB.
> I have profiled it and it's not running out of memory. I want to know if
> there are any limitations on the spooling directory source.
> Has anyone tried dropping in a file with similarly large requests and
> experienced a similar issue?
> Any pointers would be greatly appreciated. My Flume config is as follows:
> {code:title=flume_conf|borderStyle=solid}
> orion.sources = spoolDir
> orion.channels = fileChannel
> orion.sinks = cassandra
> orion.channels.fileChannel.type = file
> orion.channels.fileChannel.capacity = 1000000
> orion.channels.fileChannel.transactionCapacity = 100
> orion.channels.fileChannel.keep-alive = 60
> orion.channels.fileChannel.write-timeout = 60
> orion.sinks.cassandra.type = com.btoddb.flume.sinks.cassandra.CassandraSink
> orion.sinks.cassandra.hosts = <cluster node ip>
> orion.sinks.cassandra.cluster_name = fake_cluster
> orion.sinks.cassandra.port = 9160
> orion.sinks.cassandra.keyspace-name = Keysp
> orion.sinks.cassandra.records-colfam = <table>
> orion.sources.spoolDir.type = spooldir
> orion.sources.spoolDir.spoolDir = /var/log/orion/flumeSpooling
> orion.sources.spoolDir.deserializer = LINE
> orion.sources.spoolDir.inputCharset = UTF-8
> orion.sources.spoolDir.deserializer.maxLineLength = 20000000
> orion.sources.spoolDir.deletePolicy = never
> orion.sources.spoolDir.batchSize = 100
> orion.sources.spoolDir.interceptors = addSrc addHost addTimestamp addUUID
> orion.sources.spoolDir.interceptors.addSrc.type = regex_extractor
> orion.sources.spoolDir.interceptors.addSrc.regex = \"service\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addSrc.serializers = s1
> orion.sources.spoolDir.interceptors.addSrc.serializers.s1.name = src
> orion.sources.spoolDir.interceptors.addUUID.type = regex_extractor
> orion.sources.spoolDir.interceptors.addUUID.regex = \"uuid\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addUUID.serializers = s1
> orion.sources.spoolDir.interceptors.addUUID.serializers.s1.name = key
> orion.sources.spoolDir.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
> orion.sources.spoolDir.interceptors.addHost.preserveExisting = false
> orion.sources.spoolDir.interceptors.addHost.useIP = true
> orion.sources.spoolDir.interceptors.addHost.hostHeader = host
> orion.sources.spoolDir.interceptors.addTimestamp.type = regex_extractor
> orion.sources.spoolDir.interceptors.addTimestamp.regex = \"timestamp\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addTimestamp.serializers = s1
> orion.sources.spoolDir.interceptors.addTimestamp.serializers.s1.name = timestamp
> orion.sources.spoolDir.channels = fileChannel
> orion.sinks.cassandra.channel = fileChannel
> {code}
> Is this potentially a bug? If nobody has tried this yet, could someone try to
> recreate it? I would expect the same error to occur.
> Don't hesitate to contact me for further info.
> Viktor