[ 
https://issues.apache.org/jira/browse/FLUME-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829109#comment-13829109
 ] 

Viktor Trako edited comment on FLUME-2241 at 11/21/13 5:22 PM:
---------------------------------------------------------------

I can see the same problem occurring with all characters that are encoded as 2 bytes in UTF-8:

||Unicode||Character||UTF-8 (hex)||
|U+0080|         |      c2 80|
|U+0081|         |      c2 81|
|U+0082|         |      c2 82|
|U+0083|         |      c2 83|
|U+0084|         |      c2 84|
|U+0085|         |      c2 85|
|U+0086|         |      c2 86|
|U+0087|         |      c2 87|
|U+0088|         |      c2 88|
|U+0089|         |      c2 89|
|U+008A|         |      c2 8a|
|U+008B|         |      c2 8b|
|U+008C|         |      c2 8c|
|U+008D|         |      c2 8d|
|U+008E|         |      c2 8e|
|U+008F|         |      c2 8f|
|U+0090|         |      c2 90|
|U+0091|         |      c2 91|
|U+0092|         |      c2 92|
|U+0093|         |      c2 93|
|U+0094|         |      c2 94|
|U+0095|         |      c2 95|
|U+0096|         |      c2 96|
|U+0097|         |      c2 97|
|U+0098|         |      c2 98|
|U+0099|         |      c2 99|
|U+009A|         |      c2 9a|
|U+009B|         |      c2 9b|
|U+009C|         |      c2 9c|
|U+009D|         |      c2 9d|
|U+009E|         |      c2 9e|
|U+009F|         |      c2 9f|
|U+00A0|         |      c2 a0|
|U+00A1|        ¡|      c2 a1|
|U+00A2|        ¢|      c2 a2|
|U+00A3|        £|      c2 a3|
|U+00A4|        ¤|      c2 a4|
|U+00A5|        ¥|      c2 a5|
|U+00A6|        ¦|      c2 a6|
|U+00A7|        §|      c2 a7|
|U+00A8|        ¨|      c2 a8|
|U+00A9|        ©|      c2 a9|
|U+00AA|        ª|      c2 aa|
|U+00AB|        «|      c2 ab|
|U+00AC|        ¬|      c2 ac|
|U+00AD|        |       c2 ad|
|U+00AE|        ®|      c2 ae|
|U+00AF|        ¯|      c2 af|
|U+00B0|        °|      c2 b0|
|U+00B1|        ±|      c2 b1|
|U+00B2|        ²|      c2 b2|
|U+00B3|        ³|      c2 b3|
|U+00B4|        ´|      c2 b4|
|U+00B5|        µ|      c2 b5|
|U+00B6|        ¶|      c2 b6|
|U+00B7|        ·|      c2 b7|
|U+00B8|        ¸|      c2 b8|
|U+00B9|        ¹|      c2 b9|
|U+00BA|        º|      c2 ba|
|U+00BB|        »|      c2 bb|
|U+00BC|        ¼|      c2 bc|
|U+00BD|        ½|      c2 bd|
|U+00BE|        ¾|      c2 be|
|U+00BF|        ¿|      c2 bf|
|U+00C0|        À|      c3 80|
|U+00C1|        Á|      c3 81|
|U+00C2|        Â|      c3 82|
|U+00C3|        Ã|      c3 83|
|U+00C4|        Ä|      c3 84|
|U+00C5|        Å|      c3 85|
|U+00C6|        Æ|      c3 86|
|U+00C7|        Ç|      c3 87|
|U+00C8|        È|      c3 88|
|U+00C9|        É|      c3 89|
|U+00CA|        Ê|      c3 8a|
|U+00CB|        Ë|      c3 8b|
|U+00CC|        Ì|      c3 8c|
|U+00CD|        Í|      c3 8d|
|U+00CE|        Î|      c3 8e|
|U+00CF|        Ï|      c3 8f|
|U+00D0|        Ð|      c3 90|
|U+00D1|        Ñ|      c3 91|
|U+00D2|        Ò|      c3 92|
|U+00D3|        Ó|      c3 93|
|U+00D4|        Ô|      c3 94|
|U+00D5|        Õ|      c3 95|
|U+00D6|        Ö|      c3 96|
|U+00D7|        ×|      c3 97|
|U+00D8|        Ø|      c3 98|
|U+00D9|        Ù|      c3 99|
|U+00DA|        Ú|      c3 9a|
|U+00DB|        Û|      c3 9b|
|U+00DC|        Ü|      c3 9c|
|U+00DD|        Ý|      c3 9d|
|U+00DE|        Þ|      c3 9e|
|U+00DF|        ß|      c3 9f|
|U+00E0|        à|      c3 a0|
|U+00E1|        á|      c3 a1|
|U+00E2|        â|      c3 a2|
|U+00E3|        ã|      c3 a3|
|U+00E4|        ä|      c3 a4|
|U+00E5|        å|      c3 a5|
|U+00E6|        æ|      c3 a6|
|U+00E7|        ç|      c3 a7|
|U+00E8|        è|      c3 a8|
|U+00E9|        é|      c3 a9|
|U+00EA|        ê|      c3 aa|
|U+00EB|        ë|      c3 ab|
|U+00EC|        ì|      c3 ac|
|U+00ED|        í|      c3 ad|
|U+00EE|        î|      c3 ae|
|U+00EF|        ï|      c3 af|
|U+00F0|        ð|      c3 b0|
|U+00F1|        ñ|      c3 b1|
|U+00F2|        ò|      c3 b2|
|U+00F3|        ó|      c3 b3|
|U+00F4|        ô|      c3 b4|
|U+00F5|        õ|      c3 b5|
|U+00F6|        ö|      c3 b6|
|U+00F7|        ÷|      c3 b7|
|U+00F8|        ø|      c3 b8|
|U+00F9|        ù|      c3 b9|
|U+00FA|        ú|      c3 ba|
|U+00FB|        û|      c3 bb|
|U+00FC|        ü|      c3 bc|
|U+00FD|        ý|      c3 bd|
|U+00FE|        þ|      c3 be|
|U+00FF|        ÿ|      c3 bf|

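The table follows from UTF-8's two-byte form: code points U+0080 to U+07FF are encoded as 110xxxxx 10xxxxxx, so U+0080 to U+00BF become c2 80 to c2 bf and U+00C0 to U+00FF become c3 80 to c3 bf. The "same problem" mentioned above is not quoted in this message. Assuming it refers to a multi-byte sequence being split across a fixed-size read buffer, the hypothetical Java sketch below spot-checks rows of the table. It also shows that decoding each buffer on its own corrupts a split character, while a single streaming CharsetDecoder does not. Utf8TwoByteDemo and its methods are illustrative names, not Flume code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8TwoByteDemo {

    // Returns the UTF-8 bytes of one code point as lowercase hex, e.g. "c2 80".
    public static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Naive: decode every fixed-size chunk independently. A 2-byte sequence
    // split across two chunks turns into U+FFFD replacement characters.
    public static String decodeChunksNaively(byte[] data, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i += chunkSize) {
            int len = Math.min(chunkSize, data.length - i);
            out.append(new String(data, i, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Streaming: one CharsetDecoder for the whole input. When a chunk ends in
    // the middle of a sequence, decode() leaves those bytes unconsumed and
    // compact() carries them over so the next chunk can complete them.
    public static String decodeChunksStreaming(byte[] data, int chunkSize) {
        if (data.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4); // room for leftover bytes
        CharBuffer out = CharBuffer.allocate(data.length);  // UTF-8 yields at most 1 char per byte
        for (int i = 0; i < data.length; i += chunkSize) {
            int len = Math.min(chunkSize, data.length - i);
            in.put(data, i, len);
            in.flip();
            boolean endOfInput = i + len >= data.length;
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                throw new IllegalArgumentException("invalid UTF-8: " + result);
            }
            in.compact();
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // Spot-check rows from the table: each of these is 2 bytes long.
        for (int cp : new int[] {0x80, 0xA9, 0xC0, 0xFF}) {
            System.out.printf("U+%04X -> %s%n", cp, utf8Hex(cp));
        }
        // The bytes are 61 c3 a9 62; with 2-byte chunks, c3 and a9 land in different chunks.
        byte[] data = "a\u00e9b".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeChunksNaively(data, 2).equals("a\u00e9b"));   // false
        System.out.println(decodeChunksStreaming(data, 2).equals("a\u00e9b")); // true
    }
}
```

The chunk size of 2 is only there to force the split; with a real buffer of a few kilobytes the same thing would happen whenever a 2-byte character straddles the boundary.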


> Spooling Directory Source doesn't handle files with large-ish event data
> ------------------------------------------------------------------------
>
>                 Key: FLUME-2241
>                 URL: https://issues.apache.org/jira/browse/FLUME-2241
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>         Environment: Debian 6.0.5
>            Reporter: Viktor Trako
>
> I have a Flume agent set up with a spooling directory source sinking data to
> Cassandra.
> I'm collecting web data, writing one line to the log file for each request.
> Once the log file has been rotated, it is dropped into the spooling directory,
> ready for Flume to start processing it. All data is valid JSON, as it's
> validated before it is written to the log file.
> Sending a mixture of requests of different sizes, from 9k to 15k, seems fine:
> I generated a log file of over 400MB and it all reached the sink correctly.
> I'm currently logging a 19k request, and this is when things start to break.
> It only gets as far as the 1800th request in the file, and the next one is
> truncated.
> When I changed the sink to a file-roll sink, it only got as far as 29MB.
> I have profiled it and it's not running out of memory. I want to know if
> there are any limitations on the spooling directory source.
> Has anyone tried dropping a file with similarly large requests and
> experienced a similar issue?
> Any pointers would be greatly appreciated. My Flume config is as follows:
> {code:title=flume_conf|borderStyle=solid}
> orion.sources = spoolDir
> orion.channels = fileChannel
> orion.sinks = cassandra
> orion.channels.fileChannel.type = file
> orion.channels.fileChannel.capacity = 1000000
> orion.channels.fileChannel.transactionCapacity = 100
> orion.channels.fileChannel.keep-alive = 60
> orion.channels.fileChannel.write-timeout = 60
> orion.sinks.cassandra.type = com.btoddb.flume.sinks.cassandra.CassandraSink
> orion.sinks.cassandra.hosts = <cluster node ip>
> orion.sinks.cassandra.cluster_name = fake_cluster
> orion.sinks.cassandra.port = 9160
> orion.sinks.cassandra.keyspace-name = Keysp
> orion.sinks.cassandra.records-colfam = <table>
> orion.sources.spoolDir.type = spooldir
> orion.sources.spoolDir.spoolDir = /var/log/orion/flumeSpooling
> orion.sources.spoolDir.deserializer = LINE
> orion.sources.spoolDir.inputCharset = UTF-8
> orion.sources.spoolDir.deserializer.maxLineLength = 20000000
> orion.sources.spoolDir.deletePolicy = never
> orion.sources.spoolDir.batchSize = 100
> orion.sources.spoolDir.interceptors = addSrc addHost addTimestamp addUUID
> orion.sources.spoolDir.interceptors.addSrc.type = regex_extractor
> orion.sources.spoolDir.interceptors.addSrc.regex = \"service\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addSrc.serializers = s1
> orion.sources.spoolDir.interceptors.addSrc.serializers.s1.name = src
> orion.sources.spoolDir.interceptors.addUUID.type = regex_extractor
> orion.sources.spoolDir.interceptors.addUUID.regex = \"uuid\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addUUID.serializers = s1
> orion.sources.spoolDir.interceptors.addUUID.serializers.s1.name = key
> orion.sources.spoolDir.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
> orion.sources.spoolDir.interceptors.addHost.preserveExisting = false
> orion.sources.spoolDir.interceptors.addHost.useIP = true
> orion.sources.spoolDir.interceptors.addHost.hostHeader = host
> orion.sources.spoolDir.interceptors.addTimestamp.type = regex_extractor
> orion.sources.spoolDir.interceptors.addTimestamp.regex = \"timestamp\"\:\"([^"]*)
> orion.sources.spoolDir.interceptors.addTimestamp.serializers = s1
> orion.sources.spoolDir.interceptors.addTimestamp.serializers.s1.name = timestamp
> orion.sources.spoolDir.channels = fileChannel
> orion.sinks.cassandra.channel = fileChannel
> {code}
> Is this potentially a bug? If nobody has tried this, could someone try to
> reproduce it? I'd expect the same error to occur.
> Don't hesitate to contact me for further info.
> Viktor
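
Since the failure is reported at a specific position (the request after the 1800th), one way to narrow it down is to inspect the spooled file outside Flume. The hypothetical helper below (SpoolFileLineCheck is not part of Flume) reads the file as raw bytes and splits it on newlines. It reports every line that is not valid UTF-8 on its own, plus the longest line in bytes. This is a sketch under those assumptions, not a diagnosis.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SpoolFileLineCheck {

    // One entry per line: 1-based line number, length in bytes, strict UTF-8 validity.
    public static final class LineInfo {
        public final long lineNumber;
        public final int byteLength;
        public final boolean validUtf8;

        LineInfo(long lineNumber, int byteLength, boolean validUtf8) {
            this.lineNumber = lineNumber;
            this.byteLength = byteLength;
            this.validUtf8 = validUtf8;
        }
    }

    // True if the bytes decode as UTF-8 with no malformed or truncated sequences.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Splits the stream on '\n' and keeps each line as raw bytes, so nothing is
    // decoded (or silently replaced) before it is checked. A trailing '\r' from
    // CRLF files stays part of the line.
    public static List<LineInfo> scan(InputStream raw) {
        List<LineInfo> result = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        long lineNumber = 0;
        try {
            InputStream in = new BufferedInputStream(raw);
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    byte[] bytes = line.toByteArray();
                    result.add(new LineInfo(++lineNumber, bytes.length, isValidUtf8(bytes)));
                    line.reset();
                } else {
                    line.write(b);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line.size() > 0) { // last line without a trailing newline
            byte[] bytes = line.toByteArray();
            result.add(new LineInfo(++lineNumber, bytes.length, isValidUtf8(bytes)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int longest = 0;
            for (LineInfo info : scan(in)) {
                longest = Math.max(longest, info.byteLength);
                if (!info.validUtf8) {
                    System.out.printf("line %d: %d bytes, not valid UTF-8%n",
                            info.lineNumber, info.byteLength);
                }
            }
            System.out.println("longest line: " + longest + " bytes");
        }
    }
}
```

If every line is valid and the longest is well under the configured maxLineLength, that would suggest the truncation happens while the source reads the file rather than in the file itself.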

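The regex_extractor interceptors in the config depend on how its backslashes are read. Assuming the agent file is parsed with standard java.util.Properties rules, \" becomes " and \: becomes :. The addSrc pattern that actually gets compiled is then "service":"([^"]*), and group 1 captures the service value up to the next double quote. The Java sketch below checks this against an invented sample event. It does not run Flume's interceptor, and RegexExtractorCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractorCheck {

    // Parses a single properties line with java.util.Properties and returns the
    // value for key. Backslash escapes such as \" and \: are resolved here, so
    // the result is the regex string that would be compiled.
    public static String loadValue(String propertiesLine, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesLine));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props.getProperty(key);
    }

    // Returns capture group 1 of the first match in the event body, or null.
    public static String extract(String regex, String eventBody) {
        Matcher m = Pattern.compile(regex).matcher(eventBody);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String key = "orion.sources.spoolDir.interceptors.addSrc.regex";
        // Same text as the addSrc.regex line in the config.
        String line = key + " = \\\"service\\\"\\:\\\"([^\"]*)";
        String regex = loadValue(line, key);
        System.out.println(regex); // "service":"([^"]*)
        // Hypothetical event body; real events are whole JSON log lines.
        String body = "{\"service\":\"web\",\"uuid\":\"1234\"}";
        System.out.println(extract(regex, body)); // web
    }
}
```

The same check applies to the addUUID and addTimestamp patterns. Note that [^"]* stops at the first double quote, so a value containing an escaped quote would be cut short.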

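The workflow described in the issue (rotate the log, then drop it into the spooling directory) only fits the spooling directory source if the file is complete when it appears and its name is never reused. As I understand the Flume user guide, the source logs an error and stops processing otherwise. Below is a minimal Java sketch of one way to do the drop, assuming the rotated file and the spooling directory are on the same file system. SpoolDrop and the timestamp-prefix naming scheme are illustrative, not part of Flume.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SpoolDrop {

    // Builds a target name that stays unique even if the rotated file name
    // (for example access.log.1) is reused by the next rotation.
    public static String uniqueName(Path rotatedLog, long millis) {
        return millis + "-" + rotatedLog.getFileName();
    }

    // Moves a finished log file into the spooling directory in a single step,
    // so the source never sees it half-written. ATOMIC_MOVE only works within
    // one file system; otherwise Files.move throws
    // AtomicMoveNotSupportedException instead of silently copying.
    public static Path dropIntoSpool(Path rotatedLog, Path spoolDir) throws IOException {
        Path target = spoolDir.resolve(uniqueName(rotatedLog, System.currentTimeMillis()));
        return Files.move(rotatedLog, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: rotated log file, args[1]: spooling directory
        Path dropped = dropIntoSpool(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("dropped " + dropped);
    }
}
```

For example, after each rotation: java SpoolDrop /var/log/orion/access.log.1 /var/log/orion/flumeSpooling (the log file name is hypothetical; the directory is the spoolDir from the config).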

--
This message was sent by Atlassian JIRA
(v6.1#6144)
