You may want to compare the input data with what was delivered, to see whether
lines are missing or truncated.
If lines are being truncated, set deserializer.maxLineLength on the source.
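
One way to do that comparison is to collect the same per-file statistics on the source files and on the decompressed HDFS output, then look at where they differ. The sketch below is a hypothetical helper, not part of Flume: the class name LineStats and its fields are made up for illustration. It decodes the file as ISO-8859-1 so that one byte counts as one character, which is an assumption to adjust for multi-byte data. The default limit of 2048 is the value the Flume user guide lists for the LINE deserializer's maxLineLength; please verify it against your version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Flume): summarizes the lines of one text file. */
final class LineStats {
    long lines;      // number of lines read
    long chars;      // total characters, excluding line terminators
    long overLimit;  // lines longer than the given limit
    int longest;     // length of the longest line seen

    /** Scans a file and counts lines, characters, and lines longer than maxLineLength. */
    static LineStats scan(Path file, int maxLineLength) throws IOException {
        LineStats s = new LineStats();
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails
        // and the character counts equal byte counts (assumption: good enough here).
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                s.lines++;
                s.chars += line.length();
                if (line.length() > maxLineLength) {
                    s.overLimit++;
                }
                s.longest = Math.max(s.longest, line.length());
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: LineStats <file> [maxLineLength]");
            return;
        }
        // 2048 is assumed to match the LINE deserializer's default maxLineLength.
        int limit = args.length > 1 ? Integer.parseInt(args[1]) : 2048;
        LineStats s = scan(Paths.get(args[0]), limit);
        System.out.printf("lines=%d chars=%d longest=%d overLimit=%d%n",
                s.lines, s.chars, s.longest, s.overLimit);
    }
}
```

If overLimit is non-zero on the input side, line length is worth a closer look. If the line counts differ between input and output, whole lines or events are more likely being dropped somewhere.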

For the spooling directory source on Windows, you may also need the patch from FLUME-2508.

-roshan

On Wed, Dec 3, 2014 at 6:47 PM, XuGary <[email protected]> wrote:

> Hi Flume developers,
>
> We are trying to use Flume in a project. In our current test scenario there
> are 3000 files (each about 4-5 MB) on the Windows side, and we use Flume to
> send them to a remote Hadoop environment. The source files total about 14 GB,
> and we get 80 files in Hadoop in bz2 format. After decompressing them we find
> about 13 GB in total on the Hadoop side (so about 1 GB is missing). There is
> no error in the Flume log.
>
> We changed the code a bit to use FileDeserializer.java, which is very similar
> to LineDeserializer.java, but we commented out /*if (c == '\n')  break;*/ and
> removed the max limit in this code. We expect one file to become one event.
>
> Do you have any ideas about it? Thanks in advance.
>
> ENV: Flume 1.5.0 on 64-bit Windows, remote Apache Hadoop 2.2 on Linux
>
> CONF:
> #agent1
> agent1.sources=source1
> agent1.sinks=sink1
> agent1.channels=channel1
>
> #source1
> agent1.sources.source1.type=spooldir
> agent1.sources.source1.spoolDir=data
> agent1.sources.source1.channels=channel1
> agent1.sources.source1.fileHeader=false
> agent1.sources.source1.batchSize=3000
> agent1.sources.source1.deserializer=FILE
>
> #sink1
> agent1.sinks.sink1.type=hdfs
> agent1.sinks.sink1.hdfs.path=hdfs://c0045305.itcs.hp.com:8120/user/QA/%y-%m-%d/%H%M
> agent1.sinks.sink1.hdfs.fileType=CompressedStream
> agent1.sinks.sink1.hdfs.codeC=bzip2
> agent1.sinks.sink1.hdfs.writeFormat=TEXT
> agent1.sinks.sink1.hdfs.rollInterval=0
> agent1.sinks.sink1.hdfs.idleTimeout=120
> agent1.sinks.sink1.hdfs.rollSize=0
> agent1.sinks.sink1.hdfs.maxOpenFiles=10000
> agent1.sinks.sink1.hdfs.rollCount=0
> agent1.sinks.sink1.hdfs.batchSize=10000
> agent1.sinks.sink1.hdfs.callTimeout=60000
> agent1.sinks.sink1.hdfs.useLocalTimeStamp=true
> agent1.sinks.sink1.hdfs.minBlockReplicas=1
> agent1.sinks.sink1.channel=channel1
>
> #channel1
> agent1.channels.channel1.type=file
> agent1.channels.maxFileSize=3146435071
> agent1.channels.channel1.checkpointDir=data_tmp123
> agent1.channels.channel1.dataDirs=dataChannels
> agent1.channels.channel1.transactionCapacity=20000
>
> Regards,
> Gary Xu
