Hello,
I am continuously generating files in a local folder on my base machine. How can I use Flume to stream the generated files from the local folder to HDFS? I have written a configuration, but it is giving me some issues. Please share a sample configuration.
This is my configuration file:
agents.sources=spooldir-source
agents.sinks=hdfs-sink
agents.channels=ch1
agents.sources.spooldir-source.type=spooldir
agents.sources.spooldir-source.spoolDir=/apache-tomcat-7.0.39/logs/MultiThreadLogs
agents.sources.spooldir-source.fileSuffix=.SPOOL
agents.sources.spooldir-source.fileHeader=true
agents.sources.spooldir-source.bufferMaxLineLength=50000
agents.sinks.hdfs-sink.type=hdfs
agents.sinks.hdfs-sink.hdfs.path=hdfs://cloudx-740-677:54300/multipleFiles/
agents.sinks.hdfs-sink.hdfs.rollSize=12553700
agents.sinks.hdfs-sink.hdfs.rollCount=12553665
agents.sinks.hdfs-sink.hdfs.rollInterval=3000
agents.sinks.hdfs-sink.hdfs.fileType=DataStream
agents.sinks.hdfs-sink.hdfs.writeFormat=Text
agents.channels.ch1.type=file
agents.sources.spooldir-source.channels=ch1
agents.sinks.hdfs-sink.channel=ch1
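For comparison, here is a trimmed sketch of the same pipeline with commented, more conservative settings. The paths and the NameNode address are kept from the configuration above; the roll values are illustrative choices, not prescribed ones (in the HDFS sink, setting a roll criterion to 0 disables it):

```properties
agents.sources = spooldir-source
agents.channels = ch1
agents.sinks = hdfs-sink

# Spooling directory source: files must be complete and immutable
# once they land in spoolDir.
agents.sources.spooldir-source.type = spooldir
agents.sources.spooldir-source.spoolDir = /apache-tomcat-7.0.39/logs/MultiThreadLogs
agents.sources.spooldir-source.fileSuffix = .SPOOL
agents.sources.spooldir-source.fileHeader = true
agents.sources.spooldir-source.channels = ch1

# Durable file channel.
agents.channels.ch1.type = file

# HDFS sink: roll on size or time, not on a huge event count.
agents.sinks.hdfs-sink.type = hdfs
agents.sinks.hdfs-sink.hdfs.path = hdfs://cloudx-740-677:54300/multipleFiles/
agents.sinks.hdfs-sink.hdfs.fileType = DataStream
agents.sinks.hdfs-sink.hdfs.writeFormat = Text
agents.sinks.hdfs-sink.hdfs.rollSize = 12553700   # bytes, as above
agents.sinks.hdfs-sink.hdfs.rollCount = 0         # disable count-based rolling
agents.sinks.hdfs-sink.hdfs.rollInterval = 300    # seconds
agents.sinks.hdfs-sink.channel = ch1
```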
If I add a large file (10 MB), I get this error:
13/04/18 16:11:21 ERROR source.SpoolDirectorySource: Uncaught exception in Runnable
java.lang.IllegalStateException: File has been modified since being read: /apache-tomcat-7.0.39/logs/MultiThreadLogs/log_0.txt
        at org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:237)
        at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:185)
        at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
13/04/18 16:11:21 ERROR source.SpoolDirectorySource: Uncaught exception in Runnable
java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
        at java.io.BufferedReader.readLine(BufferedReader.java:310)
        at java.io.BufferedReader.readLine(BufferedReader.java:382)
        at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:180)
        at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
If I increase bufferMaxLineLength, I get:
java.lang.OutOfMemoryError: Java heap space
        at java.io.BufferedReader.<init>(BufferedReader.java:98)
        at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
        at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
        at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
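For context on the first error: the spooling directory source expects files to be complete and immutable once they appear in spoolDir, so writing live log files directly into it (as Tomcat does) triggers "File has been modified since being read". A common delivery pattern, sketched below with placeholder `/tmp` directories standing in for the real paths, is to finish writing in a staging directory on the same filesystem and then move the file in with `mv`, which is an atomic rename:

```shell
# Staging and spool directories must be on the same filesystem so that
# mv is an atomic rename rather than a copy-then-delete.
STAGING=/tmp/flume-staging
SPOOL=/tmp/flume-spool    # stand-in for the MultiThreadLogs spool directory
mkdir -p "$STAGING" "$SPOOL"

# Finish writing the file entirely in the staging directory...
echo "sample log line" > "$STAGING/log_0.txt"

# ...then move it into the spool directory in one atomic step, so the
# spooldir source only ever sees a complete, immutable file.
mv "$STAGING/log_0.txt" "$SPOOL/log_0.txt"
```

With this pattern the source never observes a half-written file, which avoids both the IllegalStateException and the follow-on "Stream closed" error.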
Thanks & Regards,
Venkat.D
-----Original Message-----
From: Venkatesh Sivasubramanian (JIRA) [mailto:[email protected]]
Sent: Tuesday, April 23, 2013 8:57 AM
To: [email protected]
Subject: [jira] [Comment Edited] (FLUME-1819) ExecSource don't flush the cache
if there is no input entries
[ https://issues.apache.org/jira/browse/FLUME-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638744#comment-13638744 ]
Venkatesh Sivasubramanian edited comment on FLUME-1819 at 4/23/13 3:27 AM:
---------------------------------------------------------------------------
Yes Hari, let me take a stab. Will keep you posted. thanks!
was (Author: venkyz):
Yes Hari, let me take a stab. Will keep you posted.
> ExecSource don't flush the cache if there is no input entries
> -------------------------------------------------------------
>
> Key: FLUME-1819
> URL: https://issues.apache.org/jira/browse/FLUME-1819
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.3.0
> Reporter: Fengdong Yu
> Assignee: Venkatesh Sivasubramanian
> Fix For: v1.4.0
>
> Attachments: FLUME-1819.patch, FLUME-1819.patch.1
>
>
> ExecSource has a default batchSize of 20: the exec source reads data from
> the source and puts it into a cache, and once the cache is full it pushes
> the batch to the channel.
> But if the exec source's cache is not full and there is no input for a long
> time, those entries stay in the cache, with no chance of reaching the
> channel until the cache fills.
> So the patch adds a new config option for ExecSource, batchTimeout
> (default 3 seconds); when batchTimeout is exceeded, all cached data is
> pushed to the channel even if the cache is not full.
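Assuming the option name used in the patch, an exec source taking advantage of the new timeout might be declared as follows. Treat the exact key and its millisecond unit as tied to the FLUME-1819 patch; the agent, source, and channel names here are illustrative:

```properties
agent.sources = tail-src
agent.sources.tail-src.type = exec
agent.sources.tail-src.command = tail -F /var/log/app.log
agent.sources.tail-src.batchSize = 20      # default batch size
# Added by FLUME-1819: flush a partial batch after this many milliseconds
# (3 seconds by default) so quiet periods don't strand events in the cache.
agent.sources.tail-src.batchTimeout = 3000
agent.sources.tail-src.channels = ch1
```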