[
https://issues.apache.org/jira/browse/FLUME-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Johny Rufus updated FLUME-2517:
-------------------------------
Fix Version/s: v1.5.1
> Performance issue: SimpleDateFormat constructor takes 30% of
> HDFSEventSink.process()
> ------------------------------------------------------------------------------------
>
> Key: FLUME-2517
> URL: https://issues.apache.org/jira/browse/FLUME-2517
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.5.0.1
> Environment: linux i686
> java version "1.7.0_45"
> Reporter: Pal Konyves
> Assignee: Pal Konyves
> Labels: performance
> Fix For: v1.5.1
>
> Attachments: flume_2517.patch, flume_2517.png
>
>
> I started investigating why the HDFS sink has such poor throughput in
> v1.5.0.0. It seems to be better in 1.6.0.0 (current trunk).
> The PseudoTx channel was filling up because the HDFS sink could not write as
> fast as data arrived from the source.
> Profiling with jconsole revealed that 30% of the time spent in the
> HDFSEventSink.process() method is taken by constructing SimpleDateFormat
> objects. SimpleDateFormat is notoriously heavy and time-consuming to
> construct. It is also not thread-safe.
> It is used in the HDFS sink to compute paths that contain date-time
> wildcards. I will provide a patch that caches SimpleDateFormat objects per
> thread. With this patch, the PseudoTx channel I used for testing no longer
> filled up constantly, and throughput was much better.
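The per-thread caching idea described above could look roughly like the following. This is a minimal sketch and not the contents of flume_2517.patch: the class name CachedDateFormats and its methods get() and format() are hypothetical, chosen only for illustration. It relies on the standard java.lang.ThreadLocal and java.text.SimpleDateFormat classes and stays compatible with Java 7 syntax.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper, not Flume code: caches one SimpleDateFormat per
// (thread, pattern) pair so the costly constructor runs once per pair
// instead of once per event.
final class CachedDateFormats {

    // Each thread gets its own map, so the non-thread-safe formatters
    // are never shared between threads.
    private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
        new ThreadLocal<Map<String, SimpleDateFormat>>() {
            @Override
            protected Map<String, SimpleDateFormat> initialValue() {
                return new HashMap<String, SimpleDateFormat>();
            }
        };

    private CachedDateFormats() {
    }

    // Returns the calling thread's formatter for the pattern, creating it
    // on first use.
    static SimpleDateFormat get(String pattern) {
        Map<String, SimpleDateFormat> perThread = CACHE.get();
        SimpleDateFormat format = perThread.get(pattern);
        if (format == null) {
            format = new SimpleDateFormat(pattern);
            perThread.put(pattern, format);
        }
        return format;
    }

    // Formats a timestamp with the cached formatter. The time zone is set
    // on every call because the cached instance is mutable and reused.
    static String format(String pattern, long timestampMillis, TimeZone zone) {
        SimpleDateFormat format = get(pattern);
        format.setTimeZone(zone);
        return format.format(new Date(timestampMillis));
    }
}
```

A caller would replace each `new SimpleDateFormat(pattern)` on the hot path with `CachedDateFormats.get(pattern)`. The trade-off is one small map per sink thread, which should be acceptable when the set of distinct path patterns is small.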