[
https://issues.apache.org/jira/browse/FLUME-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182717#comment-14182717
]
Pal Konyves commented on FLUME-2517:
------------------------------------
I haven't looked in much detail at the code that uses the BucketPath class.
Because one HDFSSink can have multiple writer threads, I assumed that the
problematic method could be called from different threads. If that is not the
case, it might not make sense to make it thread-safe in this situation.
On the other hand, if someone in the future wants to call
BucketPath#replaceShorthand from multiple threads, it might malfunction. So I
think we had better make it thread-safe now.
The answers here suggest the same solutions for using SimpleDateFormat safely:
http://stackoverflow.com/questions/10411944/java-text-simpledateformat-not-thread-safe
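
To illustrate the per-thread caching idea, here is a minimal sketch. It is not
the code from the attached patch: the class name DateFormatCache and its
format method are hypothetical, and it assumes formatters can be keyed by
their pattern string alone. Each thread keeps its own map of SimpleDateFormat
instances, so the constructor runs once per pattern per thread and no
formatter is ever shared between threads.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of Flume.
final class DateFormatCache {

  // One pattern -> formatter map per thread. Because a map is only ever
  // touched by the thread that owns it, no synchronization is needed.
  private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
      new ThreadLocal<Map<String, SimpleDateFormat>>() {
        @Override
        protected Map<String, SimpleDateFormat> initialValue() {
          return new HashMap<String, SimpleDateFormat>();
        }
      };

  private DateFormatCache() {}

  // Formats the timestamp with a formatter cached for the calling thread.
  static String format(String pattern, TimeZone timeZone, long timestampMillis) {
    Map<String, SimpleDateFormat> formats = CACHE.get();
    SimpleDateFormat format = formats.get(pattern);
    if (format == null) {
      // The expensive constructor runs only on the first use of a pattern
      // in this thread.
      format = new SimpleDateFormat(pattern);
      formats.put(pattern, format);
    }
    // The time zone is reset on every call because the cached instance
    // may have been used with a different zone earlier.
    format.setTimeZone(timeZone);
    return format.format(new Date(timestampMillis));
  }
}
```

With something like this, a call such as
DateFormatCache.format("yyyy/MM/dd", TimeZone.getTimeZone("UTC"), ts) could
stand in for each new SimpleDateFormat(...) on the hot path. The trade-off is
one small map per writer thread in exchange for skipping the constructor cost.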
> Performance issue: SimpleDateFormat constructor takes 30% of
> HDFSEventSink.process()
> ------------------------------------------------------------------------------------
>
> Key: FLUME-2517
> URL: https://issues.apache.org/jira/browse/FLUME-2517
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.5.0.1
> Environment: linux i686
> java version "1.7.0_45"
> Reporter: Pal Konyves
> Labels: performance
> Attachments: flume_2517.patch, flume_2517.png
>
>
> I started investigating why the HDFS sink has such poor throughput in
> v 1.5.0.0. It seems to be better in 1.6.0.0 (current trunk).
> The PseudoTx channel was filling up because the HDFS sink could not write as
> fast as data was coming in from the source.
> Profiling from jconsole revealed that 30% of the time spent in the
> HDFSEventSink.process() method is taken by constructing SimpleDateFormat
> objects. SimpleDateFormat is notoriously heavy and time-consuming to
> construct. It is also not thread-safe.
> It is used in the HDFS sink to calculate paths that contain date-time
> wildcards. I will provide a patch that caches SimpleDateFormat objects per
> thread. With this patch, the PseudoTx channel I used for testing was no
> longer constantly filling up, and throughput was much better.