[ 
https://issues.apache.org/jira/browse/FLUME-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated FLUME-1767:
------------------------------------

    Description: It looks like the HDFS sink's process method calls the append 
method, which in turn calls the callWithTimeout method, which then waits until 
the HDFS operation completes before starting the next one. We could 
parallelize this to improve performance a lot. Since all the methods in 
BucketWriter are synchronized, each bucket would still be updated by only one 
thread at any point in time, so writing multiple events at the same time would 
effectively run in parallel only across different buckets.  (was: It looks like the HDFS sink's process 
method calls the append method, which in turn calls the callWithTimeout method 
which then waits till the HDFS operation is completed before starting the next 
one. We could parallelize this to improve performance a lot. )
    
> HDFS sink performance should parallelize HDFS operations
> --------------------------------------------------------
>
>                 Key: FLUME-1767
>                 URL: https://issues.apache.org/jira/browse/FLUME-1767
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>
> It looks like the HDFS sink's process method calls the append method, which 
> in turn calls the callWithTimeout method, which then waits until the HDFS 
> operation completes before starting the next one. We could parallelize 
> this to improve performance a lot. Since all the methods in BucketWriter are 
> synchronized, each bucket would still be updated by only one thread at any 
> point in time, so writing multiple events at the same time would effectively 
> run in parallel only across different buckets.
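
A minimal sketch of the idea described above, not Flume's actual code. It assumes a hypothetical SketchBucketWriter whose methods are all synchronized (standing in for BucketWriter) and a hypothetical helper appendInParallel that submits every append to a thread pool and only then waits for the results, instead of waiting after each call. The sleep is a stand-in for HDFS latency. Names, pool size, and timeout are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelBucketAppendSketch {

    /** Hypothetical stand-in for BucketWriter: every method is synchronized. */
    static class SketchBucketWriter {
        private final List<String> written = new ArrayList<>();

        synchronized void append(String event) throws InterruptedException {
            Thread.sleep(5); // simulated HDFS write latency (assumption)
            written.add(event);
        }

        synchronized int count() {
            return written.size();
        }
    }

    /**
     * Submits all appends first, then waits for each one with a timeout.
     * Appends to the same bucket serialize on that writer's monitor;
     * appends to different buckets can overlap.
     * Returns the number of events written per bucket.
     */
    static Map<String, Integer> appendInParallel(
            Map<String, List<String>> eventsByBucket,
            int threads,
            long timeoutMs) throws Exception {
        Map<String, SketchBucketWriter> writers = new HashMap<>();
        List<Future<Void>> pending = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Map.Entry<String, List<String>> entry : eventsByBucket.entrySet()) {
                SketchBucketWriter writer = new SketchBucketWriter();
                writers.put(entry.getKey(), writer);
                for (String event : entry.getValue()) {
                    Callable<Void> task = () -> {
                        writer.append(event);
                        return null;
                    };
                    pending.add(pool.submit(task));
                }
            }
            // Wait only after everything has been submitted.
            for (Future<Void> future : pending) {
                future.get(timeoutMs, TimeUnit.MILLISECONDS);
            }
        } finally {
            pool.shutdownNow();
        }

        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, SketchBucketWriter> entry : writers.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().count());
        }
        return counts;
    }
}
```

Calling appendInParallel with events for two buckets and a pool of four threads lets the two buckets be written concurrently, while each bucket's own appends still run one at a time. Note that submitting one task per event does not guarantee the order of events within a bucket; submitting one task per bucket would be one way to keep that order.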

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
