[ 
https://issues.apache.org/jira/browse/CAMEL-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798431#comment-13798431
 ] 

Ben O'Day commented on CAMEL-6867:
----------------------------------

The issue with using the messageId is that connectOnStartup mode creates 
the initial file stream on startup, when there is no messageId to use. How 
about using the UUID generator from the CamelContext instead, like this: 
getEndpoint().getCamelContext().getUuidGenerator().generateUuid()?

Also, is there any reason to keep prepending DEFAULT_SEGMENT_PREFIX with this 
new approach? The prefix "seg" seems pretty arbitrary and should probably be 
configurable if we need to keep it.
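
To make the two suggestions above concrete, here is a rough sketch (not the 
actual camel-hdfs code). SegmentNamer and its methods are hypothetical names, 
and java.util.UUID stands in for the CamelContext's UuidGenerator so the 
snippet runs on its own. In the producer the Supplier would presumably wrap 
getUuidGenerator().generateUuid() instead.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical helper, not part of camel-hdfs.
final class SegmentNamer {
    private final String prefix;        // configurable; may be empty
    private final Supplier<String> ids; // source of unique ids

    SegmentNamer(String prefix, Supplier<String> ids) {
        // Treat a missing prefix as "no prefix" rather than forcing "seg".
        this.prefix = (prefix == null) ? "" : prefix;
        this.ids = ids;
    }

    // Assumption: java.util.UUID as a stand-in for Camel's UuidGenerator.
    static SegmentNamer withRandomUuid(String prefix) {
        return new SegmentNamer(prefix, () -> UUID.randomUUID().toString());
    }

    // The name does not depend on any per-instance counter, so a recreated
    // producer does not start again from the same names.
    String next() {
        return prefix + ids.get();
    }
}
```

Since no messageId is involved, this would also work for the file stream 
created in connectOnStartup mode.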


> camel-hdfs - HdfsProducer filename collisions when Producer instance recreated
> ------------------------------------------------------------------------------
>
>                 Key: CAMEL-6867
>                 URL: https://issues.apache.org/jira/browse/CAMEL-6867
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>            Reporter: Ben O'Day
>            Assignee: Ben O'Day
>             Fix For: Future
>
>
> The HdfsProducer uses an instance variable (long splitNum) that is 
> incremented to create unique output filenames in a given directory (seg0, 
> seg1, etc).  
> If the Producer instance is recreated (producer cache limit exceeded, server 
> restart, etc), the splitNum variable is reset to 0.  This results in files 
> being overwritten in overwrite=true mode, or in "The file already exists" 
> errors in overwrite=false mode.
> We should switch to using a timestamp or some other unique generator to 
> prevent filename collisions for the same HDFS directory URL, regardless of 
> the Producer instance lifecycle.
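
The failure mode described in the issue can be reproduced in isolation with 
a small sketch. CountingProducer and CollisionDemo are hypothetical stand-ins, 
not the real HdfsProducer. Only the counter-based naming (splitNum, "seg") is 
taken from the description, and an in-memory set stands in for the HDFS 
directory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a producer that names files with an instance counter.
final class CountingProducer {
    private long splitNum = 0; // starts at 0 again for every new instance

    String nextFileName() {
        return "seg" + splitNum++;
    }
}

final class CollisionDemo {
    // Creates the producer twice and returns the names generated more than once.
    static Set<String> collisions(int filesPerInstance) {
        Set<String> written = new HashSet<>(); // stands in for the target directory
        Set<String> clashes = new HashSet<>();
        for (int instance = 0; instance < 2; instance++) {
            CountingProducer producer = new CountingProducer(); // simulates recreation
            for (int i = 0; i < filesPerInstance; i++) {
                String name = producer.nextFileName();
                if (!written.add(name)) {
                    clashes.add(name); // would be overwritten or rejected
                }
            }
        }
        return clashes;
    }
}
```

With two files per instance, the second instance produces seg0 and seg1 
again, which is the overwrite or "The file already exists" case.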



--
This message was sent by Atlassian JIRA
(v6.1#6144)
