[ 
https://issues.apache.org/jira/browse/CAMEL-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben O'Day updated CAMEL-6867:
-----------------------------

    Description: 
The HdfsProducer uses an instance variable (long splitNum) that is incremented 
to create unique output filenames in a given directory (seg0, seg1, etc.).

If the Producer instance is recreated (producer cache limit exceeded, server 
restart, etc.), the splitNum variable is reset to 0. As a result, files are 
overwritten when overwrite=true, or "The file already exists" errors are 
thrown when overwrite=false.

We should switch to a timestamp or some other unique generator to prevent 
filename collisions for the same HDFS directory URL, regardless of the 
Producer instance lifecycle...



  was:
The HdfsProducer uses an instance variable (long splitNum) that is incremented 
to create unique output filenames in a given directory (seq0, seq1, etc).  

If the Producer instance is recreated (producer cache limit exceeded, server 
restart, etc), the splitNum variable is reset to 0.  This results in files 
being overwritten when using overwrite=true mode or throwing "The file already 
exists" errors when using overwrite=false mode.

We should switch to using a timestamp or some other unique generator to prevent 
filename collisions regardless of the Producer instance lifecycle for the same 
hdfs directory URL...




> camel-hdfs - HdfsProducer filename collisions when Producer instance recreated
> ------------------------------------------------------------------------------
>
>                 Key: CAMEL-6867
>                 URL: https://issues.apache.org/jira/browse/CAMEL-6867
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>            Reporter: Ben O'Day
>            Assignee: Ben O'Day
>             Fix For: Future
>
>
> The HdfsProducer uses an instance variable (long splitNum) that is 
> incremented to create unique output filenames in a given directory (seg0, 
> seg1, etc.).
> If the Producer instance is recreated (producer cache limit exceeded, server 
> restart, etc.), the splitNum variable is reset to 0. As a result, files are 
> overwritten when overwrite=true, or "The file already exists" errors are 
> thrown when overwrite=false.
> We should switch to a timestamp or some other unique generator to prevent 
> filename collisions for the same HDFS directory URL, regardless of the 
> Producer instance lifecycle...
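
To illustrate the collision and one possible way around it, here is a minimal 
standalone sketch. It does not use the camel-hdfs classes: CounterNamer only 
mimics the splitNum behaviour described above, and UniqueNamer (timestamp plus 
a random UUID) is a hypothetical naming scheme, not the fix that was committed. 
All class and method names are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class SegmentNames {

    /** Mimics the counter-based scheme: the counter lives in the instance. */
    static final class CounterNamer {
        private long splitNum = 0; // starts at 0 again for every new instance

        String next() {
            return "seg" + splitNum++;
        }
    }

    /** Hypothetical alternative: no instance state, so recreation is harmless. */
    static final class UniqueNamer {
        String next() {
            // The timestamp keeps names roughly sortable; the UUID makes them unique.
            return "seg-" + System.currentTimeMillis() + "-" + UUID.randomUUID();
        }
    }

    public static void main(String[] args) {
        Set<String> existingFiles = new HashSet<>();

        // The first producer instance writes seg0 and seg1.
        CounterNamer first = new CounterNamer();
        existingFiles.add(first.next());
        existingFiles.add(first.next());

        // A recreated instance starts counting at 0 and reuses an existing name.
        CounterNamer recreated = new CounterNamer();
        String reused = recreated.next();
        System.out.println(reused + " collides: " + existingFiles.contains(reused));
        // prints "seg0 collides: true"

        // Two independent UniqueNamer instances produce different names.
        String a = new UniqueNamer().next();
        String b = new UniqueNamer().next();
        System.out.println("unique names differ: " + !a.equals(b));
    }
}
```

A pure timestamp would still collide if two producers write within the same 
millisecond, which is why the sketch appends a UUID. Whether the longer 
filenames are acceptable for a given HDFS layout is a separate question.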



--
This message was sent by Atlassian JIRA
(v6.1#6144)
