[ https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696476#comment-14696476 ]

ASF GitHub Bot commented on STORM-837:
--------------------------------------

Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/storm/pull/644#discussion_r37049715
  
    --- Diff: external/storm-hdfs/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java ---
    @@ -136,44 +174,98 @@ public void run() {
             private transient FSDataOutputStream out;
             protected RecordFormat format;
             private long offset = 0;
    +        private int bufferSize =  131072; // default 128 K
     
    -        public HdfsFileOptions withFsUrl(String fsUrl){
    +        public HdfsFileOptions withFsUrl(String fsUrl) {
                 this.fsUrl = fsUrl;
                 return this;
             }
     
    -        public HdfsFileOptions withConfigKey(String configKey){
    +        public HdfsFileOptions withConfigKey(String configKey) {
                 this.configKey = configKey;
                 return this;
             }
     
    -        public HdfsFileOptions withFileNameFormat(FileNameFormat fileNameFormat){
    +        public HdfsFileOptions withFileNameFormat(FileNameFormat fileNameFormat) {
                 this.fileNameFormat = fileNameFormat;
                 return this;
             }
     
    -        public HdfsFileOptions withRecordFormat(RecordFormat format){
    +        public HdfsFileOptions withRecordFormat(RecordFormat format) {
                 this.format = format;
                 return this;
             }
     
    -        public HdfsFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy){
    +        public HdfsFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy) {
                 this.rotationPolicy = rotationPolicy;
                 return this;
             }
     
    -        public HdfsFileOptions addRotationAction(RotationAction action){
    +        /**
    +         * <p>Set the size of the buffer used for hdfs file copy in case of recovery. The default
    +         * value is 131072.</p>
    +         *
    +         * <p> Note: The lower limit for the parameter is 4096, below which the
    +         * option is ignored. </p>
    +         *
    +         * @param sizeInBytes the buffer size in bytes
    +         * @return {@link HdfsFileOptions}
    +         */
    +        public HdfsFileOptions withBufferSize(int sizeInBytes) {
    +            this.bufferSize = Math.max(4096, sizeInBytes); // at least 4K
    +            return this;
    +        }
    +
    +        @Deprecated
    --- End diff --
    
    As mentioned in the previous comments and in the JIRA (STORM-837), rotation
    actions are too generic and make recovery difficult if the process crashes in
    the middle of an action. Hence it's best to avoid rotation actions when
    exactly-once semantics are expected. I will add a note to the README.
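The withBufferSize option in the diff sizes the buffer used to copy the HDFS file during recovery. A minimal sketch of such a copy follows, assuming recovery means copying the bytes up to the last committed offset into a fresh file and leaving the uncommitted tail behind. It uses plain java.io streams in place of Hadoop's FSDataInputStream/FSDataOutputStream (which extend them), and the class and method names are hypothetical, not part of storm-hdfs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating a bounded, buffered "committed prefix" copy.
class RecoveryCopySketch {
    static final int MIN_BUFFER_SIZE = 4096;       // lower limit from the javadoc
    static final int DEFAULT_BUFFER_SIZE = 131072; // 128 KiB default from the diff

    // Mirrors withBufferSize(): requests below 4096 are raised to 4096.
    static int effectiveBufferSize(int requested) {
        return Math.max(MIN_BUFFER_SIZE, requested);
    }

    /**
     * Copies exactly nBytes (assumed to be the last committed offset) from src
     * to dst, reading at most one buffer at a time. Bytes after nBytes, i.e.
     * data written after the last commit, are not copied.
     */
    static long copyCommittedPrefix(InputStream src, OutputStream dst,
                                    long nBytes, int bufferSize) throws IOException {
        byte[] buf = new byte[effectiveBufferSize(bufferSize)];
        long remaining = nBytes;
        while (remaining > 0) {
            int toRead = (int) Math.min(buf.length, remaining);
            int n = src.read(buf, 0, toRead);
            if (n < 0) {
                // The old file is shorter than the committed offset: fail loudly.
                throw new IOException("source ended " + remaining
                        + " bytes before the committed offset");
            }
            dst.write(buf, 0, n);
            remaining -= n;
        }
        dst.flush();
        return nBytes;
    }
}
```

For example, copying with nBytes = 10 from a stream containing "committed|uncommitted" writes only "committed|", and a requested buffer size below 4096 is raised to 4096, matching the Math.max clamp in the diff.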


> HdfsState ignores commits
> -------------------------
>
>                 Key: STORM-837
>                 URL: https://issues.apache.org/jira/browse/STORM-837
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Robert Joseph Evans
>            Assignee: Arun Mahadevan
>            Priority: Critical
>
> HdfsState works with Trident, which is supposed to provide exactly-once
> processing. Trident does this in two ways: first, by informing the state about
> commits so it can be sure the data is written out, and second, by supplying a
> commit ID so that double commits can be handled.
> HdfsState ignores the beginCommit and commit calls, and with them the IDs.
> This means that if you use HdfsState and your worker crashes, you may both
> lose data and get some data twice.
> At a minimum, the flush and file rotation should be tied to the commit in some
> way. The commit ID should also be written out with the data so that someone
> reading the data has a hope of deduplicating it themselves.
> Also, with rotationActions it is possible for a partially written file to be
> leaked and never moved to its final location, because it is never rotated. I
> personally think the actions are too generic for this case and need to be
> deprecated.
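The issue asks that flushes be tied to commits and that the commit ID travel with the data. A minimal in-memory sketch of that idea follows, assuming a sink whose beginCommit/commit method names mirror Trident's State interface. TxAwareSink is hypothetical, its durable list stands in for the flushed HDFS file, and the last committed txid is kept only in memory here; a real implementation would have to persist it to survive a worker crash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink that honors commit ids instead of ignoring them.
class TxAwareSink {
    private final List<String> pending = new ArrayList<>(); // current batch, not yet durable
    private final List<String> durable = new ArrayList<>(); // stands in for the flushed file
    private long lastCommittedTxid = -1;
    private Long currentTxid = null;

    void beginCommit(long txid) {
        currentTxid = txid;
        pending.clear(); // discard leftovers from a failed attempt
    }

    void write(String record) {
        if (currentTxid == null) {
            throw new IllegalStateException("write outside a batch");
        }
        // Tag each record with its txid so a reader could deduplicate by it.
        pending.add(currentTxid + "\t" + record);
    }

    void commit(long txid) {
        if (currentTxid == null || txid != currentTxid) {
            throw new IllegalStateException("commit without matching beginCommit");
        }
        if (txid > lastCommittedTxid) {
            // Only now does the batch become durable (a real sink would flush/sync here).
            durable.addAll(pending);
            lastCommittedTxid = txid;
        }
        // A replay of an already committed txid is dropped to avoid duplicates.
        pending.clear();
        currentTxid = null;
    }

    List<String> durableRecords() { return durable; }

    long lastCommittedTxid() { return lastCommittedTxid; }
}
```

Replaying txid 1 after it has committed leaves a single "1\ta" record, and because every durable record carries its txid, a reader could still deduplicate on its own if the sink's bookkeeping were lost.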



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
