[ 
https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697331#comment-14697331
 ] 

ASF GitHub Bot commented on STORM-837:
--------------------------------------

Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/644#discussion_r37093803
  
    --- Diff: external/storm-hdfs/src/main/java/org/apache/storm/hdfs/trident/HdfsState.java ---
    @@ -136,44 +174,98 @@ public void run() {
             private transient FSDataOutputStream out;
             protected RecordFormat format;
             private long offset = 0;
    +        private int bufferSize =  131072; // default 128 K
     
    -        public HdfsFileOptions withFsUrl(String fsUrl){
    +        public HdfsFileOptions withFsUrl(String fsUrl) {
                 this.fsUrl = fsUrl;
                 return this;
             }
     
    -        public HdfsFileOptions withConfigKey(String configKey){
    +        public HdfsFileOptions withConfigKey(String configKey) {
                 this.configKey = configKey;
                 return this;
             }
     
    -        public HdfsFileOptions withFileNameFormat(FileNameFormat fileNameFormat){
    +        public HdfsFileOptions withFileNameFormat(FileNameFormat fileNameFormat) {
                 this.fileNameFormat = fileNameFormat;
                 return this;
             }
     
    -        public HdfsFileOptions withRecordFormat(RecordFormat format){
    +        public HdfsFileOptions withRecordFormat(RecordFormat format) {
                 this.format = format;
                 return this;
             }
     
    -        public HdfsFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy){
    +        public HdfsFileOptions withRotationPolicy(FileRotationPolicy rotationPolicy) {
                 this.rotationPolicy = rotationPolicy;
                 return this;
             }
     
    -        public HdfsFileOptions addRotationAction(RotationAction action){
    +        /**
    +         * <p>Set the size of the buffer used for hdfs file copy in case of recovery. The default
    +         * value is 131072.</p>
    +         *
    +         * <p> Note: The lower limit for the parameter is 4096, below which the
    +         * option is ignored. </p>
    +         *
    +         * @param sizeInBytes the buffer size in bytes
    +         * @return {@link HdfsFileOptions}
    +         */
    +        public HdfsFileOptions withBufferSize(int sizeInBytes) {
    +            this.bufferSize = Math.max(4096, sizeInBytes); // at least 4K
    +            return this;
    +        }
    +
    +        @Deprecated
    --- End diff --
    
    Ah, I hadn't noticed that part of the JIRA description. That's a good
    answer; just wanted to make sure the change was intentional.
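
For readers following the diff, here is a minimal sketch of how the lower limit in `withBufferSize` behaves. `BufferOptionsSketch` and `getBufferSize` are hypothetical stand-ins written for illustration, not part of `HdfsFileOptions`; only the default of 131072, the limit of 4096, and the `Math.max` expression come from the diff. Note that with `Math.max`, a value below the limit is raised to 4096 rather than leaving the previous setting in place.

```java
// Hypothetical stand-in for the options class in the diff; not the real HdfsFileOptions.
final class BufferOptionsSketch {
    static final int MIN_BUFFER_SIZE = 4096;   // lower limit stated in the Javadoc
    private int bufferSize = 131072;           // default 128 K, as in the diff

    BufferOptionsSketch withBufferSize(int sizeInBytes) {
        // Same expression as the diff: anything below the limit becomes 4096.
        this.bufferSize = Math.max(MIN_BUFFER_SIZE, sizeInBytes);
        return this;
    }

    // Hypothetical accessor, added only so the result can be inspected.
    int getBufferSize() {
        return bufferSize;
    }

    public static void main(String[] args) {
        System.out.println(new BufferOptionsSketch().withBufferSize(1024).getBufferSize());  // prints 4096
        System.out.println(new BufferOptionsSketch().withBufferSize(65536).getBufferSize()); // prints 65536
    }
}
```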


> HdfsState ignores commits
> -------------------------
>
>                 Key: STORM-837
>                 URL: https://issues.apache.org/jira/browse/STORM-837
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Robert Joseph Evans
>            Assignee: Arun Mahadevan
>            Priority: Critical
>
> HdfsState works with Trident, which is supposed to provide exactly-once 
> processing.  Trident does this in two ways: first, by informing the state about 
> commits so it can be sure the data is written out, and second, by supplying a 
> commit ID so that double commits can be handled.
> HdfsState ignores the beginCommit and commit calls, and with them the commit 
> IDs.  This means that if you use HdfsState and your worker crashes, you may 
> both lose data and get some data twice.
> At a minimum, the flush and file rotation should be tied to the commit in some 
> way, and the commit ID should be written out with the data so that 
> someone reading the data has a hope of deduplicating it themselves.
> Also, with the rotationActions it is possible for a partially written file 
> to be leaked and never moved to its final location, because it is never 
> rotated.  I personally think the actions are too generic for this case and 
> need to be deprecated.
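
As an illustration of the two mechanisms the description mentions, the sketch below shows a sink that honours `beginCommit` and `commit` and uses the commit ID to drop a replayed batch. It is only a sketch under stated assumptions, not the change made in the pull request: `TxAwareSinkSketch`, its `update` method, and the in-memory list standing in for the HDFS file are all hypothetical, and a real state would have to persist the last committed ID so it survives a worker crash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not HdfsState and uses no Storm classes.
final class TxAwareSinkSketch {
    private final List<String> durable = new ArrayList<>(); // stands in for flushed file contents
    private final List<String> pending = new ArrayList<>(); // records of the batch in progress
    private long lastCommittedTxId = -1;                    // would need to be persisted in practice
    private boolean replay = false;

    // Called before a batch is applied; detects a batch that was already committed.
    void beginCommit(long txId) {
        pending.clear();
        replay = txId <= lastCommittedTxId;
    }

    // Buffers a record unless the current batch is a replay.
    void update(String record) {
        if (!replay) {
            pending.add(record);
        }
    }

    // Ties the "flush" to the commit and writes the commit ID out with each record.
    void commit(long txId) {
        if (replay) {
            return; // double commit: the data is already written
        }
        for (String record : pending) {
            durable.add(txId + "\t" + record);
        }
        pending.clear();
        lastCommittedTxId = txId;
    }

    List<String> durableRecords() {
        return new ArrayList<>(durable);
    }
}
```

Committing batch 1 twice with the same records leaves a single copy of each record in `durableRecords()`, and because every line carries its commit ID, a downstream reader could also deduplicate on its own.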



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
