[ 
https://issues.apache.org/jira/browse/METRON-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159172#comment-16159172
 ] 

ASF GitHub Bot commented on METRON-1153:
----------------------------------------

GitHub user justinleet opened a pull request:

    https://github.com/apache/metron/pull/741

    METRON-1153 HDFS HdfsWriter never recovers from exceptions

    ## Contributor Comments
    Added a try-catch around the actual write that rotates the file and 
retries if the stream has been closed underneath it.  Added two unit tests: one 
ensures that a double write flows through to a single file, and one ensures 
that output flows to two files if the channel is closed underneath the writer 
for whatever reason (simulated by calling `closeOutputFile()` outside of the 
normal flow).
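
The retry-on-closed-stream idea described above can be sketched roughly as 
follows.  This is a minimal, self-contained illustration and not the code in 
this PR: `RotatingWriter`, `StreamFactory`, and `ClosableBuffer` are 
hypothetical names, and an in-memory buffer stands in for an HDFS output file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory "file" that rejects writes once it has been closed. */
class ClosableBuffer extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed = false;

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        buffer.write(b);
    }

    @Override
    public void close() {
        closed = true;
    }

    String contents() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}

/** Hypothetical writer that rotates to a new stream and retries once on IOException. */
class RotatingWriter {
    /** Opens the next output stream; stands in for opening a new file. */
    interface StreamFactory {
        OutputStream open() throws IOException;
    }

    private final StreamFactory factory;
    private OutputStream out;

    RotatingWriter(StreamFactory factory) throws IOException {
        this.factory = factory;
        this.out = factory.open();
    }

    void write(String message) throws IOException {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            out.write(bytes);
        } catch (IOException e) {
            // The stream may have been closed underneath us: rotate and retry once.
            rotate();
            out.write(bytes); // a second failure propagates to the caller
        }
    }

    private void rotate() throws IOException {
        try {
            out.close();
        } catch (IOException ignored) {
            // The old stream is already unusable; there is nothing left to clean up.
        }
        out = factory.open();
    }
}
```

Retrying exactly once keeps a persistent failure visible to the caller instead 
of hiding it behind a loop.  Note that this sketch does not guard against a 
partially written message being repeated in the new file after rotation.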
    
    It's not a perfect solution, but it should alleviate transient failures, 
and if the error keeps showing up, that tells us the underlying problem is 
deeper.
    
    Also added a missing assignment of a constructor argument to its field, 
which I happened to notice and which probably wasn't helping things.  It's a 
one-line change, so a separate PR seemed excessive.
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    
    ### For code changes:
    - [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && build_utils/verify_licenses.sh 
      ```
    
    - [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [x] Have you ensured that the format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not, then run 
the following commands and then verify the changes via 
`site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinleet/metron METRON-1153

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #741
    
----
commit e5a2f3c6114f26b2091640a57bf4ca02e74addfa
Author: justinjleet <[email protected]>
Date:   2017-09-08T19:34:08Z

    adding attempt to get new file on channel closed exception

----


> HDFS HdfsWriter never recovers from exceptions
> ----------------------------------------------
>
>                 Key: METRON-1153
>                 URL: https://issues.apache.org/jira/browse/METRON-1153
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Otto Fowler
>
> {code:java}
> o.a.m.w.BulkWriterComponent [ERROR] Failing 51 tuples
> java.io.IOException: Stream closed
>         at org.apache.hadoop.crypto.CryptoOutputStream.checkStream(CryptoOutputStream.java:250) ~[stormjar.jar:?]
>         at org.apache.hadoop.crypto.CryptoOutputStream.write(CryptoOutputStream.java:133) ~[stormjar.jar:?]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58) ~[stormjar.jar:?]
>         at java.io.DataOutputStream.write(DataOutputStream.java:107) ~[?:1.8.0_131]
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97) ~[?:1.8.0_131]
>         at org.apache.metron.writer.hdfs.SourceHandler.handle(SourceHandler.java:74) ~[stormjar.jar:?]
>         at org.apache.metron.writer.hdfs.HdfsWriter.write(HdfsWriter.java:113) ~[stormjar.jar:?]
>         at org.apache.metron.writer.BulkWriterComponent.flush(BulkWriterComponent.java:239) [stormjar.jar:?]
>         at org.apache.metron.writer.BulkWriterComponent.flushTimeouts(BulkWriterComponent.java:281) [stormjar.jar:?]
>         at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:211) [stormjar.jar:?]
>         at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:469) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484) [storm-core-1.0.1.2.5.6.0-40.jar:1.0.1.2.5.6.0-40]
>         at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> {code}
> The SourceHandler does not verify that the output stream it works with is 
> open before writing.  As a long-running process, it should not assume that 
> the stream is always valid.
> This is hard, however, because there is no great way to verify that the 
> stream is OK.
> Instead, the HdfsWriter could remove the source handler when an IOException 
> occurs, but the problem then is that we do not couple tuples to messages, 
> so that approach would require refactoring from the bolt on down.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
