Github user nragon commented on the issue:

    https://github.com/apache/flink/pull/2332
  
    I've made a custom solution which works for my use cases. Note that the
attached code is not functional as-is; it's only a skeleton.
    This prototype uses asynchbase and tries to manage the throttling issues
mentioned above. It does so by limiting pending requests per client to 1000
(configurable, depending on HBase capacity and response times) and skipping
records once that threshold is reached. Every skipped record is tracked with the
system timestamp, always keeping the most recent skipped record for a later
update.
    Now, in my use case I always use a keyBy -> reduce before the sink, which
keeps the aggregation state, so every record the HBase sink receives carries the
last aggregated value from the previous operators. When all requests are done
(`pending == 0`), I compare the last skipped record with the last requested
record; if the skipped timestamp is less than the requested timestamp, HBase
already has the latest aggregation.
    There is plenty of room for improvement, I just didn't have the time.
    
    [HBaseSink.txt](https://github.com/apache/flink/files/1014991/HBaseSink.txt)
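    The throttling bookkeeping described above could be sketched roughly like
this. This is a simplified model, not the attached code: the class name
`ThrottleState` and its methods are hypothetical, timestamps are passed in
explicitly instead of taken from the system clock, and the real sink would
drive `onComplete()` from asynchbase put callbacks.

```java
// Hypothetical sketch of the skip-on-threshold logic described above.
// In the real sink these counters would wrap asynchbase's HBaseClient,
// with onComplete() invoked from each put's completion callback.
public class ThrottleState {
    private final int maxPending;                  // e.g. 1000, configurable
    private int pending = 0;                       // requests in flight
    private long lastRequestedTs = Long.MIN_VALUE; // ts of last record sent
    private long lastSkippedTs = Long.MIN_VALUE;   // ts of newest skipped record

    public ThrottleState(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Returns true if the record was sent, false if it was skipped. */
    public boolean trySend(long ts) {
        if (pending >= maxPending) {
            // Over the threshold: skip, but remember the most recent
            // skipped record so it can be re-sent later.
            lastSkippedTs = Math.max(lastSkippedTs, ts);
            return false;
        }
        pending++;
        lastRequestedTs = Math.max(lastRequestedTs, ts);
        return true;
    }

    /** Called when an outstanding request completes. */
    public void onComplete() {
        pending--;
    }

    /**
     * All requests done and nothing newer was skipped than what was
     * requested: HBase has the last aggregation.
     */
    public boolean hbaseHasLatest() {
        return pending == 0 && lastSkippedTs < lastRequestedTs;
    }
}
```

    Because the keyBy -> reduce upstream always emits the full aggregated
value, skipping intermediate records loses nothing as long as the newest
skipped record is eventually re-sent.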
    