[ 
https://issues.apache.org/jira/browse/NIFI-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337448#comment-16337448
 ] 

ASF GitHub Bot commented on NIFI-3538:
--------------------------------------

Github user MikeThomsen commented on the issue:

    https://github.com/apache/nifi/pull/2294
  
    @mgaido91 I added the charset. I can't believe I missed that...
    
    WRT the batching, I stand by my opinion that we need a sane default 
there, because there has to be a way to ensure someone doesn't accidentally 
(or deliberately) send HBase an operation that is too big to execute at once.
    
    To me this is not a theoretical issue, because I ran into something like 
this with PutHBaseRecord doing genomic data ingestion w/ NiFi. The data set 
would easily generate 10 billion, if not 20-25 billion, tiny writes (a few 
dozen bytes each). I had to really scale back the size of each record set I 
was sending to PutHBaseRecord because it was easy to generate so many Puts 
that they would hammer a region offline unexpectedly.
    
    I'm not an HBase expert by any means, but based on my experience, sending 
a lot of small writes looks like a recipe for trouble (and Delete objects are 
tiny writes).
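    A minimal sketch of the kind of bounded batching I mean (illustrative 
only: the names and the default of 1000 are my own assumptions, not what the 
PR actually does; in the processor the sink would wrap a call like 
table.delete(List<Delete>)):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedDeleteSketch {

    static final int DEFAULT_BATCH_SIZE = 1000; // assumed sane default

    // Splits rowKeys into batches of at most batchSize and hands each
    // batch to the sink; returns the number of batches flushed. This way
    // one huge flow file can never turn into a single giant mutation.
    static int flushInBatches(List<String> rowKeys, int batchSize,
                              Consumer<List<String>> sink) {
        int flushed = 0;
        List<String> batch = new ArrayList<>(batchSize);
        for (String key : rowKeys) {
            batch.add(key);
            if (batch.size() >= batchSize) {
                sink.accept(new ArrayList<>(batch));
                batch.clear();
                flushed++;
            }
        }
        if (!batch.isEmpty()) {          // flush the partial final batch
            sink.accept(new ArrayList<>(batch));
            flushed++;
        }
        return flushed;
    }
}
```

    With the default above, 2,500 row keys go out as three calls to the 
cluster (1000, 1000, 500) instead of one unbounded operation.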


> Add DeleteHBase processor(s)
> ----------------------------
>
>                 Key: NIFI-3538
>                 URL: https://issues.apache.org/jira/browse/NIFI-3538
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Mike Thomsen
>            Priority: Major
>
> NiFi currently has processors for storing and retrieving cells/rows in HBase, 
> but there is no mechanism for deleting records and/or tables.
> I'm not sure whether a single DeleteHBase processor could accomplish both; 
> that can be discussed under this Jira (and split out if necessary).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)