Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/2478
  
    @bdesert Thanks for the updates, I was reviewing the code again and I think 
we need to change the way the `ScanHBaseResultHandler` works...
    
    Currently it adds rows to an in-memory list until the bulk size is reached, 
and since bulk size defaults to 0, in the default case the bulk size is never 
reached and all the rows are left as "hanging" rows. This means that if someone 
scans a table with 1 million rows, all 1 million rows will be held in memory 
before being written to the flow file, which would not be good for memory usage.
    
    We should be able to write row by row to the flow file and never add rows 
to a list. Inside the handler we can use `session.append(flowFile, (out) -> ...)` 
to append one row at a time to the flow file. I think we can then do away with 
the "hanging rows" concept, because there won't be anything buffered in memory.
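    To illustrate the streaming pattern, here is a minimal self-contained sketch. The `Session` and `OutputStreamCallback` classes below are simplified stand-ins for NiFi's `ProcessSession.append(FlowFile, OutputStreamCallback)` (the real API takes the flow file as an argument and returns an updated reference); the point is that each row is serialized straight into the flow file content as it arrives, so no row list ever accumulates in the handler:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ScanStreamingSketch {

    // Stand-in for NiFi's org.apache.nifi.processor.io.OutputStreamCallback
    interface OutputStreamCallback {
        void process(OutputStream out) throws IOException;
    }

    // Stand-in for ProcessSession: append() streams bytes onto the
    // existing flow file content instead of buffering rows elsewhere.
    static class Session {
        final ByteArrayOutputStream flowFileContent = new ByteArrayOutputStream();

        void append(OutputStreamCallback callback) {
            try {
                callback.process(flowFileContent);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Handler pattern: one append() call per row, no in-memory row list
    // and therefore no "hanging rows" to flush at the end.
    static String scan(List<String> rows) {
        Session session = new Session();
        for (String row : rows) {
            session.append(out ->
                out.write((row + "\n").getBytes(StandardCharsets.UTF_8)));
        }
        return session.flowFileContent.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(scan(List.of("row1", "row2", "row3")));
    }
}
```

    With this shape the memory footprint is bounded by a single row regardless of how many rows the scan returns.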

