[ https://issues.apache.org/jira/browse/NIFI-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233751#comment-15233751 ]

Oleg Zhurakousky commented on NIFI-1680:
----------------------------------------

I am working on this in parallel with NIFI-1296, so these improvements will 
ship together with the Kafka improvements, since Kafka is currently the only 
component that uses StreamScanner.
I have also renamed it to StreamTokenizer, as that name better describes what 
it does.

> Improve StreamScanner performance
> ---------------------------------
>
>                 Key: NIFI-1680
>                 URL: https://issues.apache.org/jira/browse/NIFI-1680
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>             Fix For: 0.7.0
>
>
> We have several use cases where the content of a single FlowFile is split 
> within a single cycle (i.e., _onTrigger()_). An example is PutKafka.
> Such splitting involves parsing the content InputStream into chunks 
> represented as byte[]. Currently we use a custom buffer 
> (_org.apache.nifi.stream.io.util.StreamScanner_) to build each byte[]. 
> There are several potential areas of improvement here:
> 1. Perform internal buffering instead of using ByteArrayInputStream and 
> copying bytes into a byte array (the bytes that are already in the buffer of 
> ByteArrayInputStream).
> 2. The buffer itself is allocated on the heap. We could consider using a 
> DirectBuffer (as an optional flag).
> 3. Consider a buffer pool from which a new StreamScanner instance can take a 
> work buffer instead of allocating a new one. 
> #1 is by far the most important: in my test environment it shows a 6x 
> performance improvement over the current implementation of StreamScanner 
> and a 1.5x improvement over _java.io.BufferedReader_, which only supports 
> an implicit newline delimiter and was used only for comparison. 
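To make improvement #1 concrete, here is a minimal sketch, assuming a fixed, non-empty byte delimiter. It reads the InputStream straight into one internal byte[] buffer and cuts tokens out of that buffer, with no intermediate ByteArrayInputStream. The class name DelimiterTokenizer and its nextToken() method are hypothetical illustrations (chosen partly to avoid confusion with java.io.StreamTokenizer). This is not the actual NiFi StreamScanner/StreamTokenizer code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch of improvement #1 (not NiFi's actual class): bytes are read
 * from the InputStream straight into one internal buffer and tokens are cut out
 * of that buffer, with no intermediate ByteArrayInputStream.
 */
final class DelimiterTokenizer {
    private final InputStream in;
    private final byte[] delimiter;
    private byte[] buffer;
    private int start;    // first unconsumed byte in buffer
    private int end;      // one past the last valid byte in buffer
    private int scanFrom; // where the next delimiter search resumes
    private boolean eof;

    DelimiterTokenizer(InputStream in, byte[] delimiter, int initialBufferSize) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.in = in;
        this.delimiter = delimiter.clone();
        this.buffer = new byte[Math.max(initialBufferSize, delimiter.length)];
    }

    /** Returns the next token without its delimiter, or null once the stream is exhausted. */
    byte[] nextToken() throws IOException {
        while (true) {
            int idx = indexOfDelimiter();
            if (idx >= 0) {
                byte[] token = Arrays.copyOfRange(buffer, start, idx);
                start = idx + delimiter.length;
                scanFrom = start;
                return token;
            }
            if (eof) {
                if (start == end) {
                    return null; // nothing left (a trailing delimiter yields no empty token)
                }
                byte[] token = Arrays.copyOfRange(buffer, start, end);
                start = end;
                scanFrom = end;
                return token;
            }
            fill();
        }
    }

    /** Naive search for the delimiter in buffer[scanFrom, end); returns its index or -1. */
    private int indexOfDelimiter() {
        int last = end - delimiter.length;
        for (int i = scanFrom; i <= last; i++) {
            int j = 0;
            while (j < delimiter.length && buffer[i + j] == delimiter[j]) {
                j++;
            }
            if (j == delimiter.length) {
                return i;
            }
        }
        // Resume later from the first position that could still start a delimiter,
        // so a delimiter split across two reads is still found.
        scanFrom = Math.max(start, last + 1);
        return -1;
    }

    /** Compacts unconsumed bytes to the front, grows the buffer if full, then reads more. */
    private void fill() throws IOException {
        if (start > 0) {
            System.arraycopy(buffer, start, buffer, 0, end - start);
            end -= start;
            scanFrom -= start;
            start = 0;
        }
        if (end == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // token longer than buffer
        }
        int n = in.read(buffer, end, buffer.length - end);
        if (n < 0) {
            eof = true;
        } else {
            end += n;
        }
    }
}
```

With the input "a|bb||ccc" and delimiter "|", successive nextToken() calls return "a", "bb", "" and "ccc", then null. Because the work buffer is allocated in exactly one place (the constructor), idea #3 would presumably only change where that initial byte[] comes from; that is an assumption about how the ideas could combine, not something stated in the issue.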



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
