Github user cammachusa commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2160#discussion_r140365671
  
    --- Diff: nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/AbstractKudu.java ---
    @@ -94,6 +97,29 @@
                 .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                 .build();
     
    +    protected static final PropertyDescriptor FLUSH_MODE = new PropertyDescriptor.Builder()
    +            .name("Flush Mode")
    +            .description("Set the new flush mode for a kudu session\n" +
    +                    "AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it throws an exception.\n" +
    +                    "AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory" +
    +                    " operations but it may have to wait when the buffer is full and there's another buffer being flushed.\n" +
    +                    "MANUAL_FLUSH: the call returns when the operation has been added to the buffer, else it throws a KuduException if the buffer is full.")
    +            .allowableValues(SessionConfiguration.FlushMode.values())
    +            .defaultValue(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND.toString())
    +            .required(true)
    +            .build();
    +
    +    protected static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("Batch Size")
    +            .description("Set the number of operations that can be buffered, between 2 - 100000. " +
    +                    "Depend on your memory size, and data size per row set an appropriate batch size. " +
    +                    "Gradually increase this number to find out your best one for best performance")
    +            .defaultValue("100")
    --- End diff --
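The two descriptors above only declare the configuration; the processor still has to turn the property strings into usable values. Below is a minimal sketch of that step, assuming the "between 2 - 100000" bounds from the Batch Size description are meant to be enforced (the diff excerpt shows no validator for them). The class name, method names, and the local `FlushMode` enum are hypothetical stand-ins for `SessionConfiguration.FlushMode`, so the sketch runs without NiFi or the Kudu client. In the processor itself, the range check would more naturally live in a validator attached to the `PropertyDescriptor`.

```java
import java.util.Locale;

public class KuduPropertySketch {

    // Hypothetical stand-in mirroring the three values of
    // SessionConfiguration.FlushMode listed in the Flush Mode description.
    enum FlushMode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

    // Bounds taken from the Batch Size description ("between 2 - 100000").
    static final int MIN_BATCH_SIZE = 2;
    static final int MAX_BATCH_SIZE = 100_000;

    // Parses the Flush Mode value, ignoring surrounding whitespace and case.
    // A missing value falls back to the default declared in the diff.
    static FlushMode parseFlushMode(String value) {
        if (value == null || value.trim().isEmpty()) {
            return FlushMode.AUTO_FLUSH_BACKGROUND;
        }
        // valueOf throws IllegalArgumentException for an unknown mode.
        return FlushMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Parses the Batch Size value and rejects anything outside [2, 100000].
    // Integer.parseInt throws NumberFormatException, itself an
    // IllegalArgumentException, for non-numeric input.
    static int parseBatchSize(String value) {
        int size = Integer.parseInt(value.trim());
        if (size < MIN_BATCH_SIZE || size > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("Batch Size must be between "
                    + MIN_BATCH_SIZE + " and " + MAX_BATCH_SIZE + ", got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(parseFlushMode(null));  // prints AUTO_FLUSH_BACKGROUND
        System.out.println(parseBatchSize("100")); // prints 100
    }
}
```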
    
    As I noted in the description, the right value depends on the user's memory size, the size of the rows being inserted, and the cluster size. Setting the buffer size too large won't help, and setting it too small won't help either. As noted, developers have to find this number for their own environment. A lot of people hit peak performance at 50 with a single-machine Kudu cluster. My colleague hit peak performance at 3500 with a 6-node cluster (10 CPUs and 64 GB of memory each). I picked 100 somewhat arbitrarily because I saw it in other Put-xxx processors. I didn't want to use 1000, since most developers test with a single machine and would leave the default value as is.
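The advice above (start small and gradually increase the batch size until performance stops improving) can be sketched as a simple geometric sweep. This is a minimal illustration, not part of the PR: the class and method names are hypothetical, and `throughput` is an assumed benchmark hook that would, in practice, run a fixed workload against the real cluster at the given batch size and return rows per second. The stopping rule (stop at the first size that does not beat the best so far) is one choice among several.

```java
import java.util.function.IntToDoubleFunction;

public class BatchSizeSweep {

    // Starting at `start`, multiplies the batch size by `factor` (capped at `max`)
    // while measured throughput keeps improving, and returns the last size that
    // improved it. `throughput` is a hypothetical benchmark hook.
    static int findPeakBatchSize(IntToDoubleFunction throughput, int start, int max, double factor) {
        if (start < 1 || max < start || factor <= 1.0) {
            throw new IllegalArgumentException("need 1 <= start <= max and factor > 1");
        }
        int best = start;
        double bestRate = throughput.applyAsDouble(start);
        while (true) {
            int next = (int) Math.min((long) max, (long) Math.ceil(best * factor));
            if (next <= best) {
                break; // already at the cap
            }
            double rate = throughput.applyAsDouble(next);
            if (rate <= bestRate) {
                break; // throughput stopped improving
            }
            best = next;
            bestRate = rate;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up throughput curve peaking at a batch size of 50, for illustration only.
        IntToDoubleFunction synthetic = b -> -Math.pow(Math.log(b / 50.0), 2);
        System.out.println(findPeakBatchSize(synthetic, 2, 100_000, 2.0)); // prints 64
    }
}
```

Because a doubling sweep only samples 2, 4, 8, and so on, it lands near the peak rather than on it (64 for the synthetic curve above); a second, finer sweep around that value could refine it. It also stops at the first non-improving measurement, so with noisy benchmarks it may be worth averaging several runs per size.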

