[ 
https://issues.apache.org/jira/browse/NIFI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431788#comment-16431788
 ] 

ASF GitHub Bot commented on NIFI-5064:
--------------------------------------

GitHub user junegunn opened a pull request:

    https://github.com/apache/nifi/pull/2621

    NIFI-5064 Fixes and improvements to PutKudu processor

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
        - *No, I intentionally made separate commits so that it's easier to 
review each change.*
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junegunn/nifi put-kudu-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2621
    
----
commit f5b9ed17e47fb463d84f15c873dcaab84fdf685f
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-03T07:36:54Z

    Bump kudu-client version to 1.7.0

commit e4b9abcc98b3fa8f55b522f7302b80841f3df123
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-03T07:40:24Z

    Deprecate "Skip head line" property of PutKudu

commit 8672a4ddba0c146dfd05fee263bb9f2df8cfc1b7
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-03T07:38:07Z

    Fix IllegalArgumentException on 16-bit integer columns

commit 9e8305710ec7111ecd8c7cc214abeeefa51f6219
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-06T04:45:46Z

    Fix IllegalArgumentException on 8-bit integer columns

commit 71c2e9fc6b142ee0b7177948cf7497077e6596c4
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-03T07:39:13Z

    Fix NullPointerException on null/missing values

commit ab8f7047a79ddd60a9c061bb58c192bda813ea46
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-06T04:48:39Z

    Add support for DECIMAL types

commit 77ed66946e7bf627e15f17aa488a49b39918f0ee
Author: Junegunn Choi <junegunn.c@...>
Date:   2018-04-10T04:48:59Z

    Fix PutKudu to properly handle server-side errors

----
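
The two IllegalArgumentException commits above correspond to items 2 and 3 of the issue description quoted below: a missing break in the INT16 case, and INT8 columns being written with addShort instead of addByte. The following is a minimal, self-contained sketch of that failure mode, not the actual PutKudu code. `ColType` and `Row` are hypothetical stand-ins for Kudu's column types and `PartialRow`, and the exact shape of the original switch statement is assumed.

```java
import java.util.ArrayList;
import java.util.List;

final class FallThroughSketch {
    // Hypothetical stand-in for Kudu's column types; only the integer widths matter here.
    enum ColType { INT8, INT16, INT32 }

    // Hypothetical stand-in for PartialRow: each adder rejects a column of the wrong
    // type, mirroring the IllegalArgumentException described in the issue.
    static final class Row {
        final ColType type;
        final List<String> calls = new ArrayList<>();

        Row(ColType type) { this.type = type; }

        void addByte(byte v)   { check(ColType.INT8);  calls.add("addByte"); }
        void addShort(short v) { check(ColType.INT16); calls.add("addShort"); }
        void addInt(int v)     { check(ColType.INT32); calls.add("addInt"); }

        private void check(ColType expected) {
            if (type != expected) {
                throw new IllegalArgumentException("column is " + type + ", not " + expected);
            }
        }
    }

    // Assumed buggy shape: INT8 shares the INT16 branch, and the INT16 branch has no
    // break, so control falls through into the INT32 branch.
    static void buggy(Row row, String value) {
        switch (row.type) {
            case INT8:
            case INT16:
                row.addShort(Short.parseShort(value));
                // missing break: execution continues into the next case
            case INT32:
                row.addInt(Integer.parseInt(value));
                break;
        }
    }

    // Fixed shape: one case per width, each with its own adder and its own break.
    static void fixed(Row row, String value) {
        switch (row.type) {
            case INT8:
                row.addByte(Byte.parseByte(value));
                break;
            case INT16:
                row.addShort(Short.parseShort(value));
                break;
            case INT32:
                row.addInt(Integer.parseInt(value));
                break;
        }
    }
}
```

In this sketch, `buggy` throws for both INT8 and INT16 rows, while `fixed` makes exactly one adder call that matches the column type.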


> Fixes and improvements to PutKudu processor
> -------------------------------------------
>
>                 Key: NIFI-5064
>                 URL: https://issues.apache.org/jira/browse/NIFI-5064
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Junegunn Choi
>            Priority: Major
>
> 1. Currently, PutKudu fails with NPE on null or missing values.
> 2. {{IllegalArgumentException}} on 16-bit integer columns because of [a 
> missing {{break}} in case clause for INT16 
> columns|https://github.com/apache/nifi/blob/rel/nifi-1.6.0/nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/PutKudu.java#L112-L115].
> 3. Also, {{IllegalArgumentException}} on 8-bit integer columns. We need a 
> separate case clause for INT8 columns that uses {{PartialRow#addByte}} instead 
> of {{PartialRow#addShort}}.
> 4. NIFI-4384 added a batch size parameter; however, it only applies to 
> FlowFiles with multiple records. {{KuduSession}} is created and closed for 
> each FlowFile, so if a FlowFile contains only a single record, no batching 
> takes place. A workaround would be to use a preprocessor to concatenate 
> multiple FlowFiles, but since {{PutHBase}} and {{PutSQL}} use 
> {{session.get(batchSize)}} to handle multiple FlowFiles at once, I think we 
> can take the same approach here with PutKudu as it simplifies the data flow.
> 5. {{PutKudu}} depends on kudu-client 1.3.0. But we can safely update to 
> 1.7.0.
>  - [https://github.com/apache/kudu/blob/1.7.0/docs/release_notes.adoc]
>  - [https://github.com/apache/kudu/blob/1.7.0/docs/prior_release_notes.adoc]
> A notable change in Kudu 1.7.0 is the addition of Decimal type.
> 6. {{PutKudu}} has {{Skip head line}} property for ignoring the first record 
> in a FlowFile. I suppose this was added to handle header lines in CSV files, 
> but I really don't think it's something {{PutKudu}} should handle. 
> {{CSVReader}} already has {{Treat First Line as Header}} option, so we should 
> tell the users to use it instead as we don't want to have the same option 
> here and there. Also, the default value of {{Skip head line}} is {{true}}, 
> and I found it very confusing as my use case was to stream-process 
> single-record FlowFiles. We can keep this property for backward 
> compatibility, but we should at least deprecate it and change the default 
> value to {{false}}.
> 7. Server-side errors such as uniqueness constraint violation are not checked 
> and simply ignored. When flush mode is set to {{AUTO_FLUSH_SYNC}}, we should 
> check the return value of {{KuduSession#apply}} to see whether it has a {{RowError}}, 
> but PutKudu currently ignores it. For example, on uniqueness constraint 
> violation, we get a {{RowError}} saying "_Already present: key already 
> present (error 0)_".
> On the other hand, when flush mode is set to {{AUTO_FLUSH_BACKGROUND}}, 
> {{KuduSession#apply}}, understandably, returns null, and we should check the 
> return value of {{KuduSession#getPendingErrors()}}. And when the mode is 
> {{MANUAL_FLUSH}}, we should examine the return value of 
> {{KuduSession#flush()}} or {{KuduSession#close()}}. In this case, we also 
> have to make sure that we don't overflow the mutation buffer of 
> {{KuduSession}}, by calling {{flush()}} before the buffer fills up.
> ----
> I'll create a pull request on GitHub. Since there are multiple issues to be 
> addressed, I made separate commits for each issue mentioned above so that 
> it's easier to review. You might want to squash them into one, or cherry-pick 
> a subset of commits if you don't agree with some decisions I made.
> Please let me know what you think. We deployed the code to a production 
> server last week and it's been running since without any issues, steadily 
> processing 20K records/second.
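
For the MANUAL_FLUSH case in item 7, the description asks for two things: flush before the mutation buffer overflows, and inspect what each flush returns instead of discarding it. The following is a minimal sketch of that pattern only. `FakeSession` is a hypothetical stand-in with a bounded buffer and is not the Kudu client API; the "dup" key prefix is an invented way to simulate a uniqueness violation, and `flushEvery` is an illustrative parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class ManualFlushSketch {
    // Hypothetical stand-in for a session with a bounded mutation buffer (not the Kudu API).
    static final class FakeSession {
        private final int capacity;
        private final List<String> buffer = new ArrayList<>();

        FakeSession(int capacity) { this.capacity = capacity; }

        // Buffers one operation; refuses when the buffer is already full.
        void apply(String key) {
            if (buffer.size() >= capacity) {
                throw new IllegalStateException("mutation buffer is full");
            }
            buffer.add(key);
        }

        // Sends the buffered operations and returns one message per rejected row.
        // A key starting with "dup" plays the role of a uniqueness violation.
        List<String> flush() {
            List<String> errors = new ArrayList<>();
            for (String key : buffer) {
                if (key.startsWith("dup")) {
                    errors.add("Already present: " + key);
                }
            }
            buffer.clear();
            return errors;
        }
    }

    // Applies all keys, flushing after every flushEvery operations and once at the
    // end, and collects every reported row error instead of dropping it.
    static List<String> writeAll(FakeSession session, List<String> keys, int flushEvery) {
        List<String> errors = new ArrayList<>();
        int pending = 0;
        for (String key : keys) {
            session.apply(key);
            pending++;
            if (pending >= flushEvery) {
                errors.addAll(session.flush());
                pending = 0;
            }
        }
        errors.addAll(session.flush()); // final flush for the remainder
        return errors;
    }
}
```

Keeping `flushEvery` at or below the buffer capacity is what prevents the overflow in this sketch; a larger value makes `apply` fail once the buffer is full.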



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
