[ 
https://issues.apache.org/jira/browse/NIFI-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296525#comment-16296525
 ] 

ASF GitHub Bot commented on NIFI-4696:
--------------------------------------

Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2342#discussion_r157701335
  
    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/PutHiveStreaming.java ---
    @@ -156,7 +158,9 @@
                 .displayName("Hive Configuration Resources")
                 .description("A file or comma separated list of files which contains the Hive configuration (hive-site.xml, e.g.). Without this, Hadoop "
                         + "will search the classpath for a 'hive-site.xml' file or will revert to a default configuration. Note that to enable authentication "
    -                    + "with Kerberos e.g., the appropriate properties must be set in the configuration files. Please see the Hive documentation for more details.")
    +                    + "with Kerberos e.g., the appropriate properties must be set in the configuration files. Also note that if Max Concurrent Tasks is set "
    +                    + "to a number greater than one, the 'hcatalog.hive.client.cache.disabled' property will be forced to 'true' to avoid concurrency issues. "
    --- End diff --
    
    I know you reference the Hive documentation, but shouldn't we explicitly say 
that it's not possible to write concurrently to the same database.table?


> Support concurrent tasks in PutHiveStreaming
> --------------------------------------------
>
>                 Key: NIFI-4696
>                 URL: https://issues.apache.org/jira/browse/NIFI-4696
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> Currently PutHiveStreaming (PHS) can only support a single task at a time. 
> Before NIFI-4342, that meant each target table would need its own PHS 
> instance, which can be cumbersome with large numbers of tables. After 
> NIFI-4342, Expression Language could be used for SDLC purposes 
> (database/table changes between development and production, e.g.).
> However, it would be nice to be able to support at least database/table names 
> using flow file attributes, and also to support multiple tasks to handle them 
> concurrently. Due to the nature of PHS and the Streaming Ingest APIs (and 
> implementation), it is likely not prudent to allow two tasks to write to the 
> same table and partition at the same time.
> I propose adding flow file attribute EL evaluation where prudent, and 
> allowing per-table concurrency in PHS. A thread will attempt to get a lock on 
> a table, and if it cannot, will roll back and return.
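
For illustration only, the per-table locking proposed in the description could be sketched roughly as follows. This is a minimal sketch, not the code from the pull request: the `TableLocks` class and its `tryLock`, `unlock`, and `writeIfFree` methods are hypothetical names, and the actual Hive streaming write is stood in for by a plain `Runnable`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not taken from the NiFi code base. */
class TableLocks {
    // Fully qualified names ("database.table") that some task is currently writing to.
    private final Set<String> busy = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller now holds the table, false if another task already does. */
    boolean tryLock(String database, String table) {
        // Set.add is atomic here and returns false when the key is already present.
        return busy.add(database + "." + table);
    }

    /** Releases the table so that another task may acquire it. */
    void unlock(String database, String table) {
        busy.remove(database + "." + table);
    }

    /**
     * Runs the write only if the table is free. A false return tells the caller
     * to roll back and return so the work can be retried later.
     */
    boolean writeIfFree(String database, String table, Runnable write) {
        if (!tryLock(database, table)) {
            return false;
        }
        try {
            write.run();
            return true;
        } finally {
            unlock(database, table);
        }
    }
}
```

With something like this, two tasks targeting different tables proceed in parallel, while a second task targeting a table that is already being written gives up immediately instead of blocking.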


