[ https://issues.apache.org/jira/browse/NIFI-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297054#comment-16297054 ]
ASF GitHub Bot commented on NIFI-4696:
--------------------------------------
Github user mattyb149 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2342#discussion_r157808792
--- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/PutHiveStreaming.java ---
@@ -156,7 +158,9 @@
             .displayName("Hive Configuration Resources")
             .description("A file or comma separated list of files which contains the Hive configuration (hive-site.xml, e.g.). Without this, Hadoop "
                     + "will search the classpath for a 'hive-site.xml' file or will revert to a default configuration. Note that to enable authentication "
-                    + "with Kerberos e.g., the appropriate properties must be set in the configuration files. Please see the Hive documentation for more details.")
+                    + "with Kerberos e.g., the appropriate properties must be set in the configuration files. Also note that if Max Concurrent Tasks is set "
+                    + "to a number greater than one, the 'hcatalog.hive.client.cache.disabled' property will be forced to 'true' to avoid concurrency issues. "
--- End diff --
Yes, good point. I will add that.
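
For readers following the diff, here is a minimal sketch of the behavior the new description text documents: when more than one concurrent task is configured, 'hcatalog.hive.client.cache.disabled' is forced to 'true'. It is not the code from the pull request. The class name, the method name, and the use of a plain Map in place of the real Hive/Hadoop configuration object are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the actual processor works on a Hive configuration
// object, which is replaced here by a plain Map to keep the sketch standalone.
final class HiveClientCacheSketch {

    // Property name taken from the description text in the diff above.
    static final String CACHE_DISABLED_KEY = "hcatalog.hive.client.cache.disabled";

    private HiveClientCacheSketch() {
    }

    /**
     * Returns a copy of the given settings. If maxConcurrentTasks is greater
     * than one, the client cache is forced off regardless of what the
     * configuration files said. The input map is left unmodified.
     */
    static Map<String, String> effectiveSettings(Map<String, String> fromConfigFiles,
                                                 int maxConcurrentTasks) {
        Map<String, String> effective = new HashMap<>(fromConfigFiles);
        if (maxConcurrentTasks > 1) {
            effective.put(CACHE_DISABLED_KEY, "true");
        }
        return effective;
    }
}
```

With a single task the settings pass through unchanged, so an explicit value in hive-site.xml is only overridden when concurrency is actually enabled.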
> Support concurrent tasks in PutHiveStreaming
> --------------------------------------------
>
> Key: NIFI-4696
> URL: https://issues.apache.org/jira/browse/NIFI-4696
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> Currently PutHiveStreaming (PHS) can only support a single task at a time.
> Before NIFI-4342, that meant each target table would need its own PHS
> instance, which can be cumbersome with large numbers of tables. After
> NIFI-4342, Expression Language could be used for SDLC purposes
> (e.g., database/table changes between development and production).
> However, it would be nice to be able to support at least database/table names
> using flow file attributes, and also to support multiple tasks to handle them
> concurrently. Due to the nature of PHS and the Streaming Ingest APIs (and
> implementation), it is likely not prudent to allow two tasks to write to the
> same table and partition at the same time.
> I propose adding flow file attribute EL evaluation where prudent, and
> allowing per-table concurrency in PHS. A thread will attempt to get a lock on
> a table and, if it cannot, will roll back and return.
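
The per-table locking proposed in the description could look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation in the pull request: the class and method names are hypothetical, the lock key is assumed to be "database.table", and a one-permit Semaphore stands in for whatever lock type the processor actually uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Hypothetical per-table lock registry: at most one task writes to a given
// database/table at a time, while different tables proceed concurrently.
final class TableLockSketch {

    // One single-permit semaphore per "database.table" key (key format is an assumption).
    private final ConcurrentMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    /**
     * Runs the work only if the table is not already being written to.
     * Returns true if the work ran, false if the table was busy. It never blocks.
     */
    boolean tryWithTableLock(String database, String table, Runnable work) {
        String key = database + "." + table;
        Semaphore lock = locks.computeIfAbsent(key, k -> new Semaphore(1));
        if (!lock.tryAcquire()) {
            // Table is busy: the caller is expected to roll back and return,
            // leaving the flow file to be retried on a later trigger.
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            lock.release();
        }
    }
}
```

In a processor, a false return would presumably be followed by rolling back the session and returning from onTrigger. The non-blocking acquire matters here: a task that waited for a busy table would occupy a thread without making progress, whereas returning immediately frees it for flow files bound for other tables.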