[
https://issues.apache.org/jira/browse/MINIFI-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983386#comment-15983386
]
Kevin Doran edited comment on MINIFI-275 at 4/27/17 1:19 PM:
-------------------------------------------------------------
Found the culprit: YamlConfiguration.cpp (I am working from commit 573c511f).
In multiple locations in YamlConfiguration.cpp, the logic assumes the
component ID is set when reading the YAML node (i.e., it is treated as a
required field). For example, line 84.
Most occurrences of this logic are simple to fix: replace the code that reads
the id field so that it generates a new UUID when the id field is not present
in the config node.
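A minimal sketch of that fallback, written in Java only to match the snippet quoted further down (the actual fix belongs in the C++ YamlConfiguration.cpp). The class name ComponentIds, the method idOrGenerated, and the representation of a parsed YAML node as a Map are all assumptions made for illustration, not existing MiNiFi code:

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper for illustration; not part of MiNiFi. */
final class ComponentIds {
    private ComponentIds() {}

    /**
     * Returns the node's "id" value when it is present and non-blank;
     * otherwise returns a freshly generated random UUID string.
     * Assumption: the parsed YAML node is available as a Map.
     */
    static String idOrGenerated(Map<String, Object> node) {
        Object raw = node.get("id");
        if (raw != null) {
            String id = raw.toString().trim();
            if (!id.isEmpty()) {
                return id;
            }
        }
        // No usable id in the config node: treat the field as optional.
        return UUID.randomUUID().toString();
    }
}
```

The point is only that a missing id becomes a generated value rather than an exception; an id that is present in the node is returned unchanged.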
There is one section of this file that will be more complicated to correct,
which is loading Connections from the configuration YAML. Currently, the logic
assumes the connection source id and destination id will be set. I wasn't sure
of the requirements for interpreting configuration for Connections, so I
referred to the Java MiNiFi implementation. I found the relevant logic there in
org.apache.nifi.minifi.commons.schema
(https://github.com/apache/nifi-minifi/tree/master/minifi-commons/minifi-commons-schema/src/main/java/org/apache/nifi/minifi/commons/schema)
In particular, the class ConfigSchemaV1 has some logic that is relevant to this
ticket. Here is a code snippet:
{code:java}
...
List<ConnectionSchema> connectionSchemas = new ArrayList<>(connections.size());
for (ConnectionSchemaV1 connection : connections) {
    ConnectionSchema convert = connection.convert();
    convert.setId(getUniqueId(ids, convert.getName()));

    String sourceName = connection.getSourceName();
    if (remoteInputPortIds.contains(sourceName)) {
        convert.setSourceId(sourceName);
    } else {
        if (duplicateProcessorNames.contains(sourceName)) {
            problematicDuplicateNames.add(sourceName);
        }
        String sourceId = processorNameToIdMap.get(sourceName);
        if (!StringUtil.isNullOrEmpty(sourceId)) {
            convert.setSourceId(sourceId);
        }
    }

    String destinationName = connection.getDestinationName();
    if (remoteInputPortIds.contains(destinationName)) {
        convert.setDestinationId(destinationName);
    } else {
        if (duplicateProcessorNames.contains(destinationName)) {
            problematicDuplicateNames.add(destinationName);
        }
        String destinationId = processorNameToIdMap.get(destinationName);
        if (!StringUtil.isNullOrEmpty(destinationId)) {
            convert.setDestinationId(destinationId);
        }
    }
    connectionSchemas.add(convert);
}
...
{code}
Essentially, it seems the proper way to handle connections in the YAML config
is as follows:
* All processors should already be loaded, with ids generated if they were not
present in the YAML.
* From the loaded processors, keep a map of name -> id. Also keep track of
duplicate names, if any.
* When loading connections, if the source/destination id(s) are not present in
the connection specification, then there are two other ways to resolve the
source/destination for a V1 config file. First, the source name or destination
name might contain a port id for a remote process group. Check that, and if
that fails, attempt to look up the source and destination _by name_ from the
previously built map. If a source/destination name is in the set of duplicate
names, then bail with an error, as the connection configuration is ambiguous.
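The lookup order in the list above could be sketched roughly as follows. This is again Java for consistency with the quoted snippet, not the C++ change itself. ConnectionEndpointResolver and its method names are hypothetical, and throwing on an unknown name is my own choice here (the Java snippet above simply leaves the id unset in that case):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the resolution order; not MiNiFi code. */
final class ConnectionEndpointResolver {
    private final Map<String, String> processorNameToId = new HashMap<>();
    private final Set<String> duplicateNames = new HashSet<>();
    private final Set<String> remotePortIds;

    /**
     * @param processorIdToName already-loaded processors, keyed by id
     *                          (ids generated earlier where the YAML had none)
     * @param remotePortIds     port ids of remote process groups
     */
    ConnectionEndpointResolver(Map<String, String> processorIdToName,
                               Set<String> remotePortIds) {
        this.remotePortIds = new HashSet<>(remotePortIds);
        for (Map.Entry<String, String> e : processorIdToName.entrySet()) {
            // A second processor with the same name makes that name ambiguous.
            if (processorNameToId.putIfAbsent(e.getValue(), e.getKey()) != null) {
                duplicateNames.add(e.getValue());
            }
        }
    }

    /** Resolves one endpoint (source or destination) of a connection. */
    String resolve(String explicitId, String name) {
        if (explicitId != null && !explicitId.isEmpty()) {
            return explicitId;                  // id given in the connection
        }
        if (name == null) {
            throw new IllegalArgumentException("Connection endpoint has neither id nor name");
        }
        if (remotePortIds.contains(name)) {
            return name;                        // name is really a remote port id
        }
        if (duplicateNames.contains(name)) {
            throw new IllegalArgumentException("Ambiguous processor name: " + name);
        }
        String id = processorNameToId.get(name);
        if (id == null) {
            throw new IllegalArgumentException("No processor named: " + name);
        }
        return id;
    }
}
```

Calling resolve once for the source and once for the destination of each connection covers both endpoints with the same rules.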
> Configuration without IDs for components causes exceptions
> ----------------------------------------------------------
>
> Key: MINIFI-275
> URL: https://issues.apache.org/jira/browse/MINIFI-275
> Project: Apache NiFi MiNiFi
> Issue Type: Bug
> Components: C++, Processing Configuration
> Reporter: Aldrin Piri
> Assignee: Kevin Doran
> Priority: Blocker
> Fix For: cpp-0.2.0
>
> Attachments: config.yml
>
>
> One of the changes to how components are handled in C++ introduced a defect
> in the handling of the original version 1 schema of the YAML: when a
> component has no ID, a YAML exception is thrown.
> We should support configurations as they were originally created, possibly
> by providing a default/generated ID where one isn't specified, and start
> laying the foundation for versioned schemas as provided in our Java
> implementation.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)