[
https://issues.apache.org/jira/browse/NIFI-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre Villard updated NIFI-5781:
---------------------------------
Description:
The current schema does not allow null values for fields such as "details",
"remoteIdentifier", "alternateIdentifier" and others. This Jira makes the
schema more flexible by allowing null values in those fields.
With the current schema, a null value causes an error like the following:
{noformat}
2018-11-01 14:59:30,551 ERROR [Timer-Driven Process Thread-2]
o.a.n.r.SiteToSiteProvenanceReportingTask
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728]
Error running task
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] due
to org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of string in field details of
nifi.provenanceEvent{noformat}
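The exception above is raised when a null value is written to a field typed as plain "string". The rule can be sketched in plain Python as follows. This is only an illustration, not the Avro or NiFi implementation: accepts_null, rejected_nulls, the two-field schemas and the sample event are all hypothetical, and only the kinds of type declaration used in this schema are handled.

```python
def accepts_null(avro_type):
    """Return True if an Avro type declaration admits null (simplified)."""
    if isinstance(avro_type, list):
        # A union accepts null if any of its branches does.
        return any(accepts_null(branch) for branch in avro_type)
    if isinstance(avro_type, dict):
        # Complex or annotated type, e.g. {"type": "long", "logicalType": ...}.
        return accepts_null(avro_type.get("type"))
    return avro_type == "null"


def rejected_nulls(schema, record):
    """Names of fields that are None (or missing) but whose type rejects null."""
    return [
        field["name"]
        for field in schema["fields"]
        if record.get(field["name"]) is None and not accepts_null(field["type"])
    ]


# Hypothetical two-field excerpts of the schema, before and after the change.
strict = {"type": "record", "name": "provenanceEvent", "namespace": "nifi",
          "fields": [{"name": "eventId", "type": "string"},
                     {"name": "details", "type": "string"}]}
relaxed = {"type": "record", "name": "provenanceEvent", "namespace": "nifi",
           "fields": [{"name": "eventId", "type": "string"},
                      {"name": "details", "type": ["null", "string"]}]}

event = {"eventId": "abc", "details": None}  # made-up event with no details

print(rejected_nulls(strict, event))
print(rejected_nulls(relaxed, event))
```

With the strict excerpt the "details" field is reported; with the union type it is accepted.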
+*Workaround*+: configure the record writer with a custom schema, such as the
one below, instead of inheriting the record schema.
{noformat}
{
"namespace": "nifi",
"name": "provenanceEvent",
"type": "record",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventOrdinal", "type": "long" },
{ "name": "eventType", "type": "string" },
{ "name": "timestampMillis", "type": "long" },
{ "name": "durationMillis", "type": "long" },
{ "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
{ "name": "details", "type": ["null", "string"] },
{ "name": "componentId", "type": ["null", "string"] },
{ "name": "componentType", "type": ["null", "string"] },
{ "name": "componentName", "type": ["null", "string"] },
{ "name": "processGroupId", "type": ["null", "string"] },
{ "name": "processGroupName", "type": ["null", "string"] },
{ "name": "entityId", "type": ["null", "string"] },
{ "name": "entityType", "type": ["null", "string"] },
{ "name": "entitySize", "type": ["null", "long"] },
{ "name": "previousEntitySize", "type": ["null", "long"] },
{ "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "actorHostname", "type": ["null", "string"] },
{ "name": "contentURI", "type": ["null", "string"] },
{ "name": "previousContentURI", "type": ["null", "string"] },
{ "name": "parentIds", "type": { "type": "array", "items": "string" } },
{ "name": "childIds", "type": { "type": "array", "items": "string" } },
{ "name": "platform", "type": "string" },
{ "name": "application", "type": "string" },
{ "name": "remoteIdentifier", "type": ["null", "string"] },
{ "name": "alternateIdentifier", "type": ["null", "string"] },
{ "name": "transitUri", "type": ["null", "string"] }
]
}{noformat}
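A custom schema like the one above can also be derived from a stricter one instead of being edited by hand. The sketch below assumes the schema is available as parsed JSON. make_nullable and the three-field excerpt are hypothetical helpers for illustration and are not part of NiFi or Avro.

```python
import copy
import json


def make_nullable(schema, names):
    """Return a copy of a record schema in which the named fields accept null."""
    relaxed = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in relaxed["fields"]:
        if field["name"] not in names:
            continue
        declared = field["type"]
        if isinstance(declared, list):
            # Already a union: add "null" only if it is missing.
            if "null" not in declared:
                field["type"] = ["null"] + declared
        else:
            # Plain or complex type: wrap it in a union with "null".
            field["type"] = ["null", declared]
    return relaxed


# Hypothetical excerpt of the provenance event schema.
strict = {
    "namespace": "nifi",
    "name": "provenanceEvent",
    "type": "record",
    "fields": [
        {"name": "eventId", "type": "string"},
        {"name": "componentId", "type": "string"},
        {"name": "details", "type": ["null", "string"]},
    ],
}

relaxed = make_nullable(strict, {"componentId", "details"})
print(json.dumps(relaxed, indent=2))
```

Fields that are not named, such as "eventId" here, keep their original type, so required fields stay required.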
was:
The current schema does not allow null values for "details", "remoteIdentifier"
and "alternateIdentifier" fields.
This causes an error like the following:
{noformat}
2018-11-01 14:59:30,551 ERROR [Timer-Driven Process Thread-2]
o.a.n.r.SiteToSiteProvenanceReportingTask
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728]
Error running task
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] due
to org.apache.avro.file.DataFileWriter$AppendWriteException:
java.lang.NullPointerException: null of string in field details of
nifi.provenanceEvent{noformat}
+*Workaround*+: configure the record writer with a custom schema, such as the
one below, instead of inheriting the record schema.
{noformat}
{
"namespace": "nifi",
"name": "provenanceEvent",
"type": "record",
"fields": [
{ "name": "eventId", "type": "string" },
{ "name": "eventOrdinal", "type": "long" },
{ "name": "eventType", "type": "string" },
{ "name": "timestampMillis", "type": "long" },
{ "name": "durationMillis", "type": "long" },
{ "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
{ "name": "details", "type": ["null", "string"] },
{ "name": "componentId", "type": "string" },
{ "name": "componentType", "type": "string" },
{ "name": "componentName", "type": "string" },
{ "name": "processGroupId", "type": "string" },
{ "name": "processGroupName", "type": "string" },
{ "name": "entityId", "type": "string" },
{ "name": "entityType", "type": "string" },
{ "name": "entitySize", "type": ["null", "long"] },
{ "name": "previousEntitySize", "type": ["null", "long"] },
{ "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
{ "name": "actorHostname", "type": "string" },
{ "name": "contentURI", "type": "string" },
{ "name": "previousContentURI", "type": "string" },
{ "name": "parentIds", "type": { "type": "array", "items": "string" } },
{ "name": "childIds", "type": { "type": "array", "items": "string" } },
{ "name": "platform", "type": "string" },
{ "name": "application", "type": "string" },
{ "name": "remoteIdentifier", "type": ["null", "string"] },
{ "name": "alternateIdentifier", "type": ["null", "string"] },
{ "name": "transitUri", "type": ["null", "string"] }
]
}{noformat}
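The previous and updated workaround schemas differ only in the declared type of some fields. A small comparison like the one below can list those fields. It is a sketch over parsed JSON: changed_fields and the four-field excerpts are hypothetical and stand in for the full schemas.

```python
def changed_fields(old, new):
    """Names of fields present in both record schemas whose type differs."""
    old_types = {field["name"]: field["type"] for field in old["fields"]}
    return [
        field["name"]
        for field in new["fields"]
        if field["name"] in old_types and old_types[field["name"]] != field["type"]
    ]


# Hypothetical excerpts of the previous and updated schemas.
previous = {"type": "record", "name": "provenanceEvent", "namespace": "nifi",
            "fields": [{"name": "details", "type": ["null", "string"]},
                       {"name": "componentId", "type": "string"},
                       {"name": "actorHostname", "type": "string"},
                       {"name": "platform", "type": "string"}]}
updated = {"type": "record", "name": "provenanceEvent", "namespace": "nifi",
           "fields": [{"name": "details", "type": ["null", "string"]},
                      {"name": "componentId", "type": ["null", "string"]},
                      {"name": "actorHostname", "type": ["null", "string"]},
                      {"name": "platform", "type": "string"}]}

print(changed_fields(previous, updated))
```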
> Incorrect schema for provenance events in SiteToSiteProvenanceReportingTask
> ---------------------------------------------------------------------------
>
> Key: NIFI-5781
> URL: https://issues.apache.org/jira/browse/NIFI-5781
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.0, 1.8.0, 1.7.1
> Reporter: Pierre Villard
> Assignee: Pierre Villard
> Priority: Major