[ 
https://issues.apache.org/jira/browse/NIFI-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-5781:
---------------------------------
    Description: 
The current schema does not allow null values for several fields, including 
"details", "remoteIdentifier" and "alternateIdentifier". This Jira makes the 
schema more flexible by allowing null values for those fields.

When such a field is null, the reporting task fails with an error like:
{noformat}
2018-11-01 14:59:30,551 ERROR [Timer-Driven Process Thread-2] o.a.n.r.SiteToSiteProvenanceReportingTask SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] Error running task SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] due to org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of string in field details of nifi.provenanceEvent
{noformat}
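The "null of string in field details" message points at a field typed as a plain "string" receiving a null value. The sketch below models that rule in a few lines of Python. It is a simplified illustration, not the Avro library's implementation: the helper name `accepts` is hypothetical and only the "null", "string" and "long" types plus unions are covered.

```python
# Simplified model of Avro type matching for illustration only.
# This is NOT the Avro library's code; `accepts` is a hypothetical helper.
def accepts(avro_type, value):
    """Return True if `value` fits the (simplified) Avro type."""
    if isinstance(avro_type, list):
        # A union accepts the value if any of its branches does.
        return any(accepts(branch, value) for branch in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "long":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {avro_type!r}")


print(accepts("string", None))            # False: a plain string rejects null
print(accepts(["null", "string"], None))  # True: the union has a null branch
```

A field declared as `["null", "string"]` therefore accepts both a null and a string, which is what a nullable field needs.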
+*Workaround*+: specify a writer with a custom schema, such as the one below, 
instead of inheriting the record schema.
{noformat}
{
  "namespace": "nifi",
  "name": "provenanceEvent",
  "type": "record",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventOrdinal", "type": "long" },
    { "name": "eventType", "type": "string" },
    { "name": "timestampMillis", "type": "long" },
    { "name": "durationMillis", "type": "long" },
    { "name": "lineageStart", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "details", "type": ["null", "string"] },
    { "name": "componentId", "type": ["null", "string"] },
    { "name": "componentType", "type": ["null", "string"] },
    { "name": "componentName", "type": ["null", "string"] },
    { "name": "processGroupId", "type": ["null", "string"] },
    { "name": "processGroupName", "type": ["null", "string"] },
    { "name": "entityId", "type": ["null", "string"] },
    { "name": "entityType", "type": ["null", "string"] },
    { "name": "entitySize", "type": ["null", "long"] },
    { "name": "previousEntitySize", "type": ["null", "long"] },
    { "name": "updatedAttributes", "type": { "type": "map", "values": "string" } },
    { "name": "previousAttributes", "type": { "type": "map", "values": "string" } },
    { "name": "actorHostname", "type": ["null", "string"] },
    { "name": "contentURI", "type": ["null", "string"] },
    { "name": "previousContentURI", "type": ["null", "string"] },
    { "name": "parentIds", "type": { "type": "array", "items": "string" } },
    { "name": "childIds", "type": { "type": "array", "items": "string" } },
    { "name": "platform", "type": "string" },
    { "name": "application", "type": "string" },
    { "name": "remoteIdentifier", "type": ["null", "string"] },
    { "name": "alternateIdentifier", "type": ["null", "string"] },
    { "name": "transitUri", "type": ["null", "string"] }
  ]
}{noformat}
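
A custom schema like the one above can also be derived mechanically from a stricter one. The following is a minimal sketch using only the Python standard library; the helper `make_nullable` and the three-field `strict` fragment are hypothetical and stand in for the full schema (the "string" type for "details" is assumed from the error message).

```python
import json


def make_nullable(schema, field_names):
    """Return a copy of `schema` in which the named fields also accept null."""
    result = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    for field in result["fields"]:
        if field["name"] not in field_names:
            continue
        field_type = field["type"]
        if isinstance(field_type, list):
            # Already a union: add a null branch only if it is missing.
            if "null" not in field_type:
                field["type"] = ["null"] + field_type
        else:
            # Plain type: wrap it in a union with null listed first.
            field["type"] = ["null", field_type]
    return result


# Hypothetical fragment of a strict schema, for illustration only.
strict = {
    "namespace": "nifi",
    "name": "provenanceEvent",
    "type": "record",
    "fields": [
        {"name": "eventId", "type": "string"},
        {"name": "details", "type": "string"},
        {"name": "componentId", "type": "string"},
    ],
}

relaxed = make_nullable(strict, {"details", "componentId"})
print(json.dumps(relaxed, indent=2))
```

Fields that are not named, such as "eventId" here, keep their original type, and the input schema is left unmodified.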

  was:
The current schema does not allow null values for "details", "remoteIdentifier" 
and "alternateIdentifier" fields.

This will cause error looking like:
{noformat}
2018-11-01 14:59:30,551 ERROR [Timer-Driven Process Thread-2] 
o.a.n.r.SiteToSiteProvenanceReportingTask 
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] 
Error running task 
SiteToSiteProvenanceReportingTask[id=0751c46f-0163-1000-7d33-f276e8654728] due 
to org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: null of string in field details of 
nifi.provenanceEvent{noformat}
+*Workaround*+: specify a writer with a custom schema instead of inheriting 
record schema.
{noformat}
{
  "namespace": "nifi",
  "name": "provenanceEvent",
  "type": "record",
  "fields": [
    { "name": "eventId", "type": "string" },
    { "name": "eventOrdinal", "type": "long" },
    { "name": "eventType", "type": "string" },
    { "name": "timestampMillis", "type": "long" },
    { "name": "durationMillis", "type": "long" },
    { "name": "lineageStart", "type": { "type": "long", "logicalType": 
"timestamp-millis" } },
    { "name": "details", "type": ["null", "string"] },
    { "name": "componentId", "type": "string" },
    { "name": "componentType", "type": "string" },
    { "name": "componentName", "type": "string" },
    { "name": "processGroupId", "type": "string" },
    { "name": "processGroupName", "type": "string" },
    { "name": "entityId", "type": "string" },
    { "name": "entityType", "type": "string" },
    { "name": "entitySize", "type": ["null", "long"] },
    { "name": "previousEntitySize", "type": ["null", "long"] },
    { "name": "updatedAttributes", "type": { "type": "map", "values": "string" 
} },
    { "name": "previousAttributes", "type": { "type": "map", "values": "string" 
} },
    { "name": "actorHostname", "type": "string" },
    { "name": "contentURI", "type": "string" },
    { "name": "previousContentURI", "type": "string" },
    { "name": "parentIds", "type": { "type": "array", "items": "string" } },
    { "name": "childIds", "type": { "type": "array", "items": "string" } },
    { "name": "platform", "type": "string" },
    { "name": "application", "type": "string" },
    { "name": "remoteIdentifier", "type": ["null", "string"] },
    { "name": "alternateIdentifier", "type": ["null", "string"] },
    { "name": "transitUri", "type": ["null", "string"] }
  ]
}{noformat}


> Incorrect schema for provenance events in SiteToSiteProvenanceReportingTask
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-5781
>                 URL: https://issues.apache.org/jira/browse/NIFI-5781
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.7.0, 1.8.0, 1.7.1
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
