ChrisSamo632 commented on PR #6903:
URL: https://github.com/apache/nifi/pull/6903#issuecomment-1514554132

   @davis-anthony this PR, if/when approved (@MikeThomsen / @mattyb149), was not 
going to introduce such behaviour, *but* it *does* make sense for 
`PutElasticsearchJson`, so I've added a new `elasticsearch.bulk.error` attribute 
for FlowFiles sent to the `errors` relationship. It will contain the 
`_bulk` API's response for the document if Elasticsearch has marked it as 
errored (or as `not_found`, if you set `Treat "Not Found" as 
Success` to `false`).
   
   It's not as simple for `PutElasticsearchRecord` because the `errors` output 
FlowFile may contain multiple records (each being a document sent to 
Elasticsearch). So we'd either be serialising all errors into a single 
attribute that could be huge (and probably break attribute value limits) or 
adding an attribute for every single record, which would cause memory issues in 
NiFi. An alternative would be to produce a separate `errors` FlowFile for every 
errored Record from the input FlowFile, but that would also cause performance 
problems in NiFi when processing large numbers of Records (handling many 
Records in one FlowFile is the big benefit of Record-based processors, e.g. 
millions of records within a single file).
   
   A flow I've used before for handling things like errors from a record 
processor such as `PutElasticsearchRecord` is to send the `errors` to 
`PutDistributedMapCache`, keyed on the document `_id` (which is in the error 
response from Elasticsearch), and then use `FetchDistributedMapCache` or 
`LookupAttribute` to enrich each of the records in the `PutElasticsearchRecord` 
output where there's an `_id` match. It's a bit fiddly and could 
require splitting FlowFiles by Record (which again brings us back to the 
performance hit mentioned above).
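
   As a rough illustration of the `_id`-keyed lookup idea (in plain Python, 
outside NiFi, so not how the processors implement it): the sketch below 
collects failed items from a `_bulk`-style response into a map keyed on `_id` 
and then attaches the matching error to each record. The function names 
`errors_by_id` and `enrich`, the `bulk_error` field, the record `id` field and 
the sample values are all hypothetical; only the general shape of the `_bulk` 
response items is taken from Elasticsearch.

```python
def errors_by_id(bulk_response, not_found_is_success=True):
    """Map document _id -> error details for failed items in a _bulk response."""
    failed = {}
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, update, delete).
        for result in item.values():
            if "error" in result:
                failed[result["_id"]] = result["error"]
            elif not not_found_is_success and result.get("result") == "not_found":
                # Hypothetical marker for "not found" treated as a failure.
                failed[result["_id"]] = {"type": "not_found"}
    return failed


def enrich(records, failed, id_field="id"):
    """Attach the matching error (if any) to each record, like a cache lookup would."""
    return [
        {**rec, "bulk_error": failed[rec[id_field]]} if rec.get(id_field) in failed else rec
        for rec in records
    ]


# Made-up sample data, for illustration only.
sample = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "docs", "_id": "1", "status": 201, "result": "created"}},
        {"index": {"_index": "docs", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
}
records = [{"id": "1", "msg": "ok"}, {"id": "2", "msg": "bad"}]

failed = errors_by_id(sample)
enriched = enrich(records, failed)
```

   In a real flow the `failed` map would live in the distributed map cache 
rather than in memory, and the per-record lookup is the part that may force 
FlowFiles to be split by Record.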
   
   *Notes*:
   - `PutElasticsearchHttp` (and `PutElasticsearchHttpRecord`) are _deprecated_ 
in recent 1.x versions of NiFi and will be *removed* in NiFi 2.x
   - the `elasticsearch.put.error` attribute for both `PutElasticsearchJson` 
and `PutElasticsearchRecord` is used for general Elasticsearch connection 
error reporting, e.g. if the Elasticsearch instance/cluster is not found or 
authentication/authorisation fails, etc., and for FlowFile processing errors, e.g. if the 
content of the FlowFile sent to `PutElasticsearchJson` can't be parsed as a 
JSON object


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
