Adam Turley created NIFI-15681:
----------------------------------
Summary: Enhance PutElasticsearchJson to support NDJSON, JSON
Array, and Single JSON input formats with size-based batching
Key: NIFI-15681
URL: https://issues.apache.org/jira/browse/NIFI-15681
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Affects Versions: 2.8.0
Environment: Containerized NiFi 2.8.0 on RHEL 9
Reporter: Adam Turley
The existing PutElasticsearchJson processor indexes exactly one JSON document
per FlowFile. In high-volume ingest scenarios this creates significant overhead,
because upstream flow logic must reshape data before it can be sent to
Elasticsearch. Large datasets also require one FlowFile per document, which
creates excessive NiFi session overhead and makes it impractical to send
pre-aggregated NDJSON or JSON array payloads directly.
This improvement enhances PutElasticsearchJson in place while remaining fully
backwards compatible with existing flows. No schema, Record Reader, or schema
registry is required: JSON is passed through directly, which makes it suitable
for dynamic or schema-less documents.
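Size-based batching could then pack documents into Elasticsearch `_bulk` request bodies. The Bulk API expects newline-delimited pairs of an action line and a source line, with a trailing newline at the end of the body. The minimal sketch below starts a new body whenever the next pair would push the current one past a byte limit. `build_bulk_batches` and `max_batch_bytes` are hypothetical names, not properties of the processor, and each document is assumed to already be a single line of JSON text, which is appended as-is without being parsed.

```python
import json
from typing import Iterable, List


def build_bulk_batches(docs: Iterable[str], index: str,
                       max_batch_bytes: int) -> List[bytes]:
    """Pack single-line JSON documents into size-limited _bulk bodies."""
    # The same "index" action line precedes every document.
    action = (json.dumps({"index": {"_index": index}}) + "\n").encode("utf-8")
    batches: List[bytes] = []
    current = bytearray()
    for doc in docs:
        # Source text is encoded and appended unchanged (no parse step).
        entry = action + doc.encode("utf-8") + b"\n"
        # Flush before this entry would exceed the limit; an oversized
        # entry on its own still becomes a one-document batch.
        if current and len(current) + len(entry) > max_batch_bytes:
            batches.append(bytes(current))
            current = bytearray()
        current += entry
    if current:
        batches.append(bytes(current))
    return batches
```

Each returned body would be sent as a `POST /_bulk` request with `Content-Type: application/x-ndjson`. Sending an oversized document alone rather than rejecting it is a design choice of this sketch.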
Why not PutElasticsearchRecord?
PutElasticsearchRecord is the right choice when data arrives in a structured,
well-known format (Avro, CSV, Parquet, etc.) and field-level type mapping,
schema enforcement, or schema evolution is needed. However, it introduces
significant overhead that is unnecessary in many JSON ingest pipelines:
* Schema requirement — a Record Reader and schema (via schema registry,
inferred, or embedded) must be defined and maintained. For JSON data with
dynamic fields, deeply nested structures, or schema-less designs, this is a
configuration burden with no benefit.
* Deserialization cost — PutElasticsearchRecord fully deserializes the input
into NiFi's internal Record object model and then re-serializes it to JSON for
the _bulk request. This is a two-way type conversion for data that is already
valid JSON, adding CPU and memory overhead on every document.
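The difference between passing JSON through and converting it can be illustrated with Python's standard `json` module. This is only an analogy, not NiFi's Record API: the hypothetical `round_trip` function stands in for a deserialize/re-serialize cycle and `pass_through` for forwarding the raw text. In this Python example the round trip not only parses every document but can also change the text that would be sent, even though the value is equivalent.

```python
import json


def pass_through(doc: str) -> str:
    # Forward the original JSON text unchanged: no parsing and no
    # intermediate objects are created.
    return doc


def round_trip(doc: str) -> str:
    # Parse the text into Python objects (dicts, lists, floats, ...) and then
    # serialize them back to JSON text; every document pays for both steps.
    return json.dumps(json.loads(doc), separators=(",", ":"))


if __name__ == "__main__":
    original = '{"price": 1.50, "tags": ["a", "b"]}'
    print(pass_through(original))  # identical to the input
    print(round_trip(original))    # {"price":1.5,"tags":["a","b"]}
```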
--
This message was sent by Atlassian Jira
(v8.20.10#820010)