Guthman opened a new issue, #47173:
URL: https://github.com/apache/arrow/issues/47173

   ### Describe the enhancement requested
   
   Arrow currently supports reading JSON in C++, Python, Java, etc., but it lacks any equivalent JSON writer. Rust (`arrow_json::writer`) and Go (`arrjson`) implement their own serialization, but they do not leverage the shared C++ core.
   
   This results in these limitations:
   
   - Python users (e.g., BigQuery→Arrow→Postgres JSONB, which is my particular use case) must fall back to slow Python-level loops or to orjson, missing out on C++-level performance.
   - No feature parity with Rust and Go, which already provide fast JSON serialization.
   - Large-scale pipelines suffer from marshalling overhead and poor scaling.
   
   ## Proposal Overview
   
   ### 1. **C++ Core: Add JSON Writer API**
   - Mirror the existing `arrow::json::TableReader` with a new `arrow::json::TableWriter` or `RecordBatchWriter`.
   - Support both output formats:
     - **NDJSON** (newline-delimited)
     - **JSON array**
   - Configurable via builder-pattern options:
     - Include or omit nulls
     - Encoding for binary types (e.g., Base64)
     - Formatting (pretty, flat)
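
The following minimal pure-Python sketch shows what the NDJSON/array choice and the options above would mean for the output. It does not use Arrow. A plain dict of equal-length lists stands in for a record batch. The function name `batch_to_json`, the `pretty` flag and the `"hex"` encoding are hypothetical. Only `ndjson`, `include_nulls` and `binary_encoding` echo names from the PyArrow signature proposed in this issue.

```python
import base64
import json


def _encode_value(value, binary_encoding):
    # JSON has no binary type, so bytes must be encoded as text.
    if isinstance(value, (bytes, bytearray)):
        if binary_encoding == "base64":
            return base64.b64encode(value).decode("ascii")
        if binary_encoding == "hex":  # hypothetical alternative encoding
            return bytes(value).hex()
        raise ValueError(f"unsupported binary_encoding: {binary_encoding!r}")
    return value


def batch_to_json(columns, ndjson=False, include_nulls=True,
                  binary_encoding="base64", pretty=False):
    """Serialize a columnar batch (dict of equal-length lists) to JSON text.

    Hypothetical stand-in for the proposed writer options; not an Arrow API.
    """
    names = list(columns)
    num_rows = len(columns[names[0]]) if names else 0
    rows = []
    for i in range(num_rows):
        # Transpose: the batch is stored column by column, but each JSON
        # record is one row object.
        row = {}
        for name in names:
            value = columns[name][i]
            if value is None and not include_nulls:
                continue  # omit the key rather than writing null
            row[name] = _encode_value(value, binary_encoding)
        rows.append(row)

    if ndjson:
        # NDJSON: one compact object per line. Pretty-printing would break
        # the one-record-per-line format, so `pretty` is ignored here.
        return "".join(json.dumps(row) + "\n" for row in rows)
    # JSON array: a single top-level array, optionally indented.
    return json.dumps(rows, indent=2 if pretty else None)
```

For example, `batch_to_json({"id": [1, 2], "name": ["a", None]}, ndjson=True, include_nulls=False)` yields the two lines `{"id": 1, "name": "a"}` and `{"id": 2}`. The default settings produce `[{"id": 1, "name": "a"}, {"id": 2, "name": null}]`.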
   
   ### 2. **Bindings**
   - **PyArrow**: add `pyarrow.json.write_json(table_or_batch, sink=None, ndjson=False, include_nulls=True, binary_encoding="base64")`, wrapping the new C++ API.
   - **Arrow Java**: introduce a corresponding `JsonWriter` class to maintain cross-language feature consistency.
   
   ### 3. **Functionality & Performance**
   - Full support for Arrow types: scalars, nested structs/lists, binary, timestamps, dictionaries and nulls.
   - Streaming output row by row, to avoid buffering the entire result in memory.
   - Benchmark target: near-native performance, comparable to Rust’s `LineDelimitedWriter`.
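
The following minimal pure-Python sketch shows the streaming behavior without using Arrow. Each batch is a dict of equal-length lists standing in for a record batch. Rows are written to a text sink as soon as they are built, so memory is bounded by one batch rather than the whole output. `NdjsonStreamWriter` is a hypothetical name, and nested structs and lists are modeled as Python dicts and lists. Rendering timestamps as ISO 8601 strings and binary as Base64 are assumed conventions, not anything this issue or Arrow specifies.

```python
import base64
import datetime
import io
import json


class _ArrowLikeEncoder(json.JSONEncoder):
    """Encode types the stdlib rejects, using assumed conventions."""

    def default(self, obj):
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()  # timestamps/dates as ISO 8601 strings
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


class NdjsonStreamWriter:
    """Hypothetical streaming writer: rows are written as soon as they are built."""

    def __init__(self, sink):
        self._sink = sink  # any text file-like object with .write()
        self._encoder = _ArrowLikeEncoder(separators=(",", ":"))
        self.rows_written = 0

    def write_batch(self, columns):
        # `columns` is a dict of equal-length lists standing in for a batch.
        names = list(columns)
        num_rows = len(columns[names[0]]) if names else 0
        for i in range(num_rows):
            row = {name: columns[name][i] for name in names}
            # Nested dicts (structs) and lists are recursed into by the
            # encoder; None becomes null.
            self._sink.write(self._encoder.encode(row))
            self._sink.write("\n")
        self.rows_written += num_rows


if __name__ == "__main__":
    out = io.StringIO()
    writer = NdjsonStreamWriter(out)
    # Batches could come lazily from a reader; only one is held at a time.
    writer.write_batch({"ts": [datetime.datetime(2024, 1, 2, 3, 4, 5)],
                        "meta": [{"tags": ["a", "b"]}]})
    writer.write_batch({"ts": [None], "meta": [{"tags": []}]})
    print(out.getvalue(), end="")
```

Run as a script, it prints `{"ts":"2024-01-02T03:04:05","meta":{"tags":["a","b"]}}` followed by `{"ts":null,"meta":{"tags":[]}}`. The per-row dicts are only there for readability. A native writer would more likely serialize straight from the column buffers.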
   
   
   *I had this request edited by an LLM, as I'm not very familiar with the Arrow backend architecture. I checked all the claims, but some inaccuracies might have slipped through; if so, sorry.*
   
   ### Component(s)
   
   C++
