zhuqi-lucas commented on code in PR #19924:
URL: https://github.com/apache/datafusion/pull/19924#discussion_r2721479829
##########
datafusion/datasource-json/src/file_format.rs:
##########
@@ -166,6 +182,49 @@ impl JsonFormat {
self.options.compression = file_compression_type.into();
self
}
+
+ /// Set whether to expect JSON array format instead of line-delimited format.
+ ///
+ /// When `true`, expects input like: `[{"a": 1}, {"a": 2}]`
+ /// When `false` (default), expects input like:
+ /// ```text
+ /// {"a": 1}
+ /// {"a": 2}
+ /// ```
+ pub fn with_format_array(mut self, format_array: bool) -> Self {
+ self.options.format_array = format_array;
+ self
+ }
+}
+
+/// Infer schema from a JSON array format file.
+///
+/// This function reads JSON data in array format `[{...}, {...}]` and infers
+/// the Arrow schema from the contained objects.
+fn infer_json_schema_from_json_array<R: Read>(
+ reader: &mut R,
+ max_records: usize,
+) -> std::result::Result<Schema, ArrowError> {
+ let mut content = String::new();
+ reader.read_to_string(&mut content).map_err(|e| {
+ ArrowError::JsonError(format!("Failed to read JSON content: {e}"))
+ })?;
+
+ // Parse as JSON array using serde_json
+ let values: Vec<serde_json::Value> = serde_json::from_str(&content)
Review Comment:
Good point @alamb. I have redesigned the PR, which I hope will perform
better, and I will try to port it to arrow-rs as a follow-up if the changes
turn out to be valid.
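
For context on the performance concern: the hunk above reads the whole input into a `String` and then materializes a `Vec<serde_json::Value>` before inferring the schema. One possible way to avoid that buffering is to stream the top-level array and emit one element per line, so that the existing line-delimited path can consume it. The sketch below only illustrates that idea using the standard library; `json_array_to_ndjson` is a hypothetical name, it is not necessarily what the redesigned PR or arrow-rs does, and it does not fully validate the JSON.

```rust
use std::io::{self, BufReader, Read, Write};

/// Hypothetical helper (not a DataFusion or arrow-rs API): streams a
/// top-level JSON array and writes each element on its own line.
/// Assumption: the input is a single well-formed JSON array; only
/// nesting depth and string state are tracked.
fn json_array_to_ndjson<R: Read, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let invalid = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());

    let mut depth: usize = 0; // 1 means "directly inside the outer array"
    let mut in_string = false; // inside a JSON string literal
    let mut escaped = false; // previous byte in the string was a backslash
    let mut in_element = false; // some bytes of the current element were written

    for byte in BufReader::new(input).bytes() {
        let b = byte?;

        // Inside a string, copy bytes verbatim and only watch for its end.
        if in_string {
            output.write_all(&[b])?;
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }

        match b {
            b'[' | b'{' => {
                if depth == 0 && b != b'[' {
                    return Err(invalid("expected a top-level JSON array"));
                }
                depth += 1;
                // The outer '[' itself is not part of any element.
                if depth > 1 {
                    in_element = true;
                    output.write_all(&[b])?;
                }
            }
            b']' | b'}' => {
                if depth == 0 {
                    return Err(invalid("unbalanced closing bracket"));
                }
                depth -= 1;
                if depth >= 1 {
                    output.write_all(&[b])?;
                } else if in_element {
                    // Outer array closed: terminate the last element.
                    output.write_all(b"\n")?;
                    in_element = false;
                }
            }
            // A comma directly inside the outer array separates elements.
            b',' if depth == 1 => {
                output.write_all(b"\n")?;
                in_element = false;
            }
            b' ' | b'\t' | b'\n' | b'\r' => {
                // Drop whitespace between elements; inside an element,
                // turn line breaks into spaces so each element stays on one line.
                if depth > 1 {
                    let w = if b == b'\n' || b == b'\r' { b' ' } else { b };
                    output.write_all(&[w])?;
                }
            }
            _ => {
                if depth == 0 {
                    return Err(invalid("unexpected data outside the array"));
                }
                if b == b'"' {
                    in_string = true;
                }
                in_element = true;
                output.write_all(&[b])?;
            }
        }
    }

    if depth != 0 || in_string {
        return Err(invalid("unexpected end of input"));
    }
    Ok(())
}
```

With this shape, memory use stays bounded by the reader's buffer rather than the file size, and the output could be handed to a line-delimited schema inference routine with a `max_records` limit.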