Weijun-H commented on code in PR #9097:
URL: https://github.com/apache/arrow-rs/pull/9097#discussion_r2666279553


##########
arrow-json/src/reader/tape.rs:
##########
@@ -237,8 +238,21 @@ enum DecoderState {
     ///
     /// Consists of `(literal, decoded length)`
     Literal(Literal, u8),
+    /// Skipping a value (for unprojected fields)
+    ///
+    /// Consists of:
+    /// - `depth`: Nesting level of objects/arrays being skipped (u32)
+    /// - `flags`: Bit-packed flags (in_string: bit 0, escape: bit 1)
+    SkipValue {
+        depth: u32,
+        flags: u8,
+    },
 }
 
+// Bit flags for SkipValue state
+const SKIP_IN_STRING: u8 = 1 << 0; // 0x01
+const SKIP_ESCAPE: u8 = 1 << 1; // 0x02

Review Comment:
   To remove the regression, I added a `projection` option to `ReaderBuilder` that enables projection-aware parsing.
   
   When enabled, JSON fields not present in the schema are skipped during tape parsing rather than being fully parsed and later ignored. This improves performance for narrow projections over wide JSON data.
   
   <img width="967" height="366" alt="image" src="https://github.com/user-attachments/assets/6b8a74d8-ac09-4161-bb89-8c00bfe22fc1" />

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.