friendlymatthew commented on PR #7871: URL: https://github.com/apache/arrow-rs/pull/7871#issuecomment-3040618000
> > > ... at least, confident we caught all the ones in code paths this specific unit test exercises. But it's tricky -- I tried to make a visual sweep of all `XXX_with_shallow_validation` methods and didn't notice anything else. But then I re-ran the unit test "benchmark" and saw that it was still oddly slow. Found the third missed site during a second, more motivated, code inspection pass. > > > > > > Can we make your test into an actual benchmark (aka something like https://github.com/apache/arrow-rs/blob/main/parquet-variant/benches/variant_builder.rs )? > > Hi, I have an ad-hoc benchmark that lives [here](https://github.com/apache/arrow-rs/compare/main...pydantic:arrow-rs:friendlymatthew/bench-validation-large-json). It uses the same logic in `test_json_to_variant_object_very_large` to build up the object. > > The validation logic on main regresses quite significantly when compared to the validation logic in this PR > > <img alt="Screenshot 2025-07-05 at 9 34 19 PM" width="859" src="https://private-user-images.githubusercontent.com/38759997/462853809-62515dbc-4bc9-4541-9b36-b22f0089fadb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTE3NzAzNDgsIm5iZiI6MTc1MTc3MDA0OCwicGF0aCI6Ii8zODc1OTk5Ny80NjI4NTM4MDktNjI1MTVkYmMtNGJjOS00NTQxLTliMzYtYjIyZjAwODlmYWRiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA3MDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNzA2VDAyNDcyOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTcxZTk1MWUxNjQ5NjhkNjY4NzZmMzM2OTUxZjEzMDVhZmVhN2MzN2EzOWVkNjgzODIzNzczZGI1MjdhZWM1ZGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.rUlGeKZriWYw9IcPYnPt6GCvPtrgyTzzORXZlL6H0JE"> We caught another bug where this time it was performing only a shallow validation from a method that wanted full validation. As @scovich points out, it'll slow this PR down because it now performs the linear checks that it guarantees. Still, we see the validation logic in this PR perform faster than the logic on main. A bit unfortunate but still a worthy improvement IMO. <img width="929" alt="Screenshot 2025-07-05 at 10 51 10 PM" src="https://github.com/user-attachments/assets/c69fb371-63a1-4575-ab66-7afeb775be0b" /> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
