EmilyMatt commented on PR #8930:
URL: https://github.com/apache/arrow-rs/pull/8930#issuecomment-3771604919

   > > The Avro file, readable by Spark: [bad-varint-bug.avro.gz](https://github.com/user-attachments/files/24724610/bad-varint-bug.avro.gz)
   > 
   > I have checked the Avro file is readable with Python avro 1.12.1:
   > 
   > ```
   > >>> from avro.datafile import DataFileReader
   > >>> from avro.io import DatumReader
   > >>> reader = DataFileReader(open("testing/data/avro/bad-varint-bug.avro", "rb"), DatumReader())
   > >>> for rec in reader:
   > ...     print(rec)
   > ... 
   > {'int_array': [1, 2]}
   > ```
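   For reference, the record above is tiny on the wire. Below is a minimal std-only sketch of the Avro spec's zigzag-varint encoding for `{'int_array': [1, 2]}`. It is my own illustration, not the arrow-avro decoder, and the helper names are hypothetical. It assumes the writer emitted the array as a single block:
   - an item count of 2 (`0x04`),
   - the items 1 (`0x02`) and 2 (`0x04`),
   - the `0x00` end-of-array marker.

   The decoder also accepts the spec's negative-count block form, in which a block byte size follows the count.

   ```rust
   // Hypothetical helpers illustrating Avro binary encoding (Avro spec, "Binary Encoding").

   /// Zigzag-map a signed value so small magnitudes get small codes (Avro `int`/`long`).
   fn zigzag(n: i64) -> u64 {
       ((n << 1) ^ (n >> 63)) as u64
   }

   /// Inverse of `zigzag`.
   fn unzigzag(z: u64) -> i64 {
       ((z >> 1) as i64) ^ -((z & 1) as i64)
   }

   /// Append `n` as a varint: 7 bits per byte, low-order group first,
   /// high bit set on every byte except the last.
   fn write_long(buf: &mut Vec<u8>, n: i64) {
       let mut z = zigzag(n);
       while z >= 0x80 {
           buf.push(((z & 0x7f) as u8) | 0x80);
           z >>= 7;
       }
       buf.push(z as u8);
   }

   /// Read one varint, returning (value, bytes consumed).
   /// Returns None on truncated input or a varint longer than 10 bytes.
   fn read_long(buf: &[u8]) -> Option<(i64, usize)> {
       let mut z: u64 = 0;
       for (i, &b) in buf.iter().enumerate().take(10) {
           z |= u64::from(b & 0x7f) << (7 * i);
           if b & 0x80 == 0 {
               return Some((unzigzag(z), i + 1));
           }
       }
       None
   }

   /// Encode an array of ints as one block (count, items) followed by the 0 terminator.
   fn encode_int_array(items: &[i32]) -> Vec<u8> {
       let mut buf = Vec::new();
       if !items.is_empty() {
           write_long(&mut buf, items.len() as i64);
           for &v in items {
               write_long(&mut buf, i64::from(v));
           }
       }
       write_long(&mut buf, 0); // end-of-array marker
       buf
   }

   /// Decode an array of ints spread over any number of blocks.
   fn decode_int_array(buf: &[u8]) -> Option<Vec<i32>> {
       let mut out = Vec::new();
       let mut pos = 0;
       loop {
           let (mut count, n) = read_long(&buf[pos..])?;
           pos += n;
           if count == 0 {
               return Some(out);
           }
           if count < 0 {
               // Negative count: |count| items follow, preceded by the block size in bytes.
               count = count.checked_neg()?;
               let (_block_size, n) = read_long(&buf[pos..])?;
               pos += n;
           }
           for _ in 0..count {
               let (v, n) = read_long(&buf[pos..])?;
               pos += n;
               out.push(i32::try_from(v).ok()?);
           }
       }
   }

   fn main() {
       let bytes = encode_int_array(&[1, 2]);
       assert_eq!(bytes, [0x04, 0x02, 0x04, 0x00]);
       assert_eq!(decode_int_array(&bytes), Some(vec![1, 2]));
   }
   ```

   Because the data itself is just a few well-formed varints, a decode error on this file points more toward a schema mismatch (the bytes read as the wrong types) than toward the varint reader itself.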
   
   I don't think this is a bug in the async reader.
   You are using a testing infrastructure built around Arrow schemas that carry the reader schema in their metadata, but you did not provide that schema in your test.
   
   I can confirm the following test passes:
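   ```rust
   // Runs inside the arrow-avro test module: `arrow_test_data`, `AvroObjectReader`
   // and `AsyncAvroFileReader` come from the crate and this PR.
   use std::sync::Arc;

   use arrow_array::cast::AsArray;
   use arrow_array::{ArrayRef, Int32Array, RecordBatch};
   use futures::TryStreamExt;
   use object_store::local::LocalFileSystem;
   use object_store::path::Path;
   use object_store::ObjectStore;

   #[tokio::test]
   async fn test_bad_varint_bug() {
       let file = arrow_test_data("avro/bad-varint-bug.avro");

       let store: Arc<dyn ObjectStore> = Arc::new(LocalFileSystem::new());
       // Resolve the test-data file rather than a machine-specific absolute path.
       let location = Path::from_filesystem_path(&file).unwrap();

       let file_size = store.head(&location).await.unwrap().size;

       let file_reader = AvroObjectReader::new(store, location);
       // No reader schema is passed to the builder.
       let reader = AsyncAvroFileReader::builder(file_reader, file_size, 1024)
           .try_build()
           .await
           .unwrap();

       let batches: Vec<RecordBatch> = reader.try_collect().await.unwrap();
       let batch = &batches[0];
       let int_list_col = batch.column(0).as_list::<i32>();

       // The first row should be the list [1, 2], matching the Python output above.
       let first_list = int_list_col.value(0);
       let expected_result: ArrayRef = Arc::new(Int32Array::from_iter_values(vec![1i32, 2]));
       assert_eq!(first_list, expected_result);
   }
   ```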
   
   The issue is probably in `AvroSchema::from`, which has several bugs that I've also encountered.
