amoeba commented on issue #39912:
URL: https://github.com/apache/arrow/issues/39912#issuecomment-1945317419

   Hi @chenyiwrites, is there any chance this is a publicly available dataset? I 
think the best next step here is for us to reproduce your issue. 
Re-partitioning your dataset may make the issue go away, but what you ran into 
is definitely a bug and it needs fixing.
   
   If you aren't able to share the dataset, could you give us a bit more 
information about the structure of it? Two bits of information would be useful:
   
   1. Your schema: Call `individual_positions$schema` and share the output.
   2. Statistics on your files: the number of rows and row groups in each. The 
output of the code below will be long, so just let us know whether every value 
is the same or, if not, what the range/distribution is.
       ```r
       library(arrow)
       # `ds` is the Dataset you opened (presumably `individual_positions`)
       num_rows <- vapply(ds$files, function(f) {
         ParquetFileReader$create(f)$num_rows
       }, 0, USE.NAMES = FALSE)
       num_row_groups <- vapply(ds$files, function(f) {
         ParquetFileReader$create(f)$num_row_groups
       }, 0, USE.NAMES = FALSE)
       ```
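       If it helps, here is a minimal sketch of one way to condense those two 
vectors into something short enough to paste. `summarise_counts` is a 
hypothetical helper written for this comment, not part of arrow, and it uses 
only base R.
       ```r
       # Hypothetical helper (not an arrow function): condense per-file counts.
       summarise_counts <- function(x) {
         list(
           n_files   = length(x),                # how many files were inspected
           all_same  = length(unique(x)) == 1,   # TRUE if every file has the same count
           range     = range(x),                 # smallest and largest count
           quartiles = quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
         )
       }

       # Usage, assuming num_rows and num_row_groups were computed as above:
       # summarise_counts(num_rows)
       # summarise_counts(num_row_groups)
       ```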

