[ 
https://issues.apache.org/jira/browse/ARROW-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620150#comment-17620150
 ] 

Miles Granger commented on ARROW-18064:
---------------------------------------

Thanks for the report.

Initial testing with the example {{badplug.parquet}} shows that it also 
fails with {{fastparquet==0.8.3}}:

{{ValueError: could not broadcast input array from shape (512,) into shape 
(511,)}}

And pyarrow:
{{ArrowInvalid: Column 23 named CarrierID expected length 511 but got length 
512}}

I'll take some more time to look at this. 
Do you have any code that can generate an example parquet file which 
reproduces this? Or any information about how the file was generated?
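
To narrow down which columns disagree with the file metadata, something along 
these lines may help. This is only a sketch and is not taken from the attached 
{{readBadParquet.py}}; the helper names {{find_length_mismatches}} and 
{{column_lengths_from_parquet}} are made up for illustration, and it assumes 
that reading each column on its own still succeeds, as described in the report.

```python
from typing import Dict, List, Tuple


def find_length_mismatches(
    expected_rows: int, column_lengths: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (column name, actual length) for each column whose length
    differs from the expected row count."""
    return [
        (name, length)
        for name, length in column_lengths.items()
        if length != expected_rows
    ]


def column_lengths_from_parquet(path: str) -> Tuple[int, Dict[str, int]]:
    """Read every top-level column separately and record its row count.

    Hypothetical helper: returns the row count from the file metadata
    together with a mapping of column name to rows actually read.
    """
    # Imported lazily so find_length_mismatches has no pyarrow dependency.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    expected_rows = parquet_file.metadata.num_rows
    lengths = {
        name: parquet_file.read(columns=[name]).num_rows
        for name in parquet_file.schema_arrow.names
    }
    return expected_rows, lengths


# Possible usage against the attached file (path is an assumption):
# expected, lengths = column_lengths_from_parquet("badplug.parquet")
# print(expected, find_length_mismatches(expected, lengths))
```

If the file behaves as the errors above suggest, this would presumably list 
{{CarrierID}} with 512 rows against an expected 511, but that has not been 
verified here.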

> Error of wrong number of rows read from file
> --------------------------------------------
>
>                 Key: ARROW-18064
>                 URL: https://issues.apache.org/jira/browse/ARROW-18064
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0, 7.0.1, 8.0.0, 8.0.1, 9.0.0
>         Environment: Python Info
> 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit 
> (AMD64)]
> Pyarrow Info
> 6.0.1
> Platform Info
> Windows-10-10.0.19042-SP0
> Windows
> 10
> 10.0.19042
> 19042
> AMD64
>            Reporter: Blake erickson
>            Assignee: Miles Granger
>            Priority: Major
>         Attachments: badplug.parquet, readBadParquet.py
>
>
> Versions greater than 6.0.1 fail to read tables, reporting expected length n 
> but got n+1 rows.
>  
> Tables can be read fine column by column, or with a fixed number of rows 
> matching the metadata. The file reads correctly in version 6.0.1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
