[jira] [Commented] (ARROW-6876) Reading parquet file becomes really slow for 0.15.0

Joris Van den Bossche (Jira) Mon, 14 Oct 2019 10:13:19 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951164#comment-16951164
 ]


Joris Van den Bossche commented on ARROW-6876:
----------------------------------------------

Thanks for the report. Would you be able to share a script that reproduces it 
(that writes a parquet file that has the issue, or otherwise share a file)? 
What's the schema of the data?
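
For instance, something along these lines would be enough. This is only a 
minimal sketch: the column names, dtypes, row count and the "repro.parquet" 
path are placeholders, since the actual schema is not known yet, so please 
adapt them to resemble your data.

```python
import time


def time_read(read_fn, path, repeats=3):
    """Call read_fn(path) `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def write_example_file(path, n_rows=1_000_000):
    """Write a parquet file with a made-up schema (placeholder, not the reporter's data)."""
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "id": np.arange(n_rows),                             # int64 column
        "value": rng.random(n_rows),                         # float64 column
        "label": rng.choice(["a", "b", "c"], size=n_rows),   # string column
    })
    df.to_parquet(path, engine="pyarrow")


if __name__ == "__main__":
    import pandas as pd
    import pyarrow as pa

    path = "repro.parquet"  # hypothetical output path
    write_example_file(path)
    # Report the versions so runs under 0.14.1 and 0.15.0 can be compared.
    print("pyarrow", pa.__version__, "pandas", pd.__version__)
    print(f"best of 3: {time_read(pd.read_parquet, path):.2f} s")
```

Running the same script once with pyarrow 0.14.1 and once with 0.15.0 
installed should show whether the slowdown reproduces with a given schema.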

> Reading parquet file becomes really slow for 0.15.0
> ---------------------------------------------------
>
>                 Key: ARROW-6876
>                 URL: https://issues.apache.org/jira/browse/ARROW-6876
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.0
>         Environment: python3.7
>            Reporter: Bob
>            Priority: Major
>         Attachments: image-2019-10-14-18-10-42-850.png, 
> image-2019-10-14-18-12-07-652.png
>
>
> Hi,
>  
> I just noticed that reading a parquet file with pandas became really slow 
> after I upgraded to 0.15.0.
>  
> Example:
> *With 0.14.1*
> In [4]: %timeit df = pd.read_parquet(path)
> 2.02 s ± 47.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> *With 0.15.0*
> In [5]: %timeit df = pd.read_parquet(path)
> 22.9 s ± 478 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>  
> The file is about 15MB in size. I am testing on the same machine using the 
> same version of python and pandas.
>  
> Have you received similar complaints? What could be the issue here?
>  
> Thanks a lot.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
