[jira] [Commented] (ARROW-10226) [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file

Josh Taylor (Jira) Sat, 10 Oct 2020 22:29:31 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211833#comment-17211833
 ]


Josh Taylor commented on ARROW-10226:
-------------------------------------

I'm seeing the same issue that the initial title of this ticket described: it 
never completes.

Test file: 
[https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]

(This is from Snowflake's example data, exported as a single Parquet file; the 
same thing happens with many files.)

Code that fails (neither the group by with a sum of columns nor the builder 
pattern works):

https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs

> [Rust] [Parquet] Parquet reader reading wrong columns in some batches within 
> a parquet file
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10226
>                 URL: https://issues.apache.org/jira/browse/ARROW-10226
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>             Fix For: 2.0.0
>
>
> I re-installed my desktop a few days ago (now using Ubuntu 20.04 LTS) and 
> when I try to run the TPC-H benchmark, it never completes and eventually 
> uses up all 64 GB RAM.
> I can run Spark against the data set and the query completes in 24 seconds, 
> which IIRC is how long it took before.
> It is possible that something is odd in my environment, but it is also 
> possible/likely that this is a real bug.
> I am investigating this and will update the Jira once I know more.
> I also went back to old commits that were working for me before and they show 
> the same issue so I don't think this is related to a recent code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
