[
https://issues.apache.org/jira/browse/ARROW-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210341#comment-17210341
]
Andy Grove commented on ARROW-10226:
------------------------------------
{code:java}
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 375000 bad values in batch
part-00000-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 49880 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 375000 bad values in batch
part-00001-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 49979 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 374998 bad values in batch
part-00002-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 50031 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 0 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 375002 bad values in batch
part-00003-36eb4379-93a2-47a8-873a-d0f1ed13a85a-c000.snappy.parquet has 50110 bad values in batch
{code}
> [Rust] [DataFusion] TPC-H query 1 no longer completes for 100GB dataset
> -----------------------------------------------------------------------
>
> Key: ARROW-10226
> URL: https://issues.apache.org/jira/browse/ARROW-10226
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 2.0.0
>
>
> I re-installed my desktop a few days ago (now using Ubuntu 20.04 LTS) and
> when I try to run the TPC-H benchmark, it never completes and eventually
> uses up all 64 GB of RAM.
> I can run Spark against the data set and the query completes in 24 seconds,
> which IIRC is how long it took before.
> It is possible that something is odd in my environment, but it is also
> possible/likely that this is a real bug.
> I am investigating this and will update the Jira once I know more.
> I also went back to old commits that were working for me before and they show
> the same issue, so I don't think this is related to a recent code change.