raulcd commented on PR #49490:
URL: https://github.com/apache/arrow/pull/49490#issuecomment-4039299359
I have 64GB of RAM locally.
The `TestHugeProjector.SimpleTestSumHuge` from `gandiva-projector-test`
takes more than 15 minutes locally for me when running with a Debug build.
With a Release build it takes ~3 minutes, but it fails:
```
[----------] 1 test from TestHugeFilter
[ RUN ] TestHugeFilter.TestSimpleHugeFilter
/home/raulcd/code/arrow/cpp/src/gandiva/tests/huge_table_test.cc:157: Failure
Value of: (exp)->Equals(selection_vector->ToArray(),
arrow::EqualOptions().nans_equal(true))
Actual: false
Expected: true
expected array: [
4,
5,
9,
11,
12,
13,
19,
21,
25,
26,
...
2147483625,
2147483627,
2147483629,
2147483630,
2147483636,
2147483637,
2147483641,
2147483643,
2147483644,
2147483645
] actual array: [
0,
1,
2,
3,
6,
7,
8,
10,
14,
15,
...
2147483634,
2147483635,
2147483638,
2147483639,
2147483640,
2147483642,
2147483646,
2147483647,
2147483648,
2147483649
]
[ FAILED ] TestHugeFilter.TestSimpleHugeFilter (153849 ms)
[----------] 1 test from TestHugeFilter (153850 ms total)
```
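For reference, both arrays end right at the signed 32-bit boundary, and the actual array contains 2147483648 and 2147483649, which are past INT32_MAX (2^31 - 1). The sketch below only pins down that boundary with compile-time checks. `FitsInInt32` is a hypothetical helper for illustration, not a Gandiva API, and this alone does not show that an int32 overflow causes the failure.

```cpp
#include <cstdint>
#include <limits>

// INT32_MAX = 2^31 - 1 = 2147483647: the largest value a signed
// 32-bit index or offset can hold.
constexpr int64_t kInt32Max = std::numeric_limits<int32_t>::max();
static_assert(kInt32Max == 2147483647, "int32_t max is 2^31 - 1");

// Hypothetical helper: true if a non-negative row index can be stored
// in an int32_t without overflow.
constexpr bool FitsInInt32(int64_t index) {
  return index >= 0 && index <= kInt32Max;
}

// Tail values copied from the expected and actual arrays in the log.
static_assert(FitsInInt32(2147483645), "last expected index fits");
static_assert(!FitsInInt32(2147483648), "2^31 does not fit");
static_assert(!FitsInInt32(2147483649), "2^31 + 1 does not fit");
```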
For the `parquet-arrow-reader-writer-test`, the problem is
`TestArrowReaderAdHoc.LargeStringColumn`: running locally on a Release build it
takes ~10 minutes (I haven't tested a Debug build, and I'm not sure I want to
:P)
```
[ RUN ] TestArrowReaderAdHoc.LargeStringColumn
[ OK ] TestArrowReaderAdHoc.LargeStringColumn (602823 ms)
```
The `parquet-writer-test` `TestColumnWriter.WriteLargeDictEncodedPage` and
`TestColumnWriter.ThrowsOnDictIndicesTooLarge` also fail locally for me:
```
[ RUN ] TestColumnWriter.WriteLargeDictEncodedPage
/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc:1100: Failure
Expected equality of these values:
page_count
Which is: 7501
2
[ FAILED ] TestColumnWriter.WriteLargeDictEncodedPage (2190 ms)
[ RUN ] TestColumnWriter.ThrowsOnDictIndicesTooLarge
/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc:1147: Failure
Expected: try { ([&]() { file_writer->Close(); })(); } catch (const
ParquetException& err) { switch (0) case 0: default: if (const
::testing::AssertionResult gtest_ar =
(::testing::internal::MakePredicateFormatterFromMatcher((::testing::Property(&ParquetException::what,
::testing::HasSubstr("exceeds maximum int value"))))("err", err))) ; else
::testing::internal::AssertHelper(::testing::TestPartResult::kNonFatalFailure,
"/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc", 1147,
gtest_ar.failure_message()) = ::testing::Message(); throw; } throws an
exception of type ParquetException.
Actual: it throws nothing.
[ FAILED ] TestColumnWriter.ThrowsOnDictIndicesTooLarge (23736 ms)
```
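`ThrowsOnDictIndicesTooLarge` expects `file_writer->Close()` to throw a `ParquetException` whose message contains "exceeds maximum int value", and here nothing is thrown. A minimal sketch of the kind of checked int64-to-int32 narrowing such a test exercises is below. `CheckedNarrowToInt32` and `FakeParquetException` are hypothetical names (the latter a stand-in so the snippet compiles without Arrow); where the real check lives in the Parquet writer is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for parquet::ParquetException so the sketch
// compiles without Arrow.
class FakeParquetException : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Hypothetical helper: narrow a 64-bit count (e.g. the number of
// dictionary indices) to int32_t, throwing instead of silently wrapping.
int32_t CheckedNarrowToInt32(int64_t value) {
  if (value > std::numeric_limits<int32_t>::max()) {
    throw FakeParquetException("Value " + std::to_string(value) +
                               " exceeds maximum int value");
  }
  if (value < std::numeric_limits<int32_t>::min()) {
    throw FakeParquetException("Value " + std::to_string(value) +
                               " is below minimum int value");
  }
  const auto narrowed = static_cast<int32_t>(value);
  // Both bounds were checked above, so the round-trip always holds;
  // kept as a debug-build invariant.
  assert(static_cast<int64_t>(narrowed) == value);
  return narrowed;
}
```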
My takeaways: we can enable a job that runs the large-memory tests, but there
currently seem to be some bugs in them, both in Gandiva and in Parquet. We
probably want to run them on CI with a Release build to shorten execution time,
but even then we will need something like a 15-minute timeout on individual
tests.
@pitrou what are your thoughts?
Should I open individual issues for those tests?