arthurpassos commented on PR #35825:
URL: https://github.com/apache/arrow/pull/35825#issuecomment-1589345197
> > > Could you make the CIs happy?
> > >
> > > As far as I can tell, CI is failing on `TestArrowParquet, LargeByteArray`. This test is failing because it depends on the data file added on this PR: [apache/parquet-testing#38](https://github.com/apache/parquet-testing/pull/38). Can you merge it?
>
> Let's discuss the test here. I tried to write a simple roundtrip test to copy what you did: [apache/parquet-testing#38](https://github.com/apache/parquet-testing/pull/38). But unfortunately I got `std::bad_alloc` when creating a large string like `std::string(2 << 30, 'a')` on my laptop.
>
> ```c++
> TEST(TestArrowReadWrite, LargeBinaryRoundTrip) {
>   auto pool = ::arrow::default_memory_pool();
>   auto sink = CreateOutputStream();
>   auto writer_properties = default_writer_properties();
>   auto arrow_writer_properties = default_arrow_writer_properties();
>
>   auto schema = ::arrow::schema(
>       {::arrow::field("a", ::arrow::map(::arrow::utf8(), ::arrow::int32()))});
>   ASSERT_OK_AND_ASSIGN(auto writer,
>                        FileWriter::Open(*schema, pool, sink, writer_properties));
>
>   std::shared_ptr<Array> keys, values, offsets;
>   ::arrow::ArrayFromVector<::arrow::StringType, std::string>(
>       {true}, {std::string(2 << 30, 'a')}, &keys);
>   ::arrow::ArrayFromVector<::arrow::Int32Type, int32_t>({true}, {1}, &values);
>   ::arrow::ArrayFromVector<::arrow::Int32Type, int32_t>({0, 1}, &offsets);
>   ASSERT_OK_AND_ASSIGN(auto map_array,
>                        ::arrow::MapArray::FromArrays(offsets, keys, values));
>   ASSERT_OK_AND_ASSIGN(
>       auto chunked_array,
>       ::arrow::ChunkedArray::Make({map_array, map_array}, schema->field(0)->type()));
>   auto table = Table::Make(schema, {chunked_array}, /*num_rows=*/2);
>   std::cout << table->ToString() << std::endl;
>
>   ASSERT_OK_NO_THROW(writer->WriteTable(*table));
>   ASSERT_OK_NO_THROW(writer->Close());
>   ASSERT_OK_AND_ASSIGN(auto buffer, sink->Finish());
>
>   auto read_properties = default_arrow_reader_properties();
>   std::unique_ptr<FileReader> reader;
>   ASSERT_OK(FileReader::Make(
>       pool, ParquetFileReader::Open(std::make_shared<BufferReader>(buffer)),
>       read_properties, &reader));
>
>   std::shared_ptr<::arrow::Table> read_table;
>   ASSERT_OK(reader->ReadTable(&read_table));
>   std::cout << read_table->ToString() << std::endl;
>   // ASSERT_TABLES_EQUAL(*table, *read_table);
> }
> ```

I am running into the same issue on my side... Isn't there a way we can make the file work? It's much simpler, more performant, etc.
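
A possible factor in the `std::bad_alloc`, offered as an assumption rather than a confirmed diagnosis: in `std::string(2 << 30, 'a')` both operands are `int`, and the intended 2^31 does not fit in a 32-bit `int`. On typical compilers the shift wraps to a negative value (before C++20 the result is implementation-defined or undefined), which then becomes an enormous `size_t` count. The sketch below only checks that arithmetic; `kIntendedLength`, `kInt32Max`, and `AsUnsignedCount` are names made up for illustration and are not part of Arrow or the PR.

```c++
#include <cassert>
#include <cstdint>
#include <limits>

// Intended length of the test string: 2 * 2^30 = 2^31 bytes.
// Shifting a 64-bit operand keeps the arithmetic out of 32-bit `int`.
constexpr int64_t kIntendedLength = int64_t{2} << 30;

// Largest value a 32-bit signed integer can hold.
constexpr int64_t kInt32Max = std::numeric_limits<int32_t>::max();

// 2^31 is exactly one past INT32_MAX, so it is not representable as `int`
// on platforms where `int` is 32 bits wide.
static_assert(kIntendedLength == 2147483648LL, "expected 2^31");
static_assert(kIntendedLength == kInt32Max + 1, "one past INT32_MAX");

// Hypothetical helper: the 64-bit unsigned count that results when a
// wrapped (negative) 32-bit value is passed where a size_t is expected.
inline uint64_t AsUnsignedCount(int32_t wrapped) {
  // Sign-extend to 64 bits first, then reinterpret as unsigned.
  return static_cast<uint64_t>(static_cast<int64_t>(wrapped));
}
```

If that is what is happening, spelling the length as `size_t{1} << 31` would at least request the intended 2 GiB. Separately, `::arrow::utf8()` uses 32-bit offsets, so a single 2^31-byte value would presumably not fit in a `StringType` array either; `::arrow::large_utf8()` may be a better fit for the key type in this test.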
