arthurpassos commented on PR #35825:
URL: https://github.com/apache/arrow/pull/35825#issuecomment-1589345197

   > > > Could you make the CIs happy?
   > > 
   > > 
   > > As far as I can tell, CI is failing on `TestArrowParquet, LargeByteArray`. This test is failing because it depends on the data file added on this PR: [apache/parquet-testing#38](https://github.com/apache/parquet-testing/pull/38). Can you merge it?
   > 
   > Let's discuss the test here. I tried to write a simple roundtrip test to copy what you did: [apache/parquet-testing#38](https://github.com/apache/parquet-testing/pull/38). But unfortunately I got `std::bad_alloc` when creating a large string like `std::string(2 << 30, 'a')` on my laptop.
   > 
   > ```c++
   > TEST(TestArrowReadWrite, LargeBinaryRoundTrip) {
   >   auto pool = ::arrow::default_memory_pool();
   >   auto sink = CreateOutputStream();
   >   auto writer_properties = default_writer_properties();
   >   auto arrow_writer_properties = default_arrow_writer_properties();
   > 
   >   auto schema = ::arrow::schema(
   >       {::arrow::field("a", ::arrow::map(::arrow::utf8(), ::arrow::int32()))});
   >   ASSERT_OK_AND_ASSIGN(auto writer,
   >                        FileWriter::Open(*schema, pool, sink, writer_properties));
   > 
   >   std::shared_ptr<Array> keys, values, offsets;
   >   ::arrow::ArrayFromVector<::arrow::StringType, std::string>(
   >       {true}, {std::string(2 << 30, 'a')}, &keys);
   >   ::arrow::ArrayFromVector<::arrow::Int32Type, int32_t>({true}, {1}, &values);
   >   ::arrow::ArrayFromVector<::arrow::Int32Type, int32_t>({0, 1}, &offsets);
   >   ASSERT_OK_AND_ASSIGN(auto map_array,
   >                        ::arrow::MapArray::FromArrays(offsets, keys, values));
   >   ASSERT_OK_AND_ASSIGN(
   >       auto chunked_array,
   >       ::arrow::ChunkedArray::Make({map_array, map_array}, schema->field(0)->type()));
   >   auto table = Table::Make(schema, {chunked_array}, /*num_rows=*/2);
   >   std::cout << table->ToString() << std::endl;
   > 
   >   ASSERT_OK_NO_THROW(writer->WriteTable(*table));
   >   ASSERT_OK_NO_THROW(writer->Close());
   >   ASSERT_OK_AND_ASSIGN(auto buffer, sink->Finish());
   > 
   >   auto read_properties = default_arrow_reader_properties();
   >   std::unique_ptr<FileReader> reader;
   >   ASSERT_OK(FileReader::Make(
   >       pool, ParquetFileReader::Open(std::make_shared<BufferReader>(buffer)),
   >       read_properties, &reader));
   > 
   >   std::shared_ptr<::arrow::Table> read_table;
   >   ASSERT_OK(reader->ReadTable(&read_table));
   >   std::cout << read_table->ToString() << std::endl;
   >   // ASSERT_TABLES_EQUAL(*table, *read_table);
   > }
   > ```
   
   I am running into the same issue on my side. Isn't there a way we can make the file work? It's much simpler, more performant, etc.
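
One detail in the quoted snippet may be worth double-checking, though I have not confirmed it is what triggers the `std::bad_alloc`: `2 << 30` is evaluated as an `int`, and 2^31 does not fit in a 32-bit `int`, so the count passed to `std::string` may not be the intended 2 GiB. A minimal sketch of computing the size in 64 bits is below; `LargeValueSize` and `ExceedsInt32Offsets` are hypothetical helper names for illustration, not part of Arrow or Parquet.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not an Arrow/Parquet API): 2 * 2^30 computed in 64 bits,
// so the shift cannot overflow a 32-bit int.
constexpr std::uint64_t LargeValueSize() { return std::uint64_t{2} << 30; }

// Hypothetical helper: true when a value of `size` bytes is larger than the
// biggest offset representable in int32, the offset type of Arrow's utf8/binary.
constexpr bool ExceedsInt32Offsets(std::uint64_t size) {
  return size >
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Compile-time checks of the arithmetic.
static_assert(LargeValueSize() == 2147483648ULL, "expected 2^31 bytes");
static_assert(ExceedsInt32Offsets(LargeValueSize()),
              "a 2 GiB value should not fit in int32 offsets");
```

With that, the test value could be built as `std::string(LargeValueSize(), 'a')`. This only addresses the size arithmetic; allocating 2 GiB (plus any copies made while building the array) can still fail on a machine with limited memory.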

