[
https://issues.apache.org/jira/browse/ARROW-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney closed ARROW-7383.
-------------------------------
Resolution: Not A Problem
> [C++] [Java] Cannot Read In Java/Scala Streaming Arrow Files Generated In C++
> -----------------------------------------------------------------------------
>
> Key: ARROW-7383
> URL: https://issues.apache.org/jira/browse/ARROW-7383
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Java
> Reporter: David Wilcox
> Priority: Major
> Attachments: cpparrow-stream.arrow, java-arrow-stream.arrow
>
>
> I'm working on a project to shuttle data back and forth from a java and c++
> process. I'm able to read the java stream object in C++. However, I'm unable
> to read the C++ stream object in my Java project. When I do, I get a problem
> that I can't deserialize the message.
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentExceptionException in
> thread "main" java.lang.IllegalArgumentException at
> java.nio.ByteBuffer.allocate(ByteBuffer.java:334) at
> org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543)
> at
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$3.readNextBatch(ArrowConverters.scala:243)
> at
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$3.<init>(ArrowConverters.scala:229)
> at
> org.apache.spark.sql.execution.arrow.ArrowConverters$.getBatchesFromStream(ArrowConverters.scala:228)
> at
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anonfun$readArrowStreamFromFile$2.apply(ArrowConverters.scala:216)
> at
> org.apache.spark.sql.execution.arrow.ArrowConverters$$anonfun$readArrowStreamFromFile$2.apply(ArrowConverters.scala:214)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at
> org.apache.spark.sql.execution.arrow.ArrowConverters$.readArrowStreamFromFile(ArrowConverters.scala:214)
> at
> org.apache.spark.sql.execution.arrow.MyArrowReader.rawRead(MyArrowReader.scala:23)
> at com.adobe.compute.PlatformGet$.main(PlatformGet.scala:121) at
> com.adobe.compute.PlatformGet.main(PlatformGet.scala) {code}
> I checked out the code in MessageSerializer.java and it looks like it is
> expecting the read size to be created in the first four bytes of the file.
> Those four bytes are {{[-1, -1, -1, -1]}} in the C++ file, but in the Java
> file they are not. I do not know much about the internals of the Arrow
> streaming file format, so I can't say what it is supposed to be.
> I'm able to read the streaming file created in Java back in java (so Java is
> compatible with itself). The C++ code is able to read either the Java
> streaming file or the C++ Streaming file. Both work.
> Here's my C++ project to read and create a streaming object of its own.
>
> {code:java}
> arrow::Status Table__from_RecordBatchStreamReader(
> const std::shared_ptr<arrow::ipc::RecordBatchReader>& reader,
> std::shared_ptr<::arrow::Table>* table) {
> std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
> shared_ptr<arrow::RecordBatch> cur;
> do {
> ARROW_RETURN_NOT_OK(reader->ReadNext(&cur));
> if (cur)
> batches.push_back(cur);
> } while (cur); return arrow::Table::FromRecordBatches(batches, table);
> }
> #define EXIT_ON_FAILURE(expr) \
> do { \
> arrow::Status status_ = (expr); \
> if (!status_.ok()) { \
> std::cerr << status_.message() << std::endl; \
> return EXIT_FAILURE; \
> } \
> } while (0);void checkStatus(arrow::Status st) {
> if (st.ok())
> return;
> std::cout << st.ToString() << std::endl;
> }int main(int , char** argv) {
> (void)argv;
> std::shared_ptr<arrow::io::InputStream> infile
> = *arrow::io::MemoryMappedFile::Open("/tmp/java-arrow-stream.arrow",
> arrow::io::FileMode::READ);
> std::shared_ptr<arrow::ipc::RecordBatchReader> reader;
> checkStatus(
> arrow::ipc::RecordBatchStreamReader::Open(infile, &reader));
> std::shared_ptr<arrow::Table> table;
> checkStatus(Table__from_RecordBatchStreamReader(reader, &table)); auto
> schema = reader->schema(); shared_ptr<arrow::io::OutputStream>
> outputStream;
> outputStream = *arrow::io::FileOutputStream::Open(
> "/tmp/cpparrow-stream.arrow");
> shared_ptr<arrow::ipc::RecordBatchWriter> recordBatchStreamWriter;
> checkStatus(arrow::ipc::RecordBatchStreamWriter::Open(
> outputStream.get(), schema, &recordBatchStreamWriter));
> checkStatus(recordBatchStreamWriter->WriteTable(*table));
> return 0;
> }
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)