[ 
https://issues.apache.org/jira/browse/ARROW-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris V.Kuznetsov updated ARROW-6133:
-------------------------------------
    Description: 
Hello

My colleague and I are trying to pass Arrow data through Kafka. He uses PyArrow; I'm 
using the Arrow Java API from Scala.

Here's the Transmitter code:
{code:python}
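import pyarrow as pa


def record_batch_to_bytes(df):
    # Convert the pandas DataFrame to an Arrow RecordBatch.
    batch = pa.RecordBatch.from_pandas(df)
    # Serialize the batch with pa.serialize and return the raw bytes.
    ser_ = pa.serialize(batch)
    return bytes(ser_.to_buffer())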

{code}
 

My colleague is able to read this stream with the Python API:
{code:python}
def bytes_to_batch_record(bytes_):
    batch = pa.deserialize(bytes_)
    print(batch.schema)
{code}

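On the receiver side, I use the following Scala code against the Java API:

{code:scala}
import java.io.ByteArrayInputStream
import org.apache.arrow.vector.ipc.ArrowStreamReader
import zio.Chunk

// BArr and allocator are defined elsewhere in the full source.
// Wrap each byte array in an ArrowStreamReader.
def deserialize(din: Chunk[BArr]): Chunk[ArrowStreamReader] =
  for {
    arr    <- din
    stream = new ByteArrayInputStream(arr)
  } yield new ArrowStreamReader(stream, allocator)

val reader = deserialize(arr)
val schema = reader.map(r => r.getVectorSchemaRoot.getSchema)
val empty  = reader.map(r => r.loadNextBatch)
{code}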

 

Both the {{schema}} and {{empty}} lines at the end of the last snippet fail with the following exception:

Fiber failed.
 An unchecked error was produced.
 java.io.IOException: Unexpected end of input. Missing schema.
         at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:135)
         at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:178)
         at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:169)
         at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:62)
         at nettest.ArrowSpec.$anonfun$testConsumeArrow$7(Arrow.scala:96)
         at zio.Chunk$Arr.map(Chunk.scala:722)

 

The full Scala code is 
[here|https://github.com/Clover-Group/zio-tsp/blob/46e34c7c060bf4061067922077bbe05ea4b9f301/src/test/scala/Arrow.scala#L95]

 

How do I resolve this? We are both using Arrow 0.14.1, and my colleague has no 
issues with the PyArrow API.

Thank you!

> Schema Missing Exception in ArrowStreamReader
> ---------------------------------------------
>
>                 Key: ARROW-6133
>                 URL: https://issues.apache.org/jira/browse/ARROW-6133
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.14.1
>            Reporter: Boris V.Kuznetsov
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
