[ 
https://issues.apache.org/jira/browse/ARROW-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990318#comment-16990318
 ] 

Jacques Nadeau commented on ARROW-7342:
---------------------------------------

I'm not sure there is a "right" (or at least well-specified) way. The Java 
perspective is that empty vectors shouldn't have any data. There is no point in 
having only one offset, since a single offset doesn't describe any value. This 
also means that communicating an empty vector on the wire takes zero bytes, as 
opposed to having to communicate 4 bytes of useless data.
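
To make the two conventions concrete, here is a minimal sketch in plain Python 
(no Arrow dependency). The function name build_offsets and the flag 
empty_has_zero_offset are hypothetical and exist only for illustration; this is 
not how either the Java or the pyarrow implementation is written. It assumes 
little-endian int32 offsets, with "length + 1" offsets for "length" values.

```python
import struct


def build_offsets(values, empty_has_zero_offset=True):
    """Pack little-endian int32 offsets for a list of byte strings.

    Hypothetical helper for illustration only. With
    empty_has_zero_offset=True an empty input still yields a single
    zero offset (4 bytes); with False an empty input yields no bytes,
    which is the behavior described for the Java vectors.
    """
    if not values and not empty_has_zero_offset:
        return b""
    offsets = [0]
    for value in values:
        # Each offset is the running total of the value lengths so far.
        offsets.append(offsets[-1] + len(value))
    return struct.pack(f"<{len(offsets)}i", *offsets)


if __name__ == "__main__":
    print(repr(build_offsets([]).hex()))
    print(repr(build_offsets([], empty_has_zero_offset=False).hex()))
    print(repr(build_offsets([b"ab", b"c"]).hex()))
```

Under these assumptions the only difference between the two conventions is the 
empty case: 4 bytes holding the integer zero versus no bytes at all. Any 
non-empty input produces the same offsets either way.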

> [Java] offset buffer for vector of variable-width type with zero value count 
> is empty
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-7342
>                 URL: https://issues.apache.org/jira/browse/ARROW-7342
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Steve M. Kim
>            Priority: Major
>
> I am reporting what I think might be two related bugs in 
> {{org.apache.arrow.vector.BaseVariableWidthVector}}
>  # The offset buffer is initialized as empty. I expect it to have 4 
> bytes that represent the integer zero.
>  # The {{getBufferSize}} method returns 0 when value count is zero, instead 
> of 4.
> Compare to the pyarrow implementation, which I believe correctly populates 
> the offset buffer:
> {code:java}
> >>> import pyarrow as pa
> >>> array = pa.array([], type=pa.binary())
> >>> array
> <pyarrow.lib.BinaryArray object at 0x7f4f68b858e8>
> []
> >>> print([b.hex().decode() for b in array.buffers()])
> ['', '00000000', '']
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
