[ https://issues.apache.org/jira/browse/ARROW-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li reassigned ARROW-7254:
-------------------------------

    Assignee: Liya Fan

> BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
> --------------------------------------------------------------------------
>
>                 Key: ARROW-7254
>                 URL: https://issues.apache.org/jira/browse/ARROW-7254
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.15.1
>            Reporter: David Li
>            Assignee: Liya Fan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following program writes a file that PyArrow either segfaults on (0.14.1) 
> or rejects on reading (0.15.1) with the error {{pyarrow.lib.ArrowInvalid: 
> Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null 
> slot0!=4}}.
> Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, 
> fixes it. While the new documentation suggests that we should (must?) 
> call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected 
> {{setSafe}} alone to produce an invalid file, either. 
> Full traceback:
> {noformat}
> > python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
>     table = self.read_all()
>   File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
>   File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
>   File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
> {noformat}
>  
> Full program:
> {code:java}
> import java.io.OutputStream;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.Collections;
> import org.apache.arrow.memory.BufferAllocator;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.ipc.ArrowStreamWriter;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> public class AsdfTest {
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
>     try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
>       root.setRowCount(2);
>       VarCharVector v = (VarCharVector) root.getVector("a");
>       v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
>       try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
>         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
>         writer.writeBatch();
>         writer.close();
>       }
>     }
>   }
> }
> {code}
> {{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using 
> {{set}} instead of {{setSafe}} will fail in Java.
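
For readers unfamiliar with the offset invariant that PyArrow complains about, the following is a minimal plain-Java sketch of it. It does not use the Arrow API: the class and method names are hypothetical, and the offsets array {{[0, 4, 0]}} is an assumption inferred from the "0!=4" at index 2 in the error message above. The sketch checks that offsets never decrease and that a null slot has equal start and end offsets, then shows how carrying the last written end offset forward through unset slots restores the invariant.

```java
import java.util.Arrays;

// Illustrative only: models an offsets buffer as an int[]; not Arrow code.
final class OffsetInvariantSketch {

  /** Returns the first offset index that violates the invariant, or -1 if none does. */
  public static int firstInvalidOffset(int[] offsets, boolean[] isNull) {
    for (int i = 1; i < offsets.length; i++) {
      int prev = offsets[i - 1];
      int cur = offsets[i];
      // Slot i-1 spans bytes [prev, cur), so offsets must never decrease.
      if (cur < prev) {
        return i;
      }
      // A null slot holds no bytes, so its start and end offsets must match.
      if (isNull[i - 1] && cur != prev) {
        return i;
      }
    }
    return -1;
  }

  /** Copies the end offset of the last written slot into every later offset entry. */
  public static int[] fillHoles(int[] offsets, int lastSet) {
    int[] fixed = Arrays.copyOf(offsets, offsets.length);
    for (int i = lastSet + 2; i < fixed.length; i++) {
      fixed[i] = fixed[lastSet + 1];
    }
    return fixed;
  }

  public static void main(String[] args) {
    // Two rows: row 0 holds "asdf" (4 bytes); row 1 was never written (assumed state).
    int[] offsets = {0, 4, 0};
    boolean[] isNull = {false, true};
    System.out.println(firstInvalidOffset(offsets, isNull));

    int[] fixed = fillHoles(offsets, 0);
    System.out.println(Arrays.toString(fixed));
    System.out.println(firstInvalidOffset(fixed, isNull));
  }
}
```

Under these assumptions, the unfixed array fails at index 2, the same index the PyArrow error reports, and the filled array passes.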



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
