[
https://issues.apache.org/jira/browse/ORC-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
William Hyun closed ORC-1116.
-----------------------------
> Csv-import tool exported field become empty
> -------------------------------------------
>
> Key: ORC-1116
> URL: https://issues.apache.org/jira/browse/ORC-1116
> Project: ORC
> Issue Type: Bug
> Components: tools
> Affects Versions: 1.7.3
> Reporter: kyle
> Assignee: kyle
> Priority: Minor
> Fix For: 1.7.4
>
> Attachments: CSVFileImport.dif
>
>
> When exporting an ORC file with a schema like "struct<a:string,b:binary>", if
> the data in column "b" is very long (over 4 MB), the process can hit a
> segmentation fault, or the exported data in column "a" becomes an empty string.
> Here is my attempt to explain the code; it may not be entirely correct, so
> please bear with me.
> Following the code in CSVFileImport.cc: when writing an ORC file, all
> string-type columns share a single data buffer inside the function
> fillStringValues(). When a value is longer than the buffer, the buffer is
> resized. The resize() operation invalidates all references and iterators
> into buffer.data().
> In this case, after field "a" has finished writing its data into the buffer,
> field "b" starts writing and resizes the buffer, invalidating the previous
> buffer.data(); field "a"'s stringBatch pointers into buffer.data() are
> therefore no longer valid.
> One workaround is to use a separate data buffer for each string-type column,
> though that requires allocating 4 MB per column (as in the attached file).
> Alternatively, all previously stored stringBatch pointers could be re-pointed
> to the new data buffer's address.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)