[ https://issues.apache.org/jira/browse/HIVE-24151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-24151 started by Ádám Szita.
-----------------------------------------
> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ---------------------------------------------------------------------
>
> Key: HIVE-24151
> URL: https://issues.apache.org/jira/browse/HIVE-24151
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
>
> HIVE-22360 was intended to fix another MultiDelimitSerDe problem (with NULL
> last columns) but introduced a regression. Its approach is fundamentally
> flawed: the existing logic, which operated on bytes, was replaced by regex
> matcher logic that deals in character positions rather than byte positions.
> Because non-ASCII characters can occupy more than one byte, the fields of the
> whole record may get shifted.
> With this ticket I'm going to restore the old byte-based logic and apply the
> proper fix on top of it, while keeping (and extending) the test cases added
> with HIVE-22360, so that we have a solution for both issues.
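
The character-versus-byte mismatch described above can be illustrated with a
minimal, self-contained Java sketch. This is not the Hive code: the class
DelimiterOffsetDemo, its methods, the "|~|" delimiter and the sample row are
all hypothetical and chosen only for illustration, and UTF-8 is assumed as the
row encoding. It shows that a position reported by a regex matcher on a String
is a character index, and that using it as an offset into the raw bytes cuts
the field in the wrong place.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not taken from Hive.
public class DelimiterOffsetDemo {

    // Character index of the first delimiter, as a regex matcher reports it.
    static int charIndexOfDelimiter(String row, String delimiter) {
        Matcher m = Pattern.compile(Pattern.quote(delimiter)).matcher(row);
        return m.find() ? m.start() : -1;
    }

    // Byte offset of the first delimiter, found by scanning the raw bytes.
    static int byteIndexOfDelimiter(byte[] row, byte[] delimiter) {
        outer:
        for (int i = 0; i + delimiter.length <= row.length; i++) {
            for (int j = 0; j < delimiter.length; j++) {
                if (row[i + j] != delimiter[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Decodes the first field from the raw bytes, ending at the given offset.
    static String firstField(byte[] row, int endOffset) {
        return new String(row, 0, endOffset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00C1d\u00E1m" is "Adam" with two accented letters,
        // each of which takes 2 bytes in UTF-8.
        String row = "\u00C1d\u00E1m|~|Szita";
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        byte[] delim = "|~|".getBytes(StandardCharsets.UTF_8);

        int charIdx = charIndexOfDelimiter(row, "|~|");
        int byteIdx = byteIndexOfDelimiter(bytes, delim);

        System.out.println("char index: " + charIdx);
        System.out.println("byte index: " + byteIdx);
        // Using the character index as a byte offset truncates the field.
        System.out.println("sliced with char index: " + firstField(bytes, charIdx));
        System.out.println("sliced with byte index: " + firstField(bytes, byteIdx));
    }
}
```

For a row containing only ASCII characters the two indices coincide, which is
why a regression of this kind only becomes visible once multi-byte characters
appear in the data.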