GitHub user omalley commented on the issue:

https://github.com/apache/orc/pull/299

I simplified the code to get the upper bound. I also added 4 test cases that cover UTF-8 characters encoded with different numbers of bytes. See https://github.com/omalley/orc/tree/orc-203
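The comment doesn't show the code from PR #299. To illustrate what an "upper bound" of a truncated UTF-8 string can mean, here is a minimal Java sketch. It assumes the goal is to shorten a string to at most `maxBytes` UTF-8 bytes while returning a value that sorts at or above the original in UTF-8 byte order. The class and method names (`Utf8UpperBound`, `upperBound`, `compareUtf8`) are hypothetical and are not ORC's API. The sketch cuts only at whole-character boundaries, then increments the last kept character. Its `main` exercises 1-, 2-, 3- and 4-byte characters, in the spirit of the test cases mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the code from apache/orc PR #299.
public class Utf8UpperBound {

  // Number of bytes needed to encode one code point in UTF-8.
  static int utf8Length(int codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint < 0x10000) return 3;
    return 4;
  }

  // Next valid code point after cp (skipping the surrogate range),
  // or -1 if cp is already the largest code point.
  static int nextCodePoint(int cp) {
    int next = cp + 1;
    if (next >= Character.MIN_SURROGATE && next <= Character.MAX_SURROGATE) {
      next = Character.MAX_SURROGATE + 1;
    }
    return next > Character.MAX_CODE_POINT ? -1 : next;
  }

  // Compares two strings by their UTF-8 bytes (equivalent to code point order).
  static int compareUtf8(String a, String b) {
    return Arrays.compareUnsigned(
        a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
  }

  // Returns a string of at most maxBytes UTF-8 bytes that is >= value in
  // UTF-8 byte order, or null if no such string exists.
  static String upperBound(String value, int maxBytes) {
    int[] cps = value.codePoints().toArray();
    // Keep the longest prefix of whole characters that fits in maxBytes.
    int count = 0;
    int bytes = 0;
    while (count < cps.length && bytes + utf8Length(cps[count]) <= maxBytes) {
      bytes += utf8Length(cps[count]);
      count++;
    }
    if (count == cps.length) {
      return value; // short enough already: the value bounds itself
    }
    // Increment the last kept character. If it can't be incremented within
    // the byte budget, drop it and try the character before it.
    while (count > 0) {
      int last = cps[count - 1];
      int next = nextCodePoint(last);
      int prefixBytes = bytes - utf8Length(last);
      if (next != -1 && prefixBytes + utf8Length(next) <= maxBytes) {
        return new String(cps, 0, count - 1) + new String(Character.toChars(next));
      }
      bytes = prefixBytes;
      count--;
    }
    return null; // not even one character fits
  }

  public static void main(String[] args) {
    // One case per UTF-8 encoding width: 1, 2, 3 and 4 bytes per character.
    String[] values = {"abcdef", "\u00e9\u00e9\u00e9\u00e9",
        "\u20ac\u20ac\u20ac", "\uD83D\uDE00\uD83D\uDE00"};
    int[] limits = {3, 5, 7, 6};
    for (int i = 0; i < values.length; i++) {
      System.out.println(values[i] + " -> " + upperBound(values[i], limits[i]));
    }
  }
}
```

The bound is larger than the original because, at the position of the incremented character, it holds a strictly greater code point after an identical prefix. Incrementing can lengthen the encoding (for example, U+007F takes 1 byte but U+0080 takes 2), so the sketch falls back to the previous character rather than exceed `maxBytes`.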
---