GitHub user omalley commented on the issue:
https://github.com/apache/orc/pull/299
I simplified the code that computes the upper bound. I also added four test
cases covering the different byte lengths of UTF-8 characters.
See https://github.com/omalley/orc/tree/orc-203
