All,

Dain Sundstrom pointed out to me in a personal email that the ORC bloom filters currently use the default character encoding. That makes the bloom filters non-portable between computers that use different default encodings. I've filed ORC-101 to address it, but I want to have a wider discussion. I'd propose that we:
1. create a new WriterVersion for ORC-101.
2. move the bloom filter code from storage-api into ORC.
3. consistently use UTF-8 when creating new bloom filters.
4. for ORC files older than ORC-101, test the default encoding instead of UTF-8.

Thoughts?

.. Owen
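
To illustrate the portability problem, here is a minimal Java sketch, not the actual ORC code. It assumes the bloom filter hashes the bytes of each string, so the same value encoded under two charsets yields different bytes and would therefore set different bits. ISO-8859-1 stands in for "some other machine's default encoding". The class name BloomEncodingSketch and the helpers bytesFor and probeCharset are hypothetical names used only for this example; probeCharset sketches point 4 under the assumption that the reader can tell from the WriterVersion whether a file was written with UTF-8 filters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomEncodingSketch {
    // Bytes that would be handed to the bloom filter's hash function
    // for a string value (hypothetical helper, not an ORC API).
    static byte[] bytesFor(String value, Charset charset) {
        return value.getBytes(charset);
    }

    // Hypothetical sketch of point 4: files written with the new
    // WriterVersion are probed with UTF-8, older files with the
    // reader's default encoding, as the proposal describes.
    static Charset probeCharset(boolean writtenWithUtf8Filters) {
        return writtenWithUtf8Filters
            ? StandardCharsets.UTF_8
            : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Escaped so the example does not depend on the source file encoding.
        String value = "caf\u00e9";

        byte[] utf8 = bytesFor(value, StandardCharsets.UTF_8);
        byte[] latin1 = bytesFor(value, StandardCharsets.ISO_8859_1);

        // The two encodings disagree on any non-ASCII character, so a
        // filter built from one will not reliably match a probe built
        // from the other.
        System.out.println("UTF-8:      " + Arrays.toString(utf8));
        System.out.println("ISO-8859-1: " + Arrays.toString(latin1));
        System.out.println("same bytes: " + Arrays.equals(utf8, latin1));
    }
}
```

Pure ASCII values encode identically under both charsets, which is probably why the problem is easy to miss in testing.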