All,
   Dain Sundstrom pointed out to me in a personal email that the ORC bloom
filters currently encode strings with the platform's default character
encoding. That makes the bloom filters non-portable between computers that
use different default encodings. I've filed ORC-101 to address it, but I
want to have a wider discussion. I'd propose that we:

1. create a new WriterVersion for ORC-101.
2. move the bloom filter code from storage-api into ORC.
3. consistently use UTF-8 when creating new bloom filters.
4. for ORC files written before ORC-101, test bloom filters using the
default encoding instead of UTF-8.
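
To illustrate the portability problem, here is a minimal sketch in Java. It
is not the actual ORC or storage-api code: the class name and the helpers
keyBytes and sameKeyBytes are hypothetical, and ISO-8859-1 simply stands in
for a non-UTF-8 default encoding. It shows that an ASCII-only string yields
the same bytes under both encodings, while a string with an accented
character does not, so a bloom filter built from one byte sequence would be
probed with a different one on another machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomEncodingSketch {
    // Hypothetical helper: the bytes a bloom filter would hash for a string key.
    static byte[] keyBytes(String value, Charset charset) {
        return value.getBytes(charset);
    }

    // True when both charsets produce identical bytes for the value,
    // i.e. a filter written with one could be probed with the other.
    static boolean sameKeyBytes(String value, Charset a, Charset b) {
        return Arrays.equals(keyBytes(value, a), keyBytes(value, b));
    }

    public static void main(String[] args) {
        Charset legacy = StandardCharsets.ISO_8859_1; // stand-in for a non-UTF-8 default
        Charset utf8 = StandardCharsets.UTF_8;

        // ASCII-only keys encode identically, so the problem stays hidden.
        System.out.println("orc: " + sameKeyBytes("orc", legacy, utf8));

        // "caf\u00e9" is 4 bytes in ISO-8859-1 but 5 bytes in UTF-8.
        System.out.println("caf\u00e9: " + sameKeyBytes("caf\u00e9", legacy, utf8));

        // The encoding that a bare String.getBytes() would use on this machine.
        System.out.println("default: " + Charset.defaultCharset());
    }
}
```

Under these assumptions, only keys containing non-ASCII characters are
affected, which is why the problem is easy to miss in testing.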

Thoughts?

.. Owen
