[ https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507465#comment-15507465 ]
ASF GitHub Bot commented on ORC-101:
------------------------------------

Github user omalley commented on the issue:

https://github.com/apache/orc/pull/60

OK, the latest push has a few changes:
* bloom_filter_utf8 streams use a new encoding that stores the bit set as bytes instead of a long[]. This is much more efficient in both performance and storage size.
* All column types now have bloom_filter_utf8 streams (largely to get the new representation).
* By default, only the new bloom_filter_utf8 streams are written, and old readers will ignore them. There is an option to write both bloom_filter and bloom_filter_utf8 streams to support old readers.
* There is an option for new readers to ignore the old bloom filters.
* Files generated after HIVE-12055 will correctly use the UTF-8 encoding even for the bloom_filter stream.

> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
> Key: ORC-101
> URL: https://issues.apache.org/jira/browse/ORC-101
> Project: Orc
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the default character set, which
> isn't constant between computers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
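The two ideas in ORC-101 (charset-independent keys and a byte-based bit set) can be illustrated with a minimal Java sketch. The class `BloomFilterBytesSketch` and its methods `keyBytes`, `toBytes`, and `fromBytes` are hypothetical illustrations, not ORC's actual API. The little-endian byte order is an assumption made for this sketch and is not taken from the ORC specification. The sketch first shows that the bytes fed to a hash depend on the charset: for a non-ASCII string, UTF-8 and ISO-8859-1 yield different byte sequences, so calling `String.getBytes()` without a charset can produce different bloom filter entries on machines with different defaults. It then packs a long[] bit set into a byte array and unpacks it again.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers, NOT ORC's real implementation.
class BloomFilterBytesSketch {

  // Encode a string key with an explicit charset. Plain s.getBytes() uses
  // the JVM default charset, so identical strings can produce different
  // bytes (and therefore different hashes) on different machines.
  static byte[] keyBytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Pack a long[] bit set into bytes, 8 bytes per long.
  // Little-endian order is an assumption of this sketch.
  static byte[] toBytes(long[] bitset) {
    ByteBuffer buf = ByteBuffer.allocate(bitset.length * Long.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    for (long word : bitset) {
      buf.putLong(word);
    }
    return buf.array();
  }

  // Inverse of toBytes: rebuild the long[] bit set from the byte array.
  static long[] fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    long[] bitset = new long[bytes.length / Long.BYTES];
    for (int i = 0; i < bitset.length; i++) {
      bitset[i] = buf.getLong();
    }
    return bitset;
  }

  public static void main(String[] args) {
    String key = "caf\u00e9";
    // UTF-8 encodes U+00E9 as two bytes. ISO-8859-1 encodes it as one.
    byte[] utf8 = keyBytes(key);
    byte[] latin1 = key.getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(utf8.length + " vs " + latin1.length);

    long[] bitset = {1L, Long.MIN_VALUE};
    System.out.println(Arrays.equals(bitset, fromBytes(toBytes(bitset))));
  }
}
```

Running `main` prints `5 vs 4` and then `true`. The byte order is fixed explicitly for the same reason the charset is: `ByteBuffer` defaults to big-endian, and a serialized format should not depend on whichever default a particular writer happens to use.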