[ 
https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507465#comment-15507465
 ] 

ASF GitHub Bot commented on ORC-101:
------------------------------------

Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/60
  
    Ok, the latest push has a few changes:
    * bloom_filter_utf8 streams use a new encoding with bytes instead of
      long[]. This is much more efficient in both performance and storage
      size.
    * All column types now have bloom_filter_utf8 streams (largely to get the
      new representation).
    * The default is to write only the new bloom_filter_utf8 streams, which
      old readers will ignore. There is an option to write both bloom_filter
      and bloom_filter_utf8 streams to support old readers.
    * There is an option for new readers to ignore the old bloom filters.
    * Files generated after HIVE-12055 will correctly use the UTF-8 encoding
      even for the bloom_filter stream.
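
To illustrate the "bytes instead of long[]" point, here is a minimal sketch of packing a bit set held as long[] into a single byte array and reading it back. This is not the code from the pull request: the class name BitsetPacking, the methods pack and unpack, and the little-endian byte order are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BitsetPacking {
    // Pack the bit set words into one contiguous byte array.
    // Little-endian order is an assumption for this sketch.
    static byte[] pack(long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long word : words) {
            buf.putLong(word);
        }
        return buf.array();
    }

    // Reverse of pack: rebuild the long[] words from the byte array.
    static long[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long[] words = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < words.length; i++) {
            words[i] = buf.getLong();
        }
        return words;
    }

    public static void main(String[] args) {
        long[] words = {1L, -1L, 0x0123456789abcdefL};
        byte[] packed = pack(words);
        System.out.println(packed.length);
        System.out.println(java.util.Arrays.equals(words, unpack(packed)));
    }
}
```

The packed form is a single opaque value of exactly 8 bytes per word, which is the kind of representation a bytes field can carry without per-element framing.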


> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
>                 Key: ORC-101
>                 URL: https://issues.apache.org/jira/browse/ORC-101
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the default character set, which 
> isn't constant between computers.
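
A small sketch of why the default character set matters here: the same string encodes to different bytes under different charsets, so any hash computed over those bytes differs too. The class name CharsetDemo and the helper encode are hypothetical and only stand in for "encode with whatever charset the machine happens to use"; this is not ORC's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {
    // Encode with an explicit charset. Calling value.getBytes() with no
    // argument would instead use the JVM's default charset.
    static byte[] encode(String value, Charset charset) {
        return value.getBytes(charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // contains one non-ASCII character
        byte[] latin1 = encode(value, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(value, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(latin1)); // [99, 97, 102, -23]
        System.out.println(Arrays.toString(utf8));   // [99, 97, 102, -61, -87]
        System.out.println(Arrays.equals(latin1, utf8)); // false
    }
}
```

A writer and a reader that disagree on the charset would hash different byte sequences for the same value, so the reader's bloom filter probe could miss a value that is actually present. Encoding explicitly as UTF-8 on both sides removes that dependency.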



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)