ASF GitHub Bot commented on ORC-101:

GitHub user omalley commented on the issue:

    Ok, the latest push has a few changes:
    * bloom_filter_utf8 streams use a new encoding with bytes instead of 
long[], which is more efficient in both performance and storage size.
    * all column types now have bloom_filter_utf8 streams (largely to get the 
new representation)
    * by default, only the new bloom_filter_utf8 streams are written, and old 
readers will ignore them. There is an option to write both bloom_filter and 
bloom_filter_utf8 streams to support old readers.
    * there is an option for new readers to ignore the old bloom filters.
    * files generated after HIVE-12055 will correctly use the UTF-8 encoding 
even for the bloom_filter stream.
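
As a rough illustration of the bytes-instead-of-long[] change above, the sketch below packs a bit set held as 64-bit words into one contiguous byte array and reads it back. The class name, method names, and the little-endian byte order are assumptions made for the example; this is not ORC's actual serialization code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class BitsetBytesSketch {
    // Pack each 64-bit word into 8 consecutive bytes.
    // Little-endian order is an assumption for this sketch.
    static byte[] toBytes(long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long w : words) {
            buf.putLong(w);
        }
        return buf.array();
    }

    // Inverse of toBytes: rebuild the 64-bit words from the byte array.
    static long[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long[] words = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < words.length; i++) {
            words[i] = buf.getLong();
        }
        return words;
    }

    public static void main(String[] args) {
        long[] words = {1L, -1L, 0x0123456789ABCDEFL};
        byte[] packed = toBytes(words);
        System.out.println("bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(words, fromBytes(packed)));
    }
}
```

A single byte array can be written and read as one blob rather than one element at a time, which is the kind of saving the first bullet refers to.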

> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>                 Key: ORC-101
>                 URL: https://issues.apache.org/jira/browse/ORC-101
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
> Currently ORC's bloom filter depends on the default character set, which 
> isn't constant between computers.
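
The problem described in the issue can be reproduced in a few lines of Java: the same string yields different bytes, and therefore a different hash, depending on the charset used to encode it. In this sketch `Arrays.hashCode` is only a stand-in for the bloom filter's real hash function, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetHashSketch {
    // Hash the encoded bytes of a string. Arrays.hashCode is a stand-in
    // for whatever hash the bloom filter actually applies to the bytes.
    static int hashWith(String s, Charset cs) {
        return Arrays.hashCode(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String value = "\u00e9"; // e with acute accent, a non-ASCII character

        // Encoding with an explicit charset gives the same bytes on every machine.
        int utf8 = hashWith(value, StandardCharsets.UTF_8);
        int latin1 = hashWith(value, StandardCharsets.ISO_8859_1);

        // Relying on the platform default can match either one (or neither),
        // depending on how the JVM is configured.
        int platform = hashWith(value, Charset.defaultCharset());

        System.out.println("UTF-8 hash:      " + utf8);
        System.out.println("ISO-8859-1 hash: " + latin1);
        System.out.println("default hash:    " + platform);
    }
}
```

A writer and a reader whose default charsets differ would hash the same non-ASCII string to different bit positions, so the reader could wrongly conclude that a value is absent. Pure ASCII strings encode identically in both charsets and are unaffected.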
