[ https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495238#comment-15495238 ]
ASF GitHub Bot commented on ORC-101:
------------------------------------

Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/60#discussion_r79101181

    --- Diff: java/core/src/java/org/apache/orc/OrcFile.java ---
    @@ -231,6 +232,33 @@ public static Reader createReader(Path path,
         void preFooterWrite(WriterContext context) throws IOException;
       }

    +  public static enum BloomFilterVersion {
    +    // Include both the BLOOM_FILTER and BLOOM_FILTER_UTF8 streams for string
    +    // and decimal columns.
    +    ORIGINAL("original"),
    +    // Only include the BLOOM_FILTER_UTF8 for string and decimal columns.
    +    // See ORC-101
    +    UTF8("utf8");
    +
    +    private final String id;
    +    private BloomFilterVersion(String id) {
    +      this.id = id;
    +    }
    +
    +    public String toString() {
    +      return id;
    +    }
    +
    +    public static BloomFilterVersion fromString(String s) {
    +      for (BloomFilterVersion version: values()) {
    +        if (version.id.equals(s)) {
    +          return version;
    +        }
    +      }
    +      throw new IllegalArgumentException("Unknown BloomFilterVersion " + s);
    --- End diff --

    I don't see how. To get the value, the code will always do this conversion after getting the string from OrcConf, so the error will be caught coming out.


> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
>                 Key: ORC-101
>                 URL: https://issues.apache.org/jira/browse/ORC-101
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the default character set, which
> isn't constant between computers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
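
For readers following the thread, the two points above can be illustrated with a small standalone sketch. It is not code from the pull request: the class name BloomFilterSketch and the helper bytesToHash are hypothetical, and the enum is a trimmed copy of the one in the diff. It shows (a) that fromString rejects an unrecognized configuration string at the moment the string is converted, and (b) that the same string can encode to different bytes under different charsets, which is presumably why a bloom filter built from default-charset bytes would not be portable between machines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BloomFilterSketch {
  // Trimmed copy of the enum from the diff above.
  enum BloomFilterVersion {
    ORIGINAL("original"),
    UTF8("utf8");

    private final String id;

    BloomFilterVersion(String id) {
      this.id = id;
    }

    @Override
    public String toString() {
      return id;
    }

    static BloomFilterVersion fromString(String s) {
      for (BloomFilterVersion version : values()) {
        if (version.id.equals(s)) {
          return version;
        }
      }
      throw new IllegalArgumentException("Unknown BloomFilterVersion " + s);
    }
  }

  // Hypothetical helper: the bytes that would be hashed for a string value.
  // Passing the charset explicitly avoids relying on the platform default.
  static byte[] bytesToHash(String value, Charset charset) {
    return value.getBytes(charset);
  }

  public static void main(String[] args) {
    // (a) A configured string is converted on the way out; a bad value fails here.
    System.out.println(BloomFilterVersion.fromString("utf8"));
    try {
      BloomFilterVersion.fromString("bogus");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    // (b) The same string yields different bytes under different charsets.
    String value = "caf\u00e9";
    byte[] utf8 = bytesToHash(value, StandardCharsets.UTF_8);
    byte[] latin1 = bytesToHash(value, StandardCharsets.ISO_8859_1);
    System.out.println("UTF-8:      " + Arrays.toString(utf8));
    System.out.println("ISO-8859-1: " + Arrays.toString(latin1));
  }
}
```

In the sketch the charset is always passed explicitly; calling value.getBytes() with no argument would use the JVM's default charset, which is the dependency the issue description objects to.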