[ https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495146#comment-15495146 ]
ASF GitHub Bot commented on ORC-101:
------------------------------------
Github user prasanthj commented on a diff in the pull request:
https://github.com/apache/orc/pull/60#discussion_r79093894
--- Diff: java/core/src/java/org/apache/orc/OrcFile.java ---
@@ -231,6 +232,33 @@ public static Reader createReader(Path path,
void preFooterWrite(WriterContext context) throws IOException;
}
+ public static enum BloomFilterVersion {
+ // Include both the BLOOM_FILTER and BLOOM_FILTER_UTF8 streams for string
+ // and decimal columns.
+ ORIGINAL("original"),
+ // Only include the BLOOM_FILTER_UTF8 for string and decimal columns.
+ // See ORC-101
+ UTF8("utf8");
+
+ private final String id;
+ private BloomFilterVersion(String id) {
+ this.id = id;
+ }
+
+ public String toString() {
+ return id;
+ }
+
+ public static BloomFilterVersion fromString(String s) {
+ for (BloomFilterVersion version: values()) {
+ if (version.id.equals(s)) {
+ return version;
+ }
+ }
+ throw new IllegalArgumentException("Unknown BloomFilterVersion " + s);
--- End diff --
Can we do this validation in OrcConf, so that we don't get a wrong value here?
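One way to read this suggestion is to reject a bad value at the point where the configuration string is read, with an error that names the key and the accepted values. The sketch below is only an illustration of that idea: the class `BloomFilterVersionCheck`, the helper `parseConfigured`, and the key string passed to it are hypothetical and are not part of OrcConf or of this pull request. The enum is copied from the diff and reduced to what the sketch needs.

```java
import java.util.Arrays;
import java.util.Locale;

public class BloomFilterVersionCheck {

  // Copied from the diff above, reduced to the parts this sketch uses.
  public enum BloomFilterVersion {
    ORIGINAL("original"),
    UTF8("utf8");

    private final String id;

    BloomFilterVersion(String id) {
      this.id = id;
    }

    @Override
    public String toString() {
      return id;
    }
  }

  /**
   * Hypothetical helper: validates the raw configuration string where it is
   * read, so that an invalid value is reported together with the key it came
   * from. The key is only used in the error message.
   */
  public static BloomFilterVersion parseConfigured(String key, String raw) {
    if (raw == null) {
      throw new IllegalArgumentException("Missing value for " + key);
    }
    // Assumption: surrounding whitespace and letter case are not significant.
    String normalized = raw.trim().toLowerCase(Locale.ROOT);
    for (BloomFilterVersion version : BloomFilterVersion.values()) {
      if (version.toString().equals(normalized)) {
        return version;
      }
    }
    throw new IllegalArgumentException("Invalid value '" + raw + "' for " + key
        + "; expected one of " + Arrays.toString(BloomFilterVersion.values()));
  }
}
```

With this shape, a caller that reads the setting gets either a valid enum constant or an exception that points at the offending key, instead of a bare "Unknown BloomFilterVersion" raised later.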
> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
> Key: ORC-101
> URL: https://issues.apache.org/jira/browse/ORC-101
> Project: Orc
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the default character set, which
> isn't constant between computers.
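The problem described in the issue can be illustrated outside of ORC. The sketch below is not ORC's bloom filter code; the class `CharsetDependentBytes` and its methods are hypothetical. It only shows that the bytes produced for the same string differ between charsets as soon as the string contains non-ASCII characters, so anything hashed from those bytes differs as well. `String.getBytes()` without an argument uses the platform default charset, which is why passing an explicit charset such as `StandardCharsets.UTF_8` gives the same bytes on every machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDependentBytes {

  // Encodes with an explicit charset, so the result does not depend on
  // the platform default.
  public static byte[] encode(String value, Charset charset) {
    return value.getBytes(charset);
  }

  // Returns true when both charsets produce identical bytes for the value.
  public static boolean sameBytes(String value, Charset a, Charset b) {
    return Arrays.equals(encode(value, a), encode(value, b));
  }

  // For comparison only: what the JVM on this machine would use when no
  // charset is given. This varies between computers.
  public static Charset platformDefault() {
    return Charset.defaultCharset();
  }
}
```

For example, `sameBytes("abc", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)` is true because both charsets encode ASCII identically, while the same call with `"caf\u00e9"` is false: UTF-8 encodes the last character as two bytes and ISO-8859-1 as one.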