[ https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495151#comment-15495151 ]
ASF GitHub Bot commented on ORC-101:
------------------------------------
Github user prasanthj commented on a diff in the pull request:
https://github.com/apache/orc/pull/60#discussion_r79097969
--- Diff: java/core/src/java/org/apache/orc/util/BloomFilterIO.java ---
@@ -0,0 +1,71 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.util;
+
+import org.apache.orc.OrcProto;
+
+public class BloomFilterIO {
+
+  private BloomFilterIO() {
+    // never called
+  }
+
+  /**
+   * Deserialize a bloom filter from the ORC file.
+   */
+  public static BloomFilter deserialize(OrcProto.Stream.Kind kind,
+      OrcProto.BloomFilter bloomFilter) {
+    if (bloomFilter == null) {
+      return null;
+    }
+    long values[] = new long[bloomFilter.getBitsetCount()];
+    for(int i=0; i < values.length; ++i) {
+      values[i] = bloomFilter.getBitset(i);
+    }
+    int numFuncs = bloomFilter.getNumHashFunctions();
+    switch (kind) {
+      case BLOOM_FILTER:
+        return new BloomFilter(values, numFuncs);
+      case BLOOM_FILTER_UTF8:
+        return new BloomFilterUtf8(values, numFuncs);
+    }
+    throw new IllegalArgumentException("Unknown bloom filter kind " + kind);
+  }
+
+  /**
+   * Serialize the BloomFilter to the ORC file.
+   * @param builder the builder to write to
+   * @param bloomFilter the bloom filter to serialize
+   */
+  public static void serialize(OrcProto.BloomFilter.Builder builder,
+      BloomFilter bloomFilter) {
+    long[] bitset = bloomFilter.getBitSet();
+    if (builder.getBitsetCount() != bitset.length) {
+      builder.clear();
+      for(int i=0; i < bitset.length; ++i) {
+        builder.addBitset(bitset[i]);
--- End diff --
Can we make this a byte[]? IIRC, protobuf deserializes this repeated field as a
List<Long> (Sergey and Dain reported this).
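A minimal sketch of the byte[] suggestion, assuming the long[] bitset is packed at 8 bytes per long in little-endian order (an arbitrary but fixed choice). `BitsetBytes`, `toBytes`, and `fromBytes` are hypothetical names, not part of ORC or protobuf. Stored as a single protobuf bytes field, the bitset could presumably be copied back into a long[] in one pass instead of going through a List<Long>.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class BitsetBytes {

  // Hypothetical helper: pack the bloom filter's long[] bitset into a byte[]
  // (8 bytes per long, little-endian) suitable for a protobuf bytes field.
  static byte[] toBytes(long[] bitset) {
    ByteBuffer buf = ByteBuffer.allocate(bitset.length * Long.BYTES)
        .order(ByteOrder.LITTLE_ENDIAN);
    // The LongBuffer view inherits the little-endian order set above.
    buf.asLongBuffer().put(bitset);
    return buf.array();
  }

  // Inverse of toBytes: rebuild the long[] bitset from the packed bytes.
  static long[] fromBytes(byte[] bytes) {
    if (bytes.length % Long.BYTES != 0) {
      throw new IllegalArgumentException(
          "Length is not a multiple of 8: " + bytes.length);
    }
    long[] bitset = new long[bytes.length / Long.BYTES];
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
        .asLongBuffer().get(bitset);
    return bitset;
  }

  public static void main(String[] args) {
    long[] bitset = {0x1L, -1L, 0x8000000000000000L};
    byte[] packed = toBytes(bitset);
    System.out.println(packed.length);                            // 24
    System.out.println(Arrays.equals(bitset, fromBytes(packed))); // true
  }
}
```

Only the byte order has to be agreed on by writer and reader; which order is chosen does not matter for correctness.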
> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
> Key: ORC-101
> URL: https://issues.apache.org/jira/browse/ORC-101
> Project: Orc
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the JVM's default character set, which
> can differ from one machine to another.
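A minimal sketch of the problem the issue describes, assuming string values are converted to bytes before being hashed into the filter. `CharsetDemo`, `defaultCharsetBytes`, and `utf8Bytes` are hypothetical names, not ORC's API. `String.getBytes()` with no argument uses the JVM's default charset (derived from the platform locale before JDK 18), so the same value can produce different bytes, and therefore different hashes, on two machines. Passing `StandardCharsets.UTF_8` pins the encoding; ISO-8859-1 stands in here for a machine with a different default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {

  // Depends on the JVM default charset, so the result can vary by machine.
  static byte[] defaultCharsetBytes(String value) {
    return value.getBytes();
  }

  // Explicit UTF-8 encoding: the same bytes on every machine.
  static byte[] utf8Bytes(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String value = "caf\u00e9"; // "café"
    byte[] utf8 = utf8Bytes(value);
    // Stand-in for a machine whose default charset is ISO-8859-1.
    byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(utf8.length);                 // 5
    System.out.println(latin1.length);               // 4
    System.out.println(Arrays.equals(utf8, latin1)); // false
    // Varies with this JVM's default charset.
    System.out.println(Charset.defaultCharset() + ": "
        + defaultCharsetBytes(value).length + " bytes");
  }
}
```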
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)