[ 
https://issues.apache.org/jira/browse/ORC-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495151#comment-15495151
 ] 

ASF GitHub Bot commented on ORC-101:
------------------------------------

Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/60#discussion_r79097969
  
    --- Diff: java/core/src/java/org/apache/orc/util/BloomFilterIO.java ---
    @@ -0,0 +1,71 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.orc.util;
    +
    +import org.apache.orc.OrcProto;
    +
    +public class BloomFilterIO  {
    +
    +  private BloomFilterIO() {
    +    // never called
    +  }
    +
    +  /**
    +   * Deserialize a bloom filter from the ORC file.
    +   */
    +  public static BloomFilter deserialize(OrcProto.Stream.Kind kind,
    +                                        OrcProto.BloomFilter bloomFilter) {
    +    if (bloomFilter == null) {
    +      return null;
    +    }
    +    long values[] = new long[bloomFilter.getBitsetCount()];
    +    for(int i=0; i < values.length; ++i) {
    +      values[i] = bloomFilter.getBitset(i);
    +    }
    +    int numFuncs = bloomFilter.getNumHashFunctions();
    +    switch (kind) {
    +      case BLOOM_FILTER:
    +        return new BloomFilter(values, numFuncs);
    +      case BLOOM_FILTER_UTF8:
    +        return new BloomFilterUtf8(values, numFuncs);
    +    }
    +    throw new IllegalArgumentException("Unknown bloom filter kind " + kind);
    +  }
    +
    +  /**
    +   * Serialize the BloomFilter to the ORC file.
    +   * @param builder the builder to write to
    +   * @param bloomFilter the bloom filter to serialize
    +   */
    +  public static void serialize(OrcProto.BloomFilter.Builder builder,
    +                               BloomFilter bloomFilter) {
    +    long[] bitset = bloomFilter.getBitSet();
    +    if (builder.getBitsetCount() != bitset.length) {
    +      builder.clear();
    +      for(int i=0; i < bitset.length; ++i) {
    +        builder.addBitset(bitset[i]);
    --- End diff --
    
    Can we make this byte[]? IIRC, Protobuf deserializes this as List<Long> (Sergey and Dain reported this).
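
To illustrate the byte[] suggestion, here is a minimal sketch of packing the bitset words into one byte array and unpacking them again, which avoids holding one boxed Long per word. The class name `BitsetBytes`, the method names, and the little-endian byte order are assumptions made for this example. They are not taken from the pull request or from the ORC file format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of ORC: converts a bloom filter bitset
// between long[] and a flat byte[] (8 bytes per word).
class BitsetBytes {

  // Byte order is an assumption for this sketch; a real format would
  // have to fix it in the specification.
  private static final ByteOrder ORDER = ByteOrder.LITTLE_ENDIAN;

  // Pack each long into 8 consecutive bytes.
  static byte[] toBytes(long[] bitset) {
    ByteBuffer buf = ByteBuffer.allocate(bitset.length * Long.BYTES).order(ORDER);
    buf.asLongBuffer().put(bitset);
    return buf.array();
  }

  // Unpack a byte[] produced by toBytes back into a long[].
  static long[] fromBytes(byte[] bytes) {
    if (bytes.length % Long.BYTES != 0) {
      throw new IllegalArgumentException(
          "Length " + bytes.length + " is not a multiple of " + Long.BYTES);
    }
    long[] result = new long[bytes.length / Long.BYTES];
    ByteBuffer.wrap(bytes).order(ORDER).asLongBuffer().get(result);
    return result;
  }
}
```

In the Java protobuf bindings a `bytes` field is exposed as a ByteString, so a serializer could wrap the array from `toBytes` instead of calling `addBitset` once per word. That wiring is left out here because it depends on how the schema would change.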


> Correct the use of the default charset in the bloomfilter
> ---------------------------------------------------------
>
>                 Key: ORC-101
>                 URL: https://issues.apache.org/jira/browse/ORC-101
>             Project: Orc
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> Currently ORC's bloom filter depends on the default character set, which 
> isn't constant between computers.
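
The problem described above can be sketched in a few lines: the same string encodes to different bytes under different charsets, so any hash computed over those bytes differs as well. The class `CharsetDemo` and its methods are hypothetical names for this example, and ISO-8859-1 merely stands in for whatever default a given machine might have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not ORC code: shows why hashing the bytes of a
// string must pin the charset explicitly.
class CharsetDemo {

  // Stable on every machine: the charset is named explicitly.
  static byte[] utf8Bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Stand-in for s.getBytes() with no argument, which uses the JVM
  // default charset; the caller passes the charset to simulate that.
  static byte[] bytesIn(String s, Charset cs) {
    return s.getBytes(cs);
  }
}
```

For example, `utf8Bytes("\u00e9")` is two bytes long, while `bytesIn("\u00e9", StandardCharsets.ISO_8859_1)` is a single byte. A filter written under one encoding and probed under the other could therefore miss a value that is actually present.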



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
