[jira] [Commented] (NIFI-1663) Add support for ORC format

ASF GitHub Bot (JIRA) Wed, 22 Jun 2016 08:22:28 -0700

    [ https://issues.apache.org/jira/browse/NIFI-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344485#comment-15344485 ]


ASF GitHub Bot commented on NIFI-1663:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/477#discussion_r68073525
  
    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/util/orc/OrcUtils.java ---
    @@ -0,0 +1,443 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.util.orc;
    +
    +import org.apache.avro.Schema;
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.commons.lang3.mutable.MutableInt;
    +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.ListColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.MapColumnVector;
    +import org.apache.hadoop.hive.ql.exec.vector.UnionColumnVector;
    +import org.apache.orc.TypeDescription;
    +
    +import java.nio.ByteBuffer;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Utility methods for ORC support (conversion from Avro, conversion to Hive types, e.g.
    + */
    +public class OrcUtils {
    +
    +    public static void putToRowBatch(ColumnVector col, MutableInt vectorOffset, int rowNumber, Schema fieldSchema, Object o) {
    +        Schema.Type fieldType = fieldSchema.getType();
    +
    +        if (fieldType == null) {
    +            throw new IllegalArgumentException("Field type is null");
    +        }
    +
    +        if (Schema.Type.INT.equals(fieldType)) {
    +            if (o == null) {
    --- End diff --
    
    The first several field types here check whether `o == null` and, if so,
    set `isNull[rowNumber] = true`. The others skip that check and go straight
    to `o instanceof` checks, which could end in a NullPointerException when
    `o` is null. Should we move the `o == null` check to the beginning so that
    it always runs and the code is cleaner?
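
    For illustration only, hoisting the null check might look roughly like the
    sketch below. It is not the code from the pull request: `SimpleColumn`,
    `FieldKind` and `put` are hypothetical stand-ins for Hive's ColumnVector,
    Avro's Schema.Type and `putToRowBatch`, simplified so that the example
    compiles on its own without the Hive or Avro jars.

```java
import java.nio.ByteBuffer;

final class NullCheckSketch {

    /** Hypothetical stand-in for a column vector; NOT the Hive ColumnVector API. */
    static final class SimpleColumn {
        final boolean[] isNull;
        final Object[] values;
        boolean noNulls = true;

        SimpleColumn(int size) {
            this.isNull = new boolean[size];
            this.values = new Object[size];
        }
    }

    /** Hypothetical field kinds, standing in for Avro's Schema.Type. */
    enum FieldKind { INT, STRING, BYTES }

    static void put(SimpleColumn col, int rowNumber, FieldKind kind, Object o) {
        if (kind == null) {
            throw new IllegalArgumentException("Field type is null");
        }

        // Single null check up front, so every field kind is covered the same way.
        if (o == null) {
            col.isNull[rowNumber] = true;
            col.noNulls = false;
            return;
        }

        // From here on, o is known to be non-null.
        switch (kind) {
            case INT:
                col.values[rowNumber] = ((Number) o).longValue();
                break;
            case STRING:
                col.values[rowNumber] = o.toString();
                break;
            case BYTES:
                if (!(o instanceof ByteBuffer)) {
                    throw new IllegalArgumentException("Expected a ByteBuffer for BYTES");
                }
                col.values[rowNumber] = o;
                break;
        }
    }

    public static void main(String[] args) {
        SimpleColumn col = new SimpleColumn(2);
        put(col, 0, FieldKind.INT, 7);
        put(col, 1, FieldKind.BYTES, null);
        System.out.println("row 1 is null: " + col.isNull[1]);
    }
}
```

    With the check in one place, the per-type branches no longer need their
    own null handling, and a newly added type cannot forget it.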


> Add support for ORC format
> --------------------------
>
>                 Key: NIFI-1663
>                 URL: https://issues.apache.org/jira/browse/NIFI-1663
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>             Fix For: 1.0.0
>
>
> From the Hive/ORC wiki 
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC): 
> The Optimized Row Columnar (ORC) file format provides a highly efficient way 
> to store Hive data ... Using ORC files improves performance when Hive is 
> reading, writing, and processing data.
> As users are interested in NiFi integrations with Hive (NIFI-981, NIFI-1193, 
> etc.), NiFi should support the ORC file format so that users can 
> efficiently store flow files for use by Hive.


