rdblue commented on a change in pull request #3023:
URL: https://github.com/apache/iceberg/pull/3023#discussion_r717116519
##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/DictEncodedArrowConverter.java
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.arrow;
+
+import java.math.BigDecimal;
+import org.apache.arrow.vector.DecimalVector;
+import org.apache.arrow.vector.FieldVector;
+import org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor;
+import org.apache.iceberg.arrow.vectorized.VectorHolder;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * This converts dictionary encoded arrow vectors to a correctly typed arrow vector.
+ */
+public class DictEncodedArrowConverter {
+
+  private DictEncodedArrowConverter() {
+  }
+
+  public static FieldVector toArrowVector(VectorHolder vectorHolder, ArrowVectorAccessor<?, String, ?, ?> accessor) {
+    Preconditions.checkArgument(null != vectorHolder, "VectorHolder cannot be null");
+    Preconditions.checkArgument(null != accessor, "ArrowVectorAccessor cannot be null");
+    // TODO: add conversions for other types (https://github.com/apache/iceberg/issues/2484)
+    if (vectorHolder.isDictionaryEncoded()) {
+      if (Type.TypeID.DECIMAL.equals(vectorHolder.icebergType().typeId())) {
+        int precision = ((Types.DecimalType) vectorHolder.icebergType()).precision();
+        int scale = ((Types.DecimalType) vectorHolder.icebergType()).scale();
+
+        DecimalVector decimalVector = new DecimalVector(
+            vectorHolder.vector().getName(),
+            ArrowSchemaUtil.convert(vectorHolder.icebergField()).getFieldType(),
+            vectorHolder.vector().getAllocator());

Review comment:
I understand the need for `NestedField` in order to create the vector here. But I don't think that this should be allocating anything. It should instead copy from the `IntVector` to `DecimalVector`, both of which should be passed in.

The current structure always passes vectors into readers to be filled with data. Sometimes those are reallocated, but we try to be able to pass the last set of vectors in to avoid allocation. The same should apply here.

For example, if you're reading a table of `(int, string)` then we allocate a vector for each and pass them into the read method. If that int is actually a decimal, then we should create an appropriate decimal vector and pass that in as well so it can be reused through the same lifecycle.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
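The reuse pattern requested in the review comment can be sketched without Arrow at all. The block below is only an illustration under stated assumptions: plain Java arrays stand in for the `IntVector` (dictionary ids) and the `DecimalVector` (unscaled decimal values), and the names `DictDecimalCopySketch`, `copyInto`, and `valueAt` are hypothetical, not part of Iceberg or Arrow. It shows the converter filling an output buffer supplied by the caller and allocating only when that buffer is too small.

```java
import java.math.BigDecimal;

// Hypothetical sketch: plain arrays stand in for Arrow vectors.
//   int[] ids         ~ the dictionary-encoded IntVector
//   long[] dictionary ~ the dictionary of unscaled decimal values
//   long[] reuse      ~ the DecimalVector passed in by the caller
final class DictDecimalCopySketch {

  private DictDecimalCopySketch() {
  }

  // Decodes ids[0..rowCount) into the caller-supplied buffer and returns the
  // buffer that was filled. A new array is allocated only when the supplied
  // one is null or too small, so the common path allocates nothing.
  static long[] copyInto(int[] ids, int rowCount, long[] dictionary, long[] reuse) {
    if (ids == null || dictionary == null) {
      throw new IllegalArgumentException("ids and dictionary cannot be null");
    }
    if (rowCount < 0 || rowCount > ids.length) {
      throw new IllegalArgumentException("Invalid row count: " + rowCount);
    }
    long[] out = (reuse != null && reuse.length >= rowCount) ? reuse : new long[rowCount];
    for (int row = 0; row < rowCount; row++) {
      out[row] = dictionary[ids[row]];
    }
    return out;
  }

  // Reads one decoded row as a BigDecimal with the column's scale.
  static BigDecimal valueAt(long[] unscaled, int row, int scale) {
    return BigDecimal.valueOf(unscaled[row], scale);
  }

  public static void main(String[] args) {
    long[] dictionary = {12345L, 678L};
    long[] buffer = new long[4]; // allocated once by the caller

    // The same buffer is passed in for every batch and comes back filled.
    long[] first = copyInto(new int[] {0, 1, 0}, 3, dictionary, buffer);
    long[] second = copyInto(new int[] {1, 1, 0, 0}, 4, dictionary, first);

    System.out.println(first == buffer && second == buffer);
    System.out.println(valueAt(second, 0, 2));
  }
}
```

In this sketch the caller owns the output buffer for the whole read lifecycle, which mirrors the suggestion that the decimal vector be created alongside the other vectors and passed into the read method rather than constructed inside the converter.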
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
