openinx commented on a change in pull request #1299:
URL: https://github.com/apache/iceberg/pull/1299#discussion_r468295897



##########
File path: flink/src/test/java/org/apache/iceberg/flink/RowDataConverter.java
##########
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.flink;
+
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.time.Instant;
+import java.time.LocalDate;
+import java.time.LocalDateTime;
+import java.time.LocalTime;
+import java.time.OffsetDateTime;
+import java.time.ZoneOffset;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import org.apache.flink.table.data.DecimalData;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.data.StringData;
+import org.apache.flink.table.data.TimestampData;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+class RowDataConverter {
+  private static final OffsetDateTime EPOCH = 
Instant.ofEpochSecond(0).atOffset(ZoneOffset.UTC);
+  private static final LocalDate EPOCH_DAY = EPOCH.toLocalDate();
+
+  private RowDataConverter() {
+  }
+
+  static RowData convert(Schema iSchema, Record record) {
+    return convert(iSchema.asStruct(), record);
+  }
+
+  private static RowData convert(Types.StructType struct, Record record) {
+    GenericRowData rowData = new GenericRowData(struct.fields().size());
+    List<Types.NestedField> fields = struct.fields();
+    for (int i = 0; i < fields.size(); i += 1) {
+      Types.NestedField field = fields.get(i);
+
+      Type fieldType = field.type();
+
+      switch (fieldType.typeId()) {
+        case STRUCT:
+          rowData.setField(i, convert(fieldType.asStructType(), 
record.get(i)));
+          break;
+        case LIST:
+          rowData.setField(i, convert(fieldType.asListType(), record.get(i)));
+          break;
+        case MAP:
+          rowData.setField(i, convert(fieldType.asMapType(), record.get(i)));
+          break;
+        default:
+          rowData.setField(i, convert(fieldType, record.get(i)));
+      }
+    }
+    return rowData;
+  }
+
+  private static Object convert(Type type, Object object) {
+    if (object == null) {
+      return null;
+    }
+
+    switch (type.typeId()) {
+      case BOOLEAN:
+      case INTEGER:
+      case LONG:
+      case FLOAT:
+      case DOUBLE:
+      case FIXED:
+        return object;
+      case DATE:
+        return (int) ChronoUnit.DAYS.between(EPOCH_DAY, (LocalDate) object);
+      case TIME:
+        // Iceberg's time is in microseconds, while flink's time is in 
milliseconds.
+        LocalTime localTime = (LocalTime) object;
+        return (int) TimeUnit.NANOSECONDS.toMillis(localTime.toNanoOfDay());

Review comment:
       For the same data with a time type, if Flink writes it into an Iceberg
table A and Hive MR or Spark reads it, then there should be no problem. But if,
for the same data set, Flink and Spark each write into different tables A and
B, then there will be a difference between tables A and B because of the lost
microseconds. That difference sounds reasonable, given the different behavior
of the different compute engines.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to