[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and OverwritableTableSink

GitBox Wed, 19 Jun 2019 07:46:09 -0700

aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295328327


 ##########
 File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##########
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly used for
+ * static partitions. For sql statement:
+ * <pre>
+ * <code>
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * </code>
+ * </pre>
+ * We Assume the A has partition columns as &lt;a&gt;, &lt;b&gt;, &lt;c&gt;.
+ * The columns &lt;a&gt; and &lt;b&gt; are called static partition columns, while c is called
+ * dynamic partition column.
+ *
+ * <p>Note: Current class implementation don't support partition pruning which means constant
+ * partition columns will still be kept in result row.
+ */
+public interface PartitionableTableSink {
+
+       /**
+        * Get the partition keys of the table. This should be an empty set if the table is not partitioned.
+        *
+        * @return partition keys of the table
+        */
+       List<String> getPartitionKeys();
+
+       /**
+        * Sets the static partitions into the {@link TableSink}.
+        * @param partitions mapping from static partition column names to string literal values.
+        *                      String literals will be quoted using {@code '}, for example,
+        *                      value {@code abc} will be stored as {@code 'abc'} with quotes around.
+        */
+       void setStaticPartitions(Map<String, String> partitions);
+
+       /**
+        * If true, all records would be sort with partition fields before output, for some sinks, this
+        * can be used to reduce the partition writers, that means the sink will accept data
+        * one partition at a time.
+        *
+        * <p>A sink should consider whether to override this especially when it needs buffer
+        * data before writing.
+        *
+        * <p>Notes:
+        * 1. If returns true, the output data will be sorted <strong>locally</strong> after partitioning.
+        * 2. Default returns true, if the table is partitioned.
+        */
+       default boolean sortLocalPartition() {
 
 Review comment:
   Is this a _requirement_ or a _request_? That is, when this returns true, does 
the data have to be sorted by partition because the sink would otherwise produce 
incorrect output, or is it only a hint and the sink still works if the data 
is not sorted? We should make this clearer in both the comment and the method 
name, e.g. `requiresPartitionGrouping()` or `canUsePartitionGrouping()` (not 
sure about the name of the second one).
   
   Also, it's not really sorting by partitions but grouping by partitions, 
right? A sort is just one way to achieve that grouping.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
