[GitHub] [flink] javacaoyu commented on a change in pull request #19126: [FLINK-26609][python] Support sum operation in KeyedStream

GitBox Mon, 21 Mar 2022 01:23:36 -0700


javacaoyu commented on a change in pull request #19126:
URL: https://github.com/apache/flink/pull/19126#discussion_r830849153




##########
File path: flink-python/pyflink/datastream/data_stream.py
##########
@@ -1174,6 +1174,66 @@ def process_element(self, value, ctx: 'KeyedProcessFunction.Context'):
         return self.process(FilterKeyedProcessFunctionAdapter(func), self._original_data_type_info)\
             .name("Filter")
 
+    def sum(self, position_to_sum: Union[int, str]) -> 'DataStream':
+        """
+        Applies an aggregation that gives a rolling sum of the data stream at the
+        given position grouped by the given key. An independent aggregate is kept
+        per key.
+
+        Example(Tuple data to sum):
+        ::
+
+            >>> ds = env.from_collection([('a', 1), ('a', 2), ('b', 1), ('b', 5)])
+            >>> ds.key_by(lambda x: x[0]).sum(1)
+
+        Example(Row data to sum):
+        ::
+
+            >>> ds = env.from_collection([('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)],
+            ...                          type_info=Types.ROW([Types.STRING(), Types.INT()]))
+            >>> ds.key_by(lambda x: x[0]).sum(1)
+
+        Example(Row data with fields name to sum):
+        ::
+
+            >>> ds = env.from_collection(
+            ...     [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)],
+            ...     type_info=Types.ROW_NAMED(["key", "value"], [Types.STRING(), Types.INT()])
+            ... )
+            >>> ds.key_by(lambda x: x[0]).sum("value")
+
+        :param position_to_sum:
+            The field position in the data points to sum, type can be int or str.
+            This is applicable to Tuple types, and :class:`pyflink.common.Row` types.
+        :return: The transformed DataStream.
+        """
+        if not isinstance(position_to_sum, int) and not isinstance(position_to_sum, str):
+            raise TypeError("The input must be of int or str type to locate the value to sum")
+
+        class SumReduceFunction(ReduceFunction):
+
+            def __init__(self, position_to_sum):
+                self._pos = position_to_sum
+
+            def reduce(self, value1, value2):
+                from numbers import Number
+                if not isinstance(value1[self._pos], Number):
+                    raise TypeError("The value to sum by given position must be of numeric type; "
+                                    f"actual {type(value1[self._pos])}, expected Number")
+                if isinstance(value1, tuple):
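To make the docstring's "rolling sum ... grouped by the given key" wording concrete, here is a plain-Python sketch of those semantics applied to the first docstring example. It does not use PyFlink: `rolling_sum` is a hypothetical helper, and keeping the earlier record's non-summed fields is an assumption (mirroring Java Flink's `sum`), not something the diff above confirms.

```python
from numbers import Number
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, Tuple


def rolling_sum(records: Iterable[Tuple[Any, ...]],
                key_fn: Callable[[Tuple[Any, ...]], Hashable],
                pos: int) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical helper: yield one aggregated record per input, one running sum per key."""
    latest: Dict[Hashable, Tuple[Any, ...]] = {}
    for record in records:
        if not isinstance(record[pos], Number):
            raise TypeError(f"value at position {pos} must be numeric, got {type(record[pos])}")
        key = key_fn(record)
        if key in latest:
            # Fold the new value into this key's previous aggregate; the other
            # fields come from the earlier record (an assumption, see lead-in).
            fields = list(latest[key])
            fields[pos] += record[pos]
            record = tuple(fields)
        latest[key] = record
        yield record
```

For the input `[('a', 1), ('a', 2), ('b', 1), ('b', 5)]` keyed by `x[0]` and summed at position 1, the sketch emits `[('a', 1), ('a', 3), ('b', 1), ('b', 6)]`: every input produces an output, and the `'a'` and `'b'` sums never mix.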

Review comment:
       Yes, the list type needs to be supported.
   
   The list type was supported in the earliest version of the code. I later removed it because I thought the elements of a standard list should all be of the same type.
   
   Data such as ['key', 1] is not a typical list: although we could key by data[0] and sum by data[1], a tuple seems a better fit for it. So I removed the list type support.
   
   However, your suggestion convinced me that the choice should be left to the user. Whether a user's list is homogeneous or not is the user's business; we just need to provide list type support.
   
   So I will add the list type support back. Thanks for your suggestion.
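The list support discussed above mostly comes down to how a reducer rebuilds the record: a tuple is immutable and must be recreated, while a list can be updated. Below is a minimal plain-Python sketch of such a reducer. `sum_reduce` is a hypothetical name, not the PR's `SumReduceFunction`. It accepts mixed-type lists like `['key', 1]`, which leaves the choice of container to the user. Copying the list rather than mutating `value1` in place is a design choice made here for safety, not something the PR specifies.

```python
from numbers import Number
from typing import Any, List, Tuple, Union

Record = Union[Tuple[Any, ...], List[Any]]


def sum_reduce(value1: Record, value2: Record, pos: int) -> Record:
    """Hypothetical reducer: add value2[pos] into value1[pos], keeping value1's other fields."""
    if not isinstance(value1[pos], Number):
        raise TypeError("The value to sum by given position must be of numeric type; "
                        f"actual {type(value1[pos])}, expected Number")
    if not isinstance(value1, (tuple, list)):
        raise TypeError(f"Unsupported data type: {type(value1)}")
    # Work on a list copy so negative positions index correctly and the input is untouched.
    fields = list(value1)
    fields[pos] = value1[pos] + value2[pos]
    # Return the same container type the user passed in.
    return tuple(fields) if isinstance(value1, tuple) else fields
```

With this sketch, `sum_reduce(('key', 1), ('key', 2), 1)` returns `('key', 3)` and `sum_reduce(['key', 1], ['key', 2], 1)` returns `['key', 3]`. The caller's original list is not modified.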



