[
https://issues.apache.org/jira/browse/BEAM-14213?focusedWorklogId=753056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753056
]
ASF GitHub Bot logged work on BEAM-14213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Apr/22 18:55
Start Date: 05/Apr/22 18:55
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on code in PR #17253:
URL: https://github.com/apache/beam/pull/17253#discussion_r843159960
##########
sdks/python/apache_beam/typehints/batch_test.py:
##########
@@ -0,0 +1,111 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for the batched type-hint objects."""
+
+import unittest
+
+import numpy as np
+import pandas as pd
+from parameterized import parameterized
+from parameterized import parameterized_class
+
+from apache_beam.typehints import row_type
+from apache_beam.typehints.batch import BatchConverter
+from apache_beam.typehints.batch import N
+from apache_beam.typehints.batch import NumpyArray
+from apache_beam.typehints.typehints import check_constraint
+from apache_beam.typehints.typehints import validate_composite_type_param
+
+
+@parameterized_class(
+    [{
+        'batch_typehint': np.ndarray,
+        'element_typehint': np.int32,
+        'batch': np.array(range(100), np.int32)
+    },
+     {
+         'batch_typehint': NumpyArray[np.int64, (N, 10)],
+         'element_typehint': NumpyArray[np.int64, (10, )],
+         'batch': np.array([list(range(i, i + 10)) for i in range(100)],
+                           np.int64),
+     },
+     {
+         'batch_typehint': pd.DataFrame,
+         'element_typehint': row_type.RowTypeConstraint([
+             ('f_str', str), ('f_int64', np.int64), ('f_int32', np.int32)
+         ]),
+         'batch': pd.DataFrame({
+             'f_str': pd.Series(map(str, range(100)), dtype=pd.StringDtype()),
+             'f_int64': pd.Series(range(100), dtype=np.int64),
+             'f_int32': pd.Series(range(100), dtype=np.int32)
+         }),
+     }])
+class BatchTest(unittest.TestCase):
+  def setUp(self):
+    self.utils = BatchConverter.from_typehints(
+        element_type=self.element_typehint, batch_type=self.batch_typehint)
+
+  def equality_check(self, left, right):
+    if isinstance(left, np.ndarray) and isinstance(right, np.ndarray):
+      return np.array_equal(left, right)
+    elif isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+      return left.equals(right)
Review Comment:
I backed out the pandas DataFrame BatchConverter (f909d92). The branch
https://github.com/TheNeuralBit/beam/tree/batched-dofn-pandas has this change
added back.
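The parameterized test in the diff checks that a converter can move between a batch and its individual elements without losing anything. A minimal pure-Python sketch of that round trip follows, assuming a plain list as the batch type. `ListBatchConverter`, `produce_batch`, `explode_batch`, `combine_batches` and `round_trip_ok` are hypothetical names chosen for illustration; this is not Beam's `BatchConverter` implementation.

```python
from typing import Iterator, List, Sequence


class ListBatchConverter:
  """Hypothetical converter whose batch type is a plain Python list.

  Illustrative only; not apache_beam.typehints.batch.BatchConverter.
  """
  def produce_batch(self, elements: Sequence[int]) -> List[int]:
    # Pack individual elements into a single batch object.
    return list(elements)

  def explode_batch(self, batch: List[int]) -> Iterator[int]:
    # Unpack a batch back into individual elements, preserving order.
    yield from batch

  def combine_batches(self, batches: List[List[int]]) -> List[int]:
    # Concatenate several batches into one batch, in order.
    return [element for batch in batches for element in batch]


def round_trip_ok(converter: ListBatchConverter, batch: List[int]) -> bool:
  # Exploding a batch and re-producing it should yield an equal batch.
  elements = list(converter.explode_batch(batch))
  return converter.produce_batch(elements) == batch


converter = ListBatchConverter()
batch = list(range(100))
print(round_trip_ok(converter, batch))  # True
print(converter.combine_batches([batch[:50], batch[50:]]) == batch)  # True
```

With NumPy or pandas batches, the final comparison cannot be a bare `==`, because it compares elementwise; that is why the test's `equality_check` helper dispatches to `np.array_equal` or `DataFrame.equals`.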
Issue Time Tracking
-------------------
Worklog Id: (was: 753056)
Time Spent: 50m (was: 40m)
> Add support for Batched DoFns in the Python SDK
> -----------------------------------------------
>
> Key: BEAM-14213
> URL: https://issues.apache.org/jira/browse/BEAM-14213
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Add an implementation for https://s.apache.org/batched-dofns to the Python
> SDK.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)