[
https://issues.apache.org/jira/browse/BEAM-10814?focusedWorklogId=487252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487252
]
ASF GitHub Bot logged work on BEAM-10814:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Sep/20 20:33
Start Date: 21/Sep/20 20:33
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492329417
##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -15,25 +15,129 @@
# limitations under the License.
#
-"""Utilities for relating schema-aware PCollections and dataframe transforms.
+r"""Utilities for relating schema-aware PCollections and dataframe transforms.
+
+pandas dtype                    Python typing
+np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
+pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
+np.float{32,64}         <-----> Optional[np.float{32,64}]
+                           \--- np.float{32,64}
+np.dtype('S')           <-----> bytes
+Not supported           <------ Optional[bytes]
+np.bool                 <-----> np.bool
+
+* int, float, bool are treated the same as np.int64, np.float64, np.bool
+
+Any unknown or unsupported types are treated as Any and shunted to
+np.object:
+
+np.object <-----> Any
+
+Strings and nullable Booleans are handled differently when using pandas 0.x vs.
+1.x. pandas 0.x has no mapping for these types, so they are shunted lossily to
+np.object.
+
+pandas 0.x:
+np.object <------ Optional[bool]
+             \--- Optional[str]
+              \-- str
+
+pandas 1.x:
+pd.BooleanDtype() <-----> Optional[bool]
+pd.StringDtype()  <-----> Optional[str]
+                     \--- str
+
+Pandas does not support hierarchical data natively. All structured types
Review comment:
SG, I added a sentence indicating we might add better support for these
types in the future.
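
For readers following the thread, the mapping in the docstring above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in schemas.py: the table names, the function names, and the use of plain strings as dtype labels are all hypothetical, and the built-in int, float and bool stand in for np.int64, np.float64 and np.bool as the docstring's footnote allows.

```python
from typing import Any, Optional

# Hypothetical tables mirroring the docstring's pandas 1.x mapping.
# Keys/values on the dtype side are illustrative string labels, chosen so
# the sketch runs without importing pandas or numpy.
DTYPE_TO_TYPING = {
    "int64": int,                 # np.int64        <-----> np.int64
    "Int64": Optional[int],       # pd.Int64Dtype   <-----> Optional[np.int64]
    "float64": Optional[float],   # np.float64      <-----> Optional[np.float64]
    "S": bytes,                   # np.dtype('S')   <-----> bytes
    "bool": bool,                 # np.bool         <-----> np.bool
    "boolean": Optional[bool],    # pd.BooleanDtype <-----> Optional[bool]
    "string": Optional[str],      # pd.StringDtype  <-----> Optional[str]
}

# The reverse direction has extra one-way entries (the "\---" rows):
# float and str lose the nullable/non-nullable distinction.
TYPING_TO_DTYPE = {v: k for k, v in DTYPE_TO_TYPING.items()}
TYPING_TO_DTYPE[float] = "float64"
TYPING_TO_DTYPE[str] = "string"


def typing_for_dtype(dtype_label):
    """Return the typing annotation for a dtype label, or Any if unknown."""
    return DTYPE_TO_TYPING.get(dtype_label, Any)


def dtype_for_typing(type_hint):
    """Return the dtype label for a type hint, or "object" if unknown.

    Optional[bytes], which the docstring lists as not supported, is not
    modeled specially here; it simply falls through to "object".
    """
    return TYPING_TO_DTYPE.get(type_hint, "object")
```

Note that dtype_for_typing(float) gives "float64" while typing_for_dtype("float64") gives Optional[float], which is the lossy round trip the one-way arrows describe.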
Issue Time Tracking
-------------------
Worklog Id: (was: 487252)
Time Spent: 1h 10m (was: 1h)
> DataframeTransform: when input is element-wise produce element-wise output
> --------------------------------------------------------------------------
>
> Key: BEAM-10814
> URL: https://issues.apache.org/jira/browse/BEAM-10814
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Fix For: 2.25.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> If the input is a DataFrame, yield DataFrames; if it is elements, yield elements.
> We should also provide a way to override the default