[ https://issues.apache.org/jira/browse/BEAM-12074?focusedWorklogId=579699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579699 ]
ASF GitHub Bot logged work on BEAM-12074:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Apr/21 02:54
Start Date: 09/Apr/21 02:54
Worklog Time Spent: 10m
Work Description: pcoet commented on a change in pull request #14382:
URL: https://github.com/apache/beam/pull/14382#discussion_r610300583
##########
File path: sdks/python/apache_beam/dataframe/frame_base.py
##########
@@ -356,6 +374,85 @@ def wrapper(*args, **kwargs):
return wrap
+BEAM_SPECIFIC = "Differences from pandas"
+
+SECTION_ORDER = [
+ 'Parameters',
+ 'Returns',
+ 'Raises',
+ BEAM_SPECIFIC,
+ 'See Also',
+ 'Notes',
+ 'Examples'
+]
+
+EXAMPLES_DISCLAIMER = (
+ "**NOTE:** These examples are pulled directly from the pandas documentation "
+ "for convenience. The Beam DataFrame API will look different because it is "
Review comment:
"The Beam DataFrame API" -> "Usage of the Beam DataFrame API"
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1215,17 +1267,22 @@ def aggregate(self, func, axis=0, *args, **kwargs):
agg = aggregate
- applymap = frame_base._elementwise_method('applymap')
+ applymap = frame_base._elementwise_method('applymap', base=pd.DataFrame)
memory_usage = frame_base.wont_implement_method('non-deferred value')
info = frame_base.wont_implement_method('non-deferred value')
clip = frame_base._elementwise_method(
- 'clip', restrictions={'axis': lambda axis: axis in (0, 'index')})
+ 'clip', restrictions={'axis': lambda axis: axis in (0, 'index')}, base=pd.DataFrame)
+ @frame_base.with_docs_from(pd.DataFrame)
@frame_base.args_to_kwargs(pd.DataFrame)
@frame_base.populate_defaults(pd.DataFrame)
def corr(self, method, min_periods):
+ """Only ``method="pearson"`` can be parallelized, other methods require
Review comment:
"parallelized, other" -> "parallelized. Other"
##########
File path: sdks/python/apache_beam/dataframe/__init__.py
##########
@@ -14,4 +14,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+"""Beam DataFrame API
+
+- For high-level documentation see
+ https://beam.apache.org/documentation/dsls/dataframes/overview/
+- :mod:`apache_beam.dataframe.io`: DataFrame I/Os
+- :mod:`apache_beam.dataframe.frames`: DataFrame operations
+- :mod:`apache_beam.dataframe.convert`: Conversion between
+ :class:`~apache_beam.pvalue.PCollection` and
+ :class:`~apache_beam.dataframe.frames.DeferredDataFrame`.
+- :mod:`apache_beam.dataframe.transforms`: Embed DataFrame operations in a
+ Beam pipeline.
Review comment:
"Embed DataFrame operations in a Beam pipeline." -> "DataFrame operations for use in a Beam pipeline."
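A side note on the corr docstring in frames.py, offered as a rough sketch and not taken from the PR: Pearson correlation can be computed from six running sums per partition, and those sums combine by plain addition, which is one way to see why that method parallelizes. The function names below (partial_stats, merge, pearson) are hypothetical, and Beam's actual implementation may differ.

```python
import math


def partial_stats(pairs):
    """Hypothetical helper: summarize one partition of (x, y) pairs.

    Returns (n, sum_x, sum_y, sum_xx, sum_yy, sum_xy).
    """
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)


def merge(a, b):
    """Combine the summaries of two partitions by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))


def pearson(stats):
    """Pearson correlation coefficient from the merged summary."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)


# Two partitions processed independently, then merged.
left = partial_stats([(1, 2), (2, 4)])
right = partial_stats([(3, 7), (4, 8), (5, 11)])
r = pearson(merge(left, right))
```

Because merge is associative and commutative, the partitions can be summarized in any order and on any worker before the final division.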
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 579699)
Time Spent: 4h 10m (was: 4h)
> Add API Documentation for the DataFrame API
> -------------------------------------------
>
> Key: BEAM-12074
> URL: https://issues.apache.org/jira/browse/BEAM-12074
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Priority: P2
> Labels: dataframe-api
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> For the most part this can be pulled directly from pandas methods, but we
> should add specific notes about any divergences that the Beam DataFrame API
> has from pandas.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)