TheNeuralBit commented on a change in pull request #15074:
URL: https://github.com/apache/beam/pull/15074#discussion_r658371715
##########
File path: website/www/site/content/en/documentation/dsls/dataframes/overview.md
##########
@@ -112,22 +112,3 @@ pc1, pc2 = {'a': pc} | DataframeTransform(lambda a: expr1,
expr2)
{...} = {a: pc} | DataframeTransform(lambda a: {...})
{{< /highlight >}}
-
-## Differences from standard Pandas {#differences_from_standard_pandas}
-
-Beam DataFrames are deferred, like the rest of the Beam API. As a result,
there are some limitations on what you can do with Beam DataFrames, compared to
the standard Pandas implementation:
-
-* Because all operations are deferred, the result of a given operation may not
be available for control flow. For example, you can compute a sum, but you
can't branch on the result.
-* Result columns must be computable without access to the data. For example,
you can’t use
[transpose](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html).
-* PCollections in Beam are inherently unordered, so Pandas operations that are
sensitive to the ordering of rows are unsupported. For example, order-sensitive
operations such as
[shift](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html),
[cummax](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cummax.html),
[cummin](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cummin.html),
[head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html),
and
[tail](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html#pandas.DataFrame.tail)
are not supported.
-
-With Beam DataFrames, computation doesn’t take place until the pipeline runs.
Before that, only the shape or schema of the result is known, meaning that you
can work with the names and types of the columns, but not the result data
itself.
-
-There are a few common exceptions you may see when attempting to use certain
Pandas operations:
-
-* **WontImplementError**: Indicates that this operation or argument isn’t
supported because it’s incompatible with the Beam model. The largest class of
operations that raise this error are order-sensitive operations.
-* **NotImplementedError**: Indicates this is an operation or argument that
hasn’t been implemented yet. Many Pandas operations are already available
through Beam DataFrames, but there’s still a long tail of unimplemented
operations.
-* **NonParallelOperation**: Indicates that you’re attempting a non-parallel
operation outside of an `allow_non_parallel_operations` block. Some operations
don't lend themselves to parallel computation. They can still be used, but must
be guarded in a `with beam.dataframe.allow_non_parallel_operations(True)` block.
-
-[pydoc_dataframe_transform]:
https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform
-[pydoc_sql_transform]:
https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform
Review comment:
I don't think you meant to remove these, looks like it broke some links:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]