[ https://issues.apache.org/jira/browse/SPARK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270847#comment-15270847 ]
Wes McKinney commented on SPARK-13946:
--------------------------------------

{{import pyspark.sql.functions as F}}

> PySpark DataFrames allows you to silently use aggregate expressions derived
> from different table expressions
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13946
>                 URL: https://issues.apache.org/jira/browse/SPARK-13946
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Wes McKinney
>
> In my opinion, this code should raise an exception rather than silently
> discarding the predicate:
> {code}
> import numpy as np
> import pandas as pd
> df = pd.DataFrame({'foo': np.random.randn(1000000),
>                    'bar': np.random.randn(1000000)})
> sdf = sqlContext.createDataFrame(df)
> sdf2 = sdf[sdf.bar > 0]
> sdf.agg(F.count(sdf2.foo)).show()
> +----------+
> |count(foo)|
> +----------+
> |   1000000|
> +----------+
> {code}
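
The behavior requested in the report (raise instead of silently dropping the predicate) can be illustrated with a small pure-Python toy model. This is only a sketch: `Table`, `Column`, `col`, `filter`, and `count` below are hypothetical names invented for the illustration and are not part of the PySpark API, and the check shown is not how Spark resolves columns internally. The idea is simply that a column reference remembers which table expression it came from, and an aggregate refuses a column derived from a different one.

```python
class Column:
    """Hypothetical column reference that remembers its parent table."""

    def __init__(self, table, name):
        self.table = table
        self.name = name


class Table:
    """Hypothetical stand-in for a DataFrame: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def col(self, name):
        # Bind the column reference to this specific table expression.
        return Column(self, name)

    def filter(self, predicate):
        # Filtering yields a new table expression with its own identity.
        return Table(row for row in self.rows if predicate(row))

    def count(self, column):
        # Reject columns derived from another table expression instead of
        # silently ignoring whatever predicate produced them.
        if column.table is not self:
            raise ValueError(
                f"column {column.name!r} belongs to a different table expression"
            )
        return sum(1 for row in self.rows if row[column.name] is not None)


# Ten rows; bar is positive only for the last four (i = 6..9).
sdf = Table({"foo": i, "bar": i - 5} for i in range(10))
sdf2 = sdf.filter(lambda row: row["bar"] > 0)

# Aggregating the filtered table with its own column keeps the predicate.
filtered_count = sdf2.count(sdf2.col("foo"))

# Mixing a column of sdf2 into an aggregate on sdf raises in this toy model.
error = None
try:
    sdf.count(sdf2.col("foo"))
except ValueError as exc:
    error = str(exc)

print(filtered_count)
print(error)
```

In PySpark itself, the filtered count is obtained by aggregating on the filtered DataFrame, for example `sdf2.agg(F.count('foo'))`, rather than passing a column of `sdf2` to `sdf.agg`.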