[ 
https://issues.apache.org/jira/browse/SPARK-17973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17973.
-------------------------------
    Resolution: Not A Problem

Questions should go on the mailing list: 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

I think you're looking for an operation like partition() in Scala. There isn't 
a built-in way to do this on a Dataset. You can filter the Dataset twice, once 
with the condition and once with its negation, which ends up being about the 
same thing as you'd get with something like partition(). Either way, the two 
child Datasets would have to evaluate the parent twice, so you can cache the 
parent to avoid recomputing it.
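
To make the "filter twice" idea concrete, here is a minimal plain-Java sketch 
that does not use Spark at all. It models the predicate from the question 
quoted below with SQL-style three-valued logic, on the assumption that the 
non-numeric room "17-D" evaluates to NULL in the comparison `Room > 200`. The 
names `SplitSketch`, `Student`, `condition`, `matched`, `negated` and 
`remainder` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class SplitSketch {

    /** Plain-Java stand-in for one row; room is null when it is not numeric. */
    static final class Student {
        final String name;
        final int classYear;
        final Integer room;

        Student(String name, int classYear, Integer room) {
            this.name = name;
            this.classYear = classYear;
            this.room = room;
        }
    }

    /** The four rows from the question; "17-D" is modelled as null (assumption). */
    static List<Student> sample() {
        return Arrays.asList(
            new Student("SALLY WHITTAKER", 2018, 312),
            new Student("BELINDA JAMESON", 2017, 148),
            new Student("JEFF SMITH", 2018, null),
            new Student("SANDY ALLEN", 2019, 108));
    }

    /** SQL-style AND: FALSE wins, otherwise null (unknown) propagates. */
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return Boolean.FALSE;
        if (a == null || b == null) return null;
        return Boolean.TRUE;
    }

    /** SQL-style NOT: the negation of unknown is still unknown. */
    static Boolean not(Boolean a) {
        return a == null ? null : Boolean.valueOf(!a);
    }

    /** Models "Class > 2017 and Room > 200". */
    static Boolean condition(Student s) {
        Boolean classOk = Boolean.valueOf(s.classYear > 2017);
        Boolean roomOk = (s.room == null) ? null : Boolean.valueOf(s.room > 200);
        return and(classOk, roomOk);
    }

    /** Keeps a row only when the predicate is exactly TRUE, as a SQL filter does. */
    static List<String> filter(List<Student> rows, Function<Student, Boolean> predicate) {
        List<String> names = new ArrayList<>();
        for (Student s : rows) {
            if (Boolean.TRUE.equals(predicate.apply(s))) {
                names.add(s.name);
            }
        }
        return names;
    }

    /** First filter: rows where the condition is TRUE. */
    static List<String> matched() {
        return filter(sample(), SplitSketch::condition);
    }

    /** Naive second filter: NOT(condition), which silently drops unknown rows. */
    static List<String> negated() {
        return filter(sample(), s -> not(condition(s)));
    }

    /** Complement filter: every row where the condition is not TRUE. */
    static List<String> remainder() {
        return filter(sample(), s -> !Boolean.TRUE.equals(condition(s)));
    }

    public static void main(String[] args) {
        System.out.println("matched:   " + matched());
        System.out.println("negated:   " + negated());
        System.out.println("remainder: " + remainder());
    }
}
```

Under that assumption, `matched()` and `negated()` reproduce the two outputs 
shown in the question, and the JEFF SMITH row lands in neither, while 
`remainder()` picks it up. On the Spark side, a null-safe negation such as 
`s1.filter("not coalesce(Class > 2017 and Room > 200, false)")` may express 
the same complement, though that is an untested suggestion rather than 
something confirmed in this thread.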

> is there any way to split Dataset into 2 or more based on the given condition
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-17973
>                 URL: https://issues.apache.org/jira/browse/SPARK-17973
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>            Reporter: sriram kumar
>            Priority: Critical
>
> I am not able to split a Dataset exactly by a condition. I have a scenario 
> where I need to split a single Dataset into 4 Datasets, with the rows that 
> match none of them going into a 5th Dataset. Below I am taking some baby steps. 
> This is my data. 
> +---------------+-----+--------------+----+----+
> |           Name|Class|          Dorm|Room| GPA|
> +---------------+-----+--------------+----+----+
> |Sally Whittaker| 2018|McCarren House| 312|3.75|
> |Belinda Jameson| 2017| Cushing House| 148|3.52|
> |     Jeff Smith| 2018|Prescott House|17-D| 3.2|
> |    Sandy Allen| 2019|  Oliver House| 108|3.48|
> +---------------+-----+--------------+----+----+
> Dataset<Row> s1 = s.selectExpr("upper(Name) as Name" , "Class");
> s1.filter("Class > 2017 and Room > 200").show();
> +---------------+-----+
> |           Name|Class|
> +---------------+-----+
> |SALLY WHITTAKER| 2018|
> +---------------+-----+
> Then what code can I use to get the remaining data in the Dataset?
> I tried the one below, but it goes wrong. 
> s1.filter("!(Class > 2017 and Room > 200)").show();
>       
> +---------------+-----+
> |           Name|Class|
> +---------------+-----+
> |BELINDA JAMESON| 2017|
> |    SANDY ALLEN| 2019|
> +---------------+-----+
> I did get to know why it is going wrong, but I don't have an answer for how 
> to get the rows that were filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

