[
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
gaokui updated SPARK-38812:
---------------------------
Description:
When I clean data, I filter one RDD by a threshold value (> or <) and generate two different sets: one file of error records and one file of error-free records.
Currently I call filter twice, but that requires two Spark DAG jobs, which costs too much.
I would like something like iterator.span(predicate) that returns a tuple (iter1, iter2).
In other words, one dataset would be split into two datasets by a single data-cleaning rule, computed once rather than twice.
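At the Scala collection level, the behaviour requested above already exists as Iterator.partition, which consumes its source once and returns a (matching, nonMatching) pair; a minimal sketch, using an assumed threshold rule of "values above 100 are errors" purely for illustration:

```scala
// Split one pass over the data into "error" and "clean" sets using
// Iterator.partition, which returns (elements matching the predicate,
// elements not matching it) from a single traversal of the source.
val threshold = 100 // assumed cleaning rule: values above the threshold are errors

val records = Iterator(42, 150, 7, 300, 99)
val (errors, clean) = records.partition(_ > threshold)

println(errors.toList) // List(150, 300)
println(clean.toList)  // List(42, 7, 99)
```

An RDD-level equivalent would need to expose something like this per partition (e.g. via mapPartitions) while still producing two RDDs, which is what makes the feature non-trivial in Spark's one-output-per-transformation model.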
was:
When I clean data, I filter one RDD by a threshold value (> or <) and generate two different sets: one file of error records and one file of error-free records.
Currently I call filter twice, but that requires two Spark DAG jobs, which costs too much.
I would like something like iterator.span(predicate) that returns a tuple (iter1, iter2).
> When cleaning data, I would like one RDD to split into two RDDs according to a cleaning rule
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: gaokui
> Priority: Major
>
> When I clean data, I filter one RDD by a threshold value (> or <) and
> generate two different sets: one file of error records and one file of
> error-free records.
> Currently I call filter twice, but that requires two Spark DAG jobs, which
> costs too much.
> I would like something like iterator.span(predicate) that returns a
> tuple(iter1, iter2).
> In other words, one dataset would be split into two datasets by a single
> data-cleaning rule, computed once rather than twice.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]