[ https://issues.apache.org/jira/browse/SPARK-21515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21515.
-------------------------------
Resolution: Invalid
This is a question for StackOverflow or the mailing list.
https://spark.apache.org/contributing.html
> Spark ML Random Forest
> ----------------------
>
> Key: SPARK-21515
> URL: https://issues.apache.org/jira/browse/SPARK-21515
> Project: Spark
> Issue Type: Question
> Components: Build
> Affects Versions: 2.1.1
> Reporter: KovvuriSriRamaReddy
>
> We are reading data from a flat file and storing it in a Dataset<Row>.
> We have a for loop in which we need to modify the dataset and use it in the
> next iteration. [ The first iteration uses the original Dataset. ]
> We know that a variable of type Dataset<Row> is immutable. In our scenario,
> however, each pass of the for loop performs some processing (Random Forest,
> Spark ML) on this variable (Dataset<Row>) and the updated result is used in
> the next iteration. This continues until all iterations are completed.
> [ The size of the dataset stays the same; only the values change. ]
> Approach 1: We store the intermediate result in a new Dataset variable and
> use it in the next iteration. We observed that the first iteration of the
> for loop took only 1 sec, while each later iteration took progressively
> longer. [ i.e. the 2nd iteration took 70 sec, the 3rd iteration took 90 sec,
> and so on... ]
> Approach 2: We write the intermediate Dataset to HDFS or an external file
> and read it back fresh at the start of each iteration. Each iteration then
> completes faster than with the previous approach. However, writing the data
> to and reading it from HDFS or the external file takes additional time.
> This is the problem we need to tune. Could anyone please suggest a better
> solution for this issue?
> Note: We unpersist the previous Datasets and set them to null at the end of
> each loop iteration.
> Thanks in advance.
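
The difference between the two approaches described above can be illustrated with a toy model in plain Java. This is not Spark code and not the reporter's code: the class `LineageSketch`, the `step` function and the call counter are all hypothetical. It rests on an assumption the report does not confirm, namely that the slowdown in Approach 1 comes from each iteration building on a growing chain of lazily evaluated transformations, while Approach 2 restarts from a materialized copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LineageSketch {
    // Counts how often the per-iteration transformation actually runs.
    static int stepCalls = 0;

    // Hypothetical stand-in for one iteration's processing:
    // same number of elements, new values.
    static List<Double> step(List<Double> in) {
        stepCalls++;
        List<Double> out = new ArrayList<>(in.size());
        for (double v : in) {
            out.add(v * 0.5 + 1.0);
        }
        return out;
    }

    // Approach 1 analogue: every iteration wraps the previous lazy result,
    // so evaluating iteration i replays all steps from 1 to i.
    static int chainedCalls(List<Double> source, int iterations) {
        stepCalls = 0;
        Supplier<List<Double>> current = () -> source;
        for (int i = 0; i < iterations; i++) {
            Supplier<List<Double>> previous = current;
            current = () -> step(previous.get());
            current.get(); // stands in for the action run in each iteration
        }
        return stepCalls;
    }

    // Approach 2 analogue: the result is materialized once per iteration and
    // the next iteration starts from that snapshot, not from the whole chain.
    static int materializedCalls(List<Double> source, int iterations) {
        stepCalls = 0;
        Supplier<List<Double>> current = () -> source;
        for (int i = 0; i < iterations; i++) {
            List<Double> snapshot = step(current.get());
            current = () -> snapshot;
        }
        return stepCalls;
    }

    public static void main(String[] args) {
        List<Double> source = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(chainedCalls(source, 5));      // 1+2+3+4+5 = 15
        System.out.println(materializedCalls(source, 5)); // 5
    }
}
```

In the chained variant the work per iteration grows with the iteration number, whereas the materialized variant does a constant amount of work per iteration. If the same effect is at play in the Spark job, `Dataset.checkpoint()` (available since Spark 2.1.0, after setting a checkpoint directory on the SparkContext) may be worth trying as a way to cut the chain without a hand-written write and re-read; whether it helps here is untested.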