[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335956#comment-15335956 ]
Nick Pentreath commented on SPARK-15447:
----------------------------------------

Finalized results are in the linked Google sheet. The raw results are also posted in the two linked Google docs.

[~mengxr] I didn't manage to run 1 billion ratings, but I did run 250 million (30 million users, 10 million items, 250 million ratings). I didn't see any potential performance regression issues for the checkpointing changes (comparing RDD-based APIs between 2.0.0 and 1.6.1) or the DF changes (comparing DF-based APIs between 2.0.0 and 1.6.1).

I'm resolving this ticket, but let me know if you come up with any questions or concerns.

> Performance test for ALS in Spark 2.0
> -------------------------------------
>
>                 Key: SPARK-15447
>                 URL: https://issues.apache.org/jira/browse/SPARK-15447
>             Project: Spark
>          Issue Type: Task
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Nick Pentreath
>            Priority: Critical
>              Labels: QA
>
> We made several changes to ALS in 2.0. It is necessary to run some tests to
> avoid performance regression. We should test (synthetic) datasets from 1
> million ratings to 1 billion ratings.
> cc [~mlnick] [~holdenk] Do you have time to run some large-scale performance
> tests?
> Links:
> [Results spreadsheet|https://docs.google.com/spreadsheets/d/1iX5LisfXcZSTCHp8VPoo5z-eCO85A5VsZDtZ5e475ks/edit?usp=sharing]
> [Raw results for SPARK-14891|https://docs.google.com/document/d/1tlWFCv8zWJuxv_gfAhd-57TKURVyrYkF9v4FLl4Lpn0/edit?usp=sharing]
> [Raw results for SPARK-6716|https://docs.google.com/document/d/12qLLX84Dg-XJAgoSQzmb0-bSncjTHhg7A_JJcQneDiE/edit?usp=sharing]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)