[
https://issues.apache.org/jira/browse/SPARK-22039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-22039.
----------------------------------
Resolution: Invalid
Questions should go to the mailing list. Let's start this discussion on the
mailing list first rather than filing it as a JIRA.
> Spark 2.1.1 Driver OOM when use interaction for large scale Sparse Vector
> -------------------------------------------------------------------------
>
> Key: SPARK-22039
> URL: https://issues.apache.org/jira/browse/SPARK-22039
> Project: Spark
> Issue Type: Question
> Components: ML
> Affects Versions: 2.1.1
> Reporter: wuhaibo
>
> I'm working on large-scale logistic regression for CTR prediction, and when
> I use Interaction for feature engineering, the driver OOMs. In detail, I
> interact userid (one-hot encoded, ~300,000 ("30w") dimensions, sparse) with
> the base features (60 dimensions, dense); driver memory is set to 40g.
> So I tried debugging remotely, and I found that Spark's Interaction creates a
> very large schema, and a lot of the work is done on the driver.
> There are two questions:
> Reading the source, I found that Interaction is implemented with sparse
> vectors, so it should not need this much memory; why does it need to do this
> work on the driver? The interaction result is an ~18,000,000 ("1800w")
> dimension sparse dataframe; why is a schema with 18,000,000 StructFields so
> big? This is the dump file from when the schema begins to be created;
> because it is too big, I can't dump all of it:
> https://i.stack.imgur.com/h0XBf.jpg
> So I implemented the interaction method with RDDs, and the job finishes in
> 5 min, so I am wondering whether something is wrong here.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]