Spark 1.5 has some critical performance and bug issues; you'd better try 1.5.1 or the 1.5.2 RC.

From: gen tang [mailto:gen.tan...@gmail.com]
Sent: Thursday, November 5, 2015 12:43 PM
To: dev@spark.apache.org
Subject: Fwd: dataframe slow down with tungsten turn on

Hi,

In fact, I tested the same code on Spark 1.5 with Tungsten turned off. The result is much the same as with Tungsten turned on. So it seems the problem is not Tungsten; Spark 1.5 is simply slower than Spark 1.4 for this job.

Does anyone have an idea why this happens?

Thanks a lot in advance.

Cheers
Gen

---------- Forwarded message ----------
From: gen tang <gen.tan...@gmail.com>
Date: Wed, Nov 4, 2015 at 3:54 PM
Subject: dataframe slow down with tungsten turn on
To: u...@spark.apache.org

Hi sparkers,

I am using DataFrames to do some large ETL jobs. More precisely, I create a DataFrame from a Hive table, apply some operations, and then save the result as JSON.

With spark-1.4.1, the whole process is quite fast, about 1 minute. However, when I run the same code with spark-1.5.1 (with Tungsten turned on), it takes about 2 hours to finish the same job. I checked the task details: almost all of the time is spent in computation.

Any idea why this happens?

Thanks a lot in advance for your help.

Cheers
Gen
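
For anyone repeating this comparison, here is a minimal sketch of the configuration side. It assumes that the Spark 1.5.x SQL property spark.sql.tungsten.enabled (default true) is the switch meant by "Tungsten turned off" above; the thread does not say how Tungsten was actually disabled. The fragment is in spark-defaults.conf format:

```properties
# spark-defaults.conf (Spark 1.5.x)
# Assumption: this property is the switch behind "Tungsten turned off";
# the original messages do not name the setting that was used.
spark.sql.tungsten.enabled   false
```

The same property can also be passed per job with spark-submit --conf spark.sql.tungsten.enabled=false, which keeps the rest of the setup unchanged between the two runs.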