Is this still Mesos fine-grained mode?
On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi guys,
>
> There is another memory issue. Not sure if this is related to Tungsten
> this time, because I have it disabled (spark.sql.tungsten.enabled=false).
> It happens when there are too many tasks running (300); I need to limit
> the number of tasks to avoid it. The executor has 6G. Spark 1.5.1 is
> being used.
>
> Best Regards,
>
> Jerry
>
> org.apache.spark.SparkException: Task failed while writing rows.
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
>   at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
>   at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
>
> On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> With Jerry's permission, sending this back to the dev list to close the
>> loop.
>>
>> ---------- Forwarded message ----------
>> From: Jerry Lam <chiling...@gmail.com>
>> Date: Tue, Oct 20, 2015 at 3:54 PM
>> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
>> To: Reynold Xin <r...@databricks.com>
>>
>> Yup, coarse-grained mode works just fine. :)
>> The difference is that, by default, coarse-grained mode uses 1 core per
>> task. If I constrain the total to 20 cores, there can be only 20 tasks
>> running at the same time. With fine-grained mode, however, I cannot set
>> the total number of cores, so there can be 200+ tasks running at the
>> same time (it is dynamic). So it might be that the calculation of how
>> much memory to acquire fails when the number of cores cannot be known
>> ahead of time, because you cannot assume that only X tasks are running
>> in an executor? Just my guess...
>>
>> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Can you try coarse-grained mode and see if it is the same?
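For reference, a minimal sketch of the coarse-grained setup Jerry describes above, using the Spark 1.5-era Mesos properties (spark.mesos.coarse, spark.cores.max); the app name, master URL, and memory value here are illustrative, not taken from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained Mesos mode: executors are long-lived, and
    // spark.cores.max caps the total cores for the application.
    // At the default of 1 core per task (spark.task.cpus=1), a cap of
    // 20 cores means at most 20 tasks run concurrently.
    val conf = new SparkConf()
      .setAppName("coarse-grained-example")  // illustrative name
      .setMaster("mesos://host:5050")        // illustrative master URL
      .set("spark.mesos.coarse", "true")     // coarse-grained mode
      .set("spark.cores.max", "20")          // hard cap on total cores
      .set("spark.executor.memory", "6g")    // matches the 6G executor above

    val sc = new SparkContext(conf)

In fine-grained mode (spark.mesos.coarse=false, the default on Mesos in 1.5) there is no equivalent total-core cap, which is why Jerry sees 200+ concurrent tasks.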
>>> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> Hi Reynold,
>>>>
>>>> Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers,
>>>> but sometimes it does not. One particular job failed every time with
>>>> the acquire-memory issue. I'm using Spark on Mesos with fine-grained
>>>> mode. Does it make a difference?
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
>>>>
>>>> On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>>>>>
>>>>> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>>
>>>>>> I disabled it because of the "Could not acquire 65536 bytes of
>>>>>> memory" error, which fails the job. So for now, I'm not touching it.
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 4:48 PM, charmee <charm...@gmail.com> wrote:
>>>>>>
>>>>>>> We had disabled Tungsten after we found a few performance issues,
>>>>>>> but had to enable it back because with a large number of GROUP BY
>>>>>>> fields the shuffle kept failing when Tungsten was disabled.
>>>>>>>
>>>>>>> Here is an excerpt from one of our engineers with his analysis.
>>>>>>>
>>>>>>> With Tungsten enabled (the default in Spark 1.5), ~90 files of
>>>>>>> 0.5G each:
>>>>>>>
>>>>>>> Ingest (after applying broadcast lookups): 54 min
>>>>>>> Aggregation (~30 fields in GROUP BY and another 40 in
>>>>>>> aggregation): 18 min
>>>>>>>
>>>>>>> With Tungsten disabled:
>>>>>>>
>>>>>>> Ingest: 30 min
>>>>>>> Aggregation: erroring out
>>>>>>>
>>>>>>> On smaller tests we found that joins are slow with Tungsten
>>>>>>> enabled, while with GROUP BY, disabling Tungsten does not work in
>>>>>>> the first place.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> -Charmee
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14711.html
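A note on the numbers in the thread: 67108864 bytes is 64 MB, the page that UnsafeExternalSorter failed to allocate. As a back-of-envelope check, with the Spark 1.5 default spark.shuffle.memoryFraction of 0.2, a 6 GB executor has roughly 0.2 x 6 GB = 1.2 GB of memory for sorters, i.e. room for only about 19 concurrent 64 MB pages; if fine-grained Mesos packs more tasks than that onto one executor, acquisition fails, which matches Jerry's observation. A hedged sketch of the knobs commonly suggested for this class of error on 1.5.x follows; treat each key and value as something to verify against your exact version, not as a definitive fix:

    import org.apache.spark.SparkConf

    // Illustrative 1.5.x-era workarounds for "Unable to acquire N bytes
    // of memory"; values are starting points, not tuned recommendations.
    val conf = new SparkConf()
      .set("spark.sql.tungsten.enabled", "true")   // charmee's team needed this on for large GROUP BYs
      .set("spark.buffer.pageSize", "16m")         // smaller pages mean smaller per-task allocations (assumed 1.5 knob; verify)
      .set("spark.shuffle.memoryFraction", "0.4")  // grow the sort/shuffle pool from the 0.2 default

Capping concurrency (coarse-grained mode with spark.cores.max, as sketched earlier) attacks the same problem from the other side: fewer simultaneous tasks per executor means fewer simultaneous 64 MB page requests.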