Yes. The crazy thing about Mesos running in fine-grained mode is that there
is no way (correct me if I'm wrong) to set the number of cores per
executor. If one of my Mesos slaves has 32 cores, fine-grained mode can
allocate all 32 cores on that executor for the job, and when 32 tasks run
on that executor at the same time, that is when the acquire-memory issue
appears. Of course, the 32 cores are dynamically allocated, so Mesos can
take them back or hand them out again depending on cluster utilization.
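
In case it helps anyone hitting the same thing, the only knob I can think
of for throttling concurrency in fine-grained mode is spark.task.cpus,
along the lines of the sketch below. I have not verified that it actually
limits the fine-grained allocation, so treat it as a guess rather than a
fix:

  // Untested sketch: raising spark.task.cpus makes each task reserve more
  // CPUs, so a 32-core slave should run fewer tasks concurrently.
  val conf = new org.apache.spark.SparkConf()
    .set("spark.task.cpus", "4")  // roughly 8 concurrent tasks instead of 32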

On Wed, Oct 21, 2015 at 5:13 PM, Reynold Xin <r...@databricks.com> wrote:

> Is this still Mesos fine grained mode?
>
>
> On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi guys,
>>
>> There is another memory issue. Not sure if this one is related to Tungsten
>> because I have it disabled (spark.sql.tungsten.enabled=false). It happens
>> when there are too many tasks running (300), so I need to limit the number
>> of tasks to avoid it. The executor has 6G. Spark 1.5.1 is being used.
>>
>> Best Regards,
>>
>> Jerry
>>
>> org.apache.spark.SparkException: Task failed while writing rows.
>>      at 
>> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
>>      at 
>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>>      at 
>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>>      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>      at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
>>      at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
>>      at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
>>      at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
>>      at 
>> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
>>      at 
>> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
>>      at 
>> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
>>
>>
>> On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> With Jerry's permission, sending this back to the dev list to close the
>>> loop.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Jerry Lam <chiling...@gmail.com>
>>> Date: Tue, Oct 20, 2015 at 3:54 PM
>>> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
>>> To: Reynold Xin <r...@databricks.com>
>>>
>>>
>>> Yup, coarse grained mode works just fine. :)
>>> The difference is that by default, coarse-grained mode uses 1 core per
>>> task. If I constrain the job to 20 cores in total, there can be only 20
>>> tasks running at the same time. However, with fine-grained mode I cannot
>>> set the total number of cores, so there could be 200+ tasks running at the
>>> same time (it is dynamic). So maybe the calculation of how much memory to
>>> acquire fails when the number of cores is not known ahead of time, because
>>> you cannot assume that only X tasks are running in an executor? Just my
>>> guess...
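>>>
>>> For reference, the coarse-grained setup that works for me is roughly the
>>> following (property names are from memory, so please double-check them
>>> against the Mesos docs before copying):
>>>
>>>   // Coarse-grained mode with the job capped at 20 cores in total,
>>>   // i.e. at most ~20 concurrent tasks, each executor with 6G.
>>>   val conf = new org.apache.spark.SparkConf()
>>>     .set("spark.mesos.coarse", "true")
>>>     .set("spark.cores.max", "20")
>>>     .set("spark.executor.memory", "6g")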
>>>
>>>
>>> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>>
>>>> Can you try coarse-grained mode and see if it is the same?
>>>>
>>>>
>>>> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam <chiling...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Reynold,
>>>>>
>>>>> Yes, I'm using 1.5.1. I see these errors quite often; sometimes the job
>>>>> recovers, but sometimes it does not. One particular job failed every time
>>>>> with the acquire-memory issue. I'm running Spark on Mesos in fine-grained
>>>>> mode. Does that make a difference?
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jerry
>>>>>
>>>>> On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin <r...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam <chiling...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I disabled it because of the "Could not acquire 65536 bytes of
>>>>>>> memory" error, which was failing the job. So for now, I'm not touching it.
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 4:48 PM, charmee <charm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> We had disabled Tungsten after we found a few performance issues, but
>>>>>>>> we had to enable it again because we found that, with a large number
>>>>>>>> of group-by fields, the shuffle keeps failing when Tungsten is
>>>>>>>> disabled.
>>>>>>>>
>>>>>>>> Here is an excerpt from one of our engineers with his analysis.
>>>>>>>>
>>>>>>>> With Tungsten Enabled (default in Spark 1.5):
>>>>>>>> ~90 files of 0.5G each:
>>>>>>>>
>>>>>>>> Ingest (after applying broadcast lookups) : 54 min
>>>>>>>> Aggregation (~30 fields in group by and another 40 in aggregation)
>>>>>>>> : 18 min
>>>>>>>>
>>>>>>>> With Tungsten Disabled:
>>>>>>>>
>>>>>>>> Ingest : 30 min
>>>>>>>> Aggregation : Erroring out
>>>>>>>>
>>>>>>>> In smaller tests we found that joins are slow with Tungsten enabled,
>>>>>>>> while with GROUP BY, disabling Tungsten does not work in the first
>>>>>>>> place.
>>>>>>>>
>>>>>>>> Hope this helps.
>>>>>>>>
>>>>>>>> -Charmee
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
