[
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hans van den Bogert updated SPARK-10474:
----------------------------------------
Comment: was deleted
(was: One more debug println for the calculated cores (in contrast to numCores):
https://gist.github.com/hansbogert/cc2baf3995d4e37270a2
Relevant output (the output is the same for fine-grained as well as coarse-grained Mesos):
{noformat}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/06 10:25:04 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
numCores:0
cores:12
1048576
15/10/06 10:25:05 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
...
{noformat}
The calculated 'cores' is 12, which is the number of cores on the local driver node; however, the total Mesos cluster has more than 40 cores. Either way, there is no difference between fine-grained and coarse-grained mode, at least for this method.)
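For reference, here is a minimal sketch of the fallback that the debug output above appears to show, assuming the page-size logic in Spark 1.5's ShuffleMemoryManager (constants and names are reconstructed from memory and may differ from the actual source): when the scheduler reports numCores as 0, the calculation falls back to the local machine's core count, which would explain cores:12 on a 12-core driver and the 1048576-byte (1 MB) page size printed above.
{code:scala}
// Minimal sketch; assumption: mirrors the page-size calculation in
// Spark 1.5's ShuffleMemoryManager. Names and constants are illustrative.
object PageSizeSketch {
  def nextPowerOf2(n: Long): Long = {
    val h = java.lang.Long.highestOneBit(n)
    if (h == n) n else h << 1
  }

  def defaultPageSize(maxMemory: Long, numCores: Int): Long = {
    val minPageSize = 1L * 1024 * 1024   // 1 MB lower bound
    val maxPageSize = 64L * minPageSize  // 64 MB upper bound
    // Fallback: if the scheduler did not report a core count (numCores == 0),
    // use the *local* machine's cores -- on the driver this is 12, not the
    // cluster-wide total of 40+, matching the debug output above.
    val cores =
      if (numCores > 0) numCores
      else Runtime.getRuntime.availableProcessors()
    val safetyFactor = 16
    val size = nextPowerOf2(maxMemory / cores / safetyFactor)
    math.min(maxPageSize, math.max(minPageSize, size))
  }

  def main(args: Array[String]): Unit = {
    // With a 128 MB budget and 12 cores (the driver's count after the
    // numCores == 0 fallback), the computed page size is 1048576 bytes:
    println(defaultPageSize(maxMemory = 128L * 1024 * 1024, numCores = 12))
  }
}
{code}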
> TungstenAggregation cannot acquire memory for pointer array after switching
> to sort-based
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Yi Zhou
> Assignee: Andrew Or
> Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task failed with the error below.
> {code}
> java.io.IOException: Could not acquire 65536 bytes of memory
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
>     at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
>     at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>     at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>     at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
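> To illustrate why the spill during the hash-to-sort switch can fail (a simplified model; the class below is a hypothetical stand-in, not Spark's real memory-manager API): the aggregation map still holds its pages at the moment the newly created external sorter requests its initial 64 KB pointer-array page from the shared budget, so an all-or-nothing request can come back empty.
> {code:scala}
> // Simplified model of the failure mode; names are hypothetical stand-ins.
> class SharedMemoryManager(val total: Long) {
>   private var used = 0L
>   def acquire(bytes: Long): Long = synchronized {
>     val granted = math.min(bytes, total - used)
>     if (granted < bytes) 0L          // all-or-nothing, like the page request
>     else { used += granted; granted }
>   }
>   def release(bytes: Long): Unit = synchronized { used -= bytes }
> }
>
> object SwitchToSortSketch extends App {
>   val mm = new SharedMemoryManager(total = 1 << 20)  // 1 MB budget
>   val heldByHashMap = mm.acquire(1 << 20)            // the map fills the budget
>   // The fallback sorter needs its initial page *before* the map's pages
>   // are handed over or freed -- so the request fails:
>   val forPointerArray = mm.acquire(64 * 1024)
>   assert(forPointerArray == 0L)  // "Could not acquire 65536 bytes of memory"
> }
> {code}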
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
>   AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note: store_sales is a large fact table, while item is a small dimension table.
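> For anyone trying to reproduce this, here is a hedged sketch of driving a cut-down version of the query from Spark 1.5's HiveContext. The table names come from this report; the broadcast-threshold tweak is only a suggestion that follows from item being a small dimension table, not a confirmed fix for this bug.
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.hive.HiveContext
>
> object Repro {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("SPARK-10474-repro"))
>     val sqlContext = new HiveContext(sc)
>     // Since item is a small dimension table, raising the broadcast-join
>     // threshold (64 MB here is an illustrative value) lets the join avoid
>     // a shuffle, so only the aggregation competes for memory pages.
>     sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
>       (64L * 1024 * 1024).toString)
>     sqlContext.sql("""
>       |SELECT ss.ss_customer_sk AS cid,
>       |       count(CASE WHEN i.i_class_id = 1 THEN 1 END) AS id1
>       |FROM store_sales ss
>       |JOIN item i ON ss.ss_item_sk = i.i_item_sk
>       |WHERE i.i_category IN ('Books') AND ss.ss_customer_sk IS NOT NULL
>       |GROUP BY ss.ss_customer_sk
>       |HAVING count(ss.ss_item_sk) > 5
>     """.stripMargin).collect()
>   }
> }
> {code}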
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)