Memory issues when trying to insert data in the form of ORC using Spark SQL

SRK Fri, 20 May 2016 15:43:45 -0700

Hi,

I am hitting a memory-related exception when inserting data into an ORC
table using Spark SQL. The query and the exception are below. Any idea why
this is happening?


// Create the external ORC table, partitioned by date and id.
sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
    |PARTITIONED BY (datePartition STRING, idPartition STRING)
    |STORED AS ORC LOCATION '/user/users'""".stripMargin)
sqlContext.sql("orc.compress= SNAPPY")
// Dynamic-partition insert from recordsTemp into users.
sqlContext.sql(
  """FROM recordsTemp ps
    |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
    |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition""".stripMargin)


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0
org.apache.hadoop.hive.ql.metadata.HiveException: parquet.hadoop.MemoryManager$1: New Memory allocation 1048575 bytes is smaller than the minimum allocation size of 1048576 bytes.
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:240)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:249)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: parquet.hadoop.MemoryManager$1: New Memory allocation 1048575 bytes is smaller than the minimum allocation size of 1048576 bytes.
        at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
        at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
        at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
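
For reference, the exception is raised in parquet.hadoop.MemoryManager.addWriter, not in an ORC writer, and it is reached through SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter, which creates a new writer the first time a partition is seen. The sketch below is only a rough model of what the message seems to describe: writers share a fixed memory pool, each writer's request is scaled down once the total exceeds the pool, and a scaled share below the minimum is rejected. The object name, the method name, the pool size and the per-writer request are all hypothetical values chosen for illustration. This is not the library's code.

```scala
// Simplified model of the check suggested by the exception message.
// Everything here is an illustrative assumption, not Parquet's source.
object AllocationCheckSketch {
  // Minimum per-writer allocation, taken from the exception message.
  val MinAllocationBytes: Long = 1048576L

  /** Returns each writer's allocation after scaling, or the error text
    * when a scaled allocation falls below the minimum. */
  def scaledAllocation(poolBytes: Long,
                       requestedPerWriter: Seq[Long]): Either[String, Seq[Long]] = {
    val total = requestedPerWriter.sum
    // Scale down only when the writers together ask for more than the pool.
    val scale = if (total <= poolBytes) 1.0 else poolBytes.toDouble / total
    val scaled = requestedPerWriter.map(r => math.floor(r * scale).toLong)
    scaled.find(s => scale < 1.0 && s < MinAllocationBytes) match {
      case Some(tooSmall) =>
        Left(s"New Memory allocation $tooSmall bytes is smaller than " +
          s"the minimum allocation size of $MinAllocationBytes bytes.")
      case None => Right(scaled)
    }
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // Hypothetical numbers: a 512 MB pool, writers requesting 128 MB each.
    val pool = 512 * mb
    println(scaledAllocation(pool, Seq.fill(4)(128 * mb)).isRight)   // few writers
    println(scaledAllocation(pool, Seq.fill(513)(128 * mb)).isLeft)  // many writers
  }
}
```

In this model the number of writers open at once in a task is what pushes the per-writer share under the minimum, so a dynamic-partition insert that touches many partitions per task would be the case to look at first.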


