Re: Error writing parquet to S3

2016-06-10 Thread Peter Halliday
Has anyone else seen this before?  The last time I saw it there was an OOM, but that doesn’t seem to be the case here.  Of course, I’m not sure how large the file that produced this was, either.

Peter 
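
The OOM theory fits the shape of this trace: InternalParquetRecordWriter buffers an entire row group in executor memory before flushRowGroupToStore(), and with a dynamic-partition write several output files can be open at once. A hedged mitigation sketch in Scala (illustrative values only; shrinking the row group is not a confirmed fix for this job):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("parquet-to-s3"))

    // Shrink the Parquet row-group size so each open writer buffers less
    // executor memory before flushing. parquet-mr's default is 128 MB;
    // 64 MB here is an illustrative value, not a recommendation.
    sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)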


> On Jun 9, 2016, at 9:00 PM, Peter Halliday  wrote:
> 
> I’m not 100% sure why I’m getting this; I don’t see any errors before it 
> at all, and I don’t know how to diagnose it.
> 
> 
> Peter Halliday
> 
> 
> 
> [2016-06-10 01:46:05,282] WARN org.apache.spark.scheduler.TaskSetManager [task-result-getter-2hread] - Lost task 3737.0 in stage 2.0 (TID 10585, ip-172-16-96-32.ec2.internal): org.apache.spark.SparkException: Task failed while writing rows.
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:414)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
>   at org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:146)
>   at org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:138)
>   at org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:195)
>   at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:153)
>   at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
>   at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:405)
>   ... 8 more
> 
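
The exception message itself points at the diagnosis: "Probably caused by an error thrown previously. Current state: COLUMN" means ParquetFileWriter's internal state machine found a column chunk still open when the writer was closed, so this IOException is a secondary failure that masks whatever aborted the write mid-column. A minimal sketch of the masking pattern, with stand-in Scala types rather than the real parquet-mr and Spark classes:

    // Stand-in for the write/close path in Spark's WriterContainer; the trait
    // and types here are illustrative only, not the actual APIs.
    trait RowWriter {
      def write(row: String): Unit // the original failure (OOM, S3 error, ...) lands here
      def close(): Unit            // then throws "invalid state ... Current state: COLUMN"
    }

    def writeRows(writer: RowWriter, rows: Iterator[String]): Unit = {
      try {
        rows.foreach(writer.write) // an exception here leaves a column chunk open...
      } finally {
        writer.close() // ...so this close() fails too, and its IOException is what surfaces
      }
    }

So the first place to look is the executor log on ip-172-16-96-32.ec2.internal just before this trace; the driver-side TaskSetManager warning only shows the secondary failure.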



Error writing parquet to S3

2016-06-09 Thread Peter Halliday
I’m not 100% sure why I’m getting this; I don’t see any errors before it at 
all, and I don’t know how to diagnose it.


Peter Halliday



[2016-06-10 01:46:05,282] WARN org.apache.spark.scheduler.TaskSetManager [task-result-getter-2hread] - Lost task 3737.0 in stage 2.0 (TID 10585, ip-172-16-96-32.ec2.internal): org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:414)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
  at org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:146)
  at org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:138)
  at org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:195)
  at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:153)
  at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
  at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
  at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:405)
  ... 8 more
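
For reference, the DynamicPartitionWriterContainer frames imply a partitioned DataFrame write. A minimal sketch of that kind of job against the Spark 1.6-era API (the bucket, paths, and "date" partition column are assumptions for illustration, not taken from the original job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object WriteParquetToS3 {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-to-s3"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical input; any DataFrame source would do here.
        val df = sqlContext.read.json("s3n://example-bucket/input/")

        df.write
          .partitionBy("date")                     // routes through DynamicPartitionWriterContainer
          .parquet("s3n://example-bucket/output/") // the write that fails in the trace above
      }
    }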