Zhang Mei created CARBONDATA-3225:
-------------------------------------
Summary: Can't save spark dataframe as carbon format In Windows system
Key: CARBONDATA-3225
URL: https://issues.apache.org/jira/browse/CARBONDATA-3225
Project: CarbonData
Issue Type: New Feature
Components: core, hadoop-integration, spark-integration
Affects Versions: 1.5.1
Environment: Spark2.1.0, hadoop2.7.2, Carbon1.5.1
Reporter: Zhang Mei
When I try to save a DataFrame in carbon format on a Windows 7 system as follows:
    carbonSession.createDataFrame(outputData.rdd, outputData.schema)
      .write.format("carbon").mode(SaveMode.Overwrite).save(datapath)
The datapath is an s3a path.
Then I get this error:
2019-01-03 16:30:45 ERROR CarbonUtil:167 - Error while closing stream:java.io.IOException: Stream Closed
java.io.IOException: Stream Closed
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.carbondata.core.util.CarbonUtil.closeStream(CarbonUtil.java:181)
    at org.apache.carbondata.core.util.CarbonUtil.closeStreams(CarbonUtil.java:165)
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.commitCurrentFile(AbstractFactDataWriter.java:272)
    at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:383)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:395)
    at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.close(DataWriterProcessorStepImpl.java:251)
    at org.apache.carbondata.processing.loading.DataLoadExecutor.close(DataLoadExecutor.java:90)
    at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:275)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.io.IOException: Stream Closed
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        ... 13 more
2019-01-03 16:30:45 INFO CarbonUtil:2733 - Copying \Temp\/390621183953547_attempt_20190103163036_0004_m_000000_0\Fact\Part0\Segment_null\390621178232428\part-0-390621178232428_batchno0-0-null-390619717738707.carbondata to s3a://obs-test/zzz/obsCarbon_3/_temporary/0/_temporary/attempt_20190103163036_0004_m_000000_0, operation id 1546504245269
2019-01-03 16:30:45 ERROR CarbonTableOutputFormat:458 - Error while loading data
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:456)
    at org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat$CarbonOutputWriter.close(SparkCarbonFileFormat.scala:297)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:252)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.closeExecutorService(AbstractFactDataWriter.java:428)
    at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:390)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:395)
    at org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.close(DataWriterProcessorStepImpl.java:251)
    at org.apache.carbondata.processing.loading.DataLoadExecutor.close(DataLoadExecutor.java:90)
    at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:275)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
2019-01-03 16:30:45 ERROR Utils:91 - Aborting task
java.lang.InterruptedException: java.lang.NullPointerException
    at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$CarbonRecordWriter.close(CarbonTableOutputFormat.java:459)
    at org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat$CarbonOutputWriter.close(SparkCarbonFileFormat.scala:297)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:252)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:191)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
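
Note on the first error in the log: its frames (DataOutputStream.flush, BufferedOutputStream.flushBuffer, FileOutputStream.write) have the shape of buffered bytes being flushed into a file stream that was already closed. The sketch below reproduces only that shape with plain JDK classes. It is not CarbonData code, and it does not claim to explain why the stream was closed in this report. The object name, method name, and temp file prefix are hypothetical.

```scala
import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream, IOException}

// Hypothetical stand-alone sketch; not taken from CarbonData.
object StreamClosedSketch {

  // Returns the IOException raised by close(), if any.
  def closeAfterUnderlyingClosed(): Option[IOException] = {
    val tmp = File.createTempFile("stream-closed-sketch", ".bin")
    tmp.deleteOnExit()

    val fileOut = new FileOutputStream(tmp)
    val dataOut = new DataOutputStream(new BufferedOutputStream(fileOut))

    // The four bytes stay in the BufferedOutputStream buffer for now.
    dataOut.writeInt(42)

    // Close the underlying file stream before the wrappers are closed.
    fileOut.close()

    try {
      // close() first flushes the buffer, which writes to the closed file stream.
      dataOut.close()
      None
    } catch {
      case e: IOException => Some(e)
    }
  }

  def main(args: Array[String]): Unit =
    println(closeAfterUnderlyingClosed().map(_.getMessage))
}
```

If this sketch matches what happens in the report, the question for the CarbonData side is which code path closes the local file stream before CarbonUtil.closeStreams runs. The later NullPointerException in closeExecutorService is a separate failure that this sketch does not cover.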