Hi,
  I have a Spark cluster set up, and I am trying to write data to S3 in Parquet format.
Here is what I am doing:

# Load the Avro input with the spark-avro package
df = sqlContext.load('test', 'com.databricks.spark.avro')

# Write it back out to S3 as Parquet
df.saveAsParquetFile("s3n://test")

But I get this nasty error:

Py4JJavaError: An error occurred while calling o29.saveAsParquetFile.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:166)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:139)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:336)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
    at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1508)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 12, srv-110-29.720.rdio): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
  Reason:
    Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
  Current Frame:
    bci: @38
    flags: { }
    locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
    stack: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', uninitialized 32, uninitialized 32, 'org/jets3t/service/security/AWSCredentials' }
  Bytecode:
    0000000: bb00 3159 b700 324e 2d2b 2cb6 0034 bb00
    0000010: 3659 2db6 003a 2db6 003d b700 403a 042a
    0000020: bb00 4259 1904 b700 45b5 0047 a700 0b3a
    0000030: 042a 1904 b700 4f2a 2c12 5103 b600 55b5
    0000040: 0057 2a2c 1259 1400 5ab6 005f 1400 1eb8
    0000050: 0065 b500 672a 2c12 6914 001e b600 5f14
    0000060: 001e b800 65b5 006b 2a2c 126d b600 71b5
    0000070: 0073 2abb 0075 592b b600 78b7 007b b500
    0000080: 7db1
  Exception Handler Table:
    bci [14, 44] => handler: 47
  Stackmap Table:
    full_frame(@47,{Object[#2],Object[#73],Object[#75],Object[#49]},{Object[#47]})
    same_frame(@55)



And in S3, I see something like test$folder instead of the data.

I am not sure how to fix this.

Any ideas?

Thanks
