rui feng created HUDI-395:
-----------------------------
Summary: Hudi does not support scheme s3n when writing to S3
Key: HUDI-395
URL: https://issues.apache.org/jira/browse/HUDI-395
Project: Apache Hudi (incubating)
Issue Type: Bug
Components: Spark datasource
Environment: spark-2.4.4-bin-hadoop2.7
Reporter: rui feng
When I use Hudi to create a Hudi table and write it to S3, I used the Maven
snippet below, which is recommended by [https://hudi.apache.org/s3_hoodie.html]:
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark-bundle</artifactId>
  <version>0.5.0-incubating</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.10.34</version>
</dependency>
and added the following configuration:
sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "xxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxx")
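(As an aside, the analogous settings for the s3a connector shipped in hadoop-aws 2.7.x would be the untested sketch below; I don't know whether Hudi 0.5.0 accepts the s3a scheme either, so this is only for comparison, not something I have verified.)

// Untested variant: route s3a:// URIs through S3AFileSystem from hadoop-aws.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxx")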
My Spark version is spark-2.4.4-bin-hadoop2.7. The write options and table path are:

val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> "hudi12",
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
val hudiTablePath = "s3://niketest1/hudi_test/hudi12"

When I run

df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)

the following exception occurs:
java.lang.IllegalArgumentException: BlockAlignedAvroParquetWriter does not support scheme s3n
  at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
  at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
  at org.apache.hudi.io.storage.HoodieParquetWriter.<init>(HoodieParquetWriter.java:57)
  at org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
  at org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
  at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:70)
  at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
  at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
  at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
  at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Can anyone tell me what causes this exception? I tried replacing
org.apache.hadoop.fs.s3native.NativeS3FileSystem with
org.apache.hadoop.fs.s3.S3FileSystem for the conf "fs.s3.impl" (see the sketch
below), but a different exception occurred; it seems
org.apache.hadoop.fs.s3.S3FileSystem only fits Hadoop 2.6.
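For concreteness, the replacement I tried was just this single setting, on the same SparkContext as above:

// Attempted replacement for NativeS3FileSystem; this failed with a different
// exception (S3FileSystem seems to target Hadoop 2.6).
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem")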
Thanks in advance.