rui feng created HUDI-395:
-----------------------------

             Summary: hudi does not support scheme s3n when writing to S3
                 Key: HUDI-395
                 URL: https://issues.apache.org/jira/browse/HUDI-395
             Project: Apache Hudi (incubating)
          Issue Type: Bug
          Components: Spark datasource
         Environment: spark-2.4.4-bin-hadoop2.7

            Reporter: rui feng


When I used Hudi to create a Hudi table and then write it to S3, I used the Maven 
snippet below, which is recommended by [https://hudi.apache.org/s3_hoodie.html]:

<dependency>
 <groupId>org.apache.hudi</groupId>
 <artifactId>hudi-spark-bundle</artifactId>
 <version>0.5.0-incubating</version>
</dependency>

<dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-aws</artifactId>
 <version>2.7.3</version>
</dependency>
<dependency>
 <groupId>com.amazonaws</groupId>
 <artifactId>aws-java-sdk</artifactId>
 <version>1.10.34</version>
</dependency>

and added the configuration below:

sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "xxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxx")
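For reference, the method that ends up throwing here (HoodieWrapperFileSystem.getHoodieScheme, per the stack trace below) appears to validate the path's scheme against a fixed allow-list before prefixing it. A minimal sketch of that behavior — the scheme set and object name are my own illustration, not copied from Hudi's source:

```scala
// Illustrative sketch only: the allow-list contents are an assumption for
// demonstration; Hudi's real list lives in HoodieWrapperFileSystem.
object HoodieSchemeCheck {
  private val supportedSchemes = Set("file", "hdfs", "s3", "s3a")

  def getHoodieScheme(scheme: String): String = {
    if (!supportedSchemes.contains(scheme))
      // Mirrors the reported error message for an unsupported scheme
      throw new IllegalArgumentException(
        s"BlockAlignedAvroParquetWriter does not support scheme $scheme")
    // Supported schemes get wrapped with an internal "hoodie-" prefix
    "hoodie-" + scheme
  }
}
```

Under this sketch, an s3:// path passes the check while an s3n:// path raises exactly the message reported below, which would explain why pointing fs.s3.impl at NativeS3FileSystem (whose paths resolve as s3n) trips it.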

 

My Spark version is spark-2.4.4-bin-hadoop2.7. The write options and table path are:

val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> "hudi12",
  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)

val hudiTablePath = "s3://niketest1/hudi_test/hudi12"

When I run:

{color:#FF0000}df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath){color}

the following exception occurs:

{color:#FF0000}java.lang.IllegalArgumentException: BlockAlignedAvroParquetWriter does not support scheme s3n{color}

 at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
 at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
 at org.apache.hudi.io.storage.HoodieParquetWriter.<init>(HoodieParquetWriter.java:57)
 at org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
 at org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
 at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:70)
 at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
 at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
 at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
 at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

Can anyone tell me what causes this exception? I tried using 
org.apache.hadoop.fs.s3.S3FileSystem in place of 
org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", but 
another exception occurred; it seems org.apache.hadoop.fs.s3.S3FileSystem fits 
Hadoop 2.6.
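Since the error names s3n specifically, I also considered routing S3 access through the s3a connector that ships with hadoop-aws 2.7.x instead. A sketch of that configuration (untested on my side; the property names assume Hadoop 2.7's s3a client):

```scala
// Hypothetical alternative, not a confirmed fix: use the s3a connector
// (org.apache.hadoop.fs.s3a.S3AFileSystem, available since Hadoop 2.6)
// instead of the s3n-based NativeS3FileSystem.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxx")

// and point the table path at an s3a:// URI:
val hudiTablePath = "s3a://niketest1/hudi_test/hudi12"
```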

 

Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
