alberttwong commented on issue #10697:
URL: https://github.com/apache/hudi/issues/10697#issuecomment-1952981834
Upgrading from Hudi 0.11 to 0.14.1; the spark-shell process gets killed during the write:
```
[root@spark-hudi bin]# spark-shell --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' --driver-memory 4G
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/spark-3.2.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
:: loading settings :: url = jar:file:/spark-3.2.1-bin-hadoop3.2/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
org.apache.hudi#hudi-spark3.2-bundle_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-9b4a8c4b-e4e2-4b55-b29b-cacc399b9481;1.0
    confs: [default]
    found org.apache.hudi#hudi-spark3.2-bundle_2.12;0.14.1 in central
:: resolution report :: resolve 202ms :: artifacts dl 2ms
    :: modules in use:
    org.apache.hudi#hudi-spark3.2-bundle_2.12;0.14.1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-9b4a8c4b-e4e2-4b55-b29b-cacc399b9481
    confs: [default]
    0 artifacts copied, 1 already retrieved (0kB/7ms)
24/02/19 18:15:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/02/19 18:15:57 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Spark context Web UI available at http://spark-hudi:4042
Spark context available as 'sc' (master = local[*], app id = local-1708366558050).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.1
      /_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions._
scala> import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.config.HoodieWriteConfig._
scala> import scala.collection.JavaConversions._
import scala.collection.JavaConversions._
scala>
scala> val df = spark.read.parquet("s3a://huditest/user_behavior_sample_data.parquet")
df: org.apache.spark.sql.DataFrame = [UserID: bigint, ItemID: bigint ... 3 more fields]
scala>
scala> val databaseName = "hudi_sample"
databaseName: String = hudi_sample
scala> val tableName = "hudi_coders_hive"
tableName: String = hudi_coders_hive
scala> val basePath = "s3a://huditest/hudi_coders"
basePath: String = s3a://huditest/hudi_coders
scala>
scala> df.write.format("hudi").
     |   option(org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME, tableName).
     |   option(RECORDKEY_FIELD_OPT_KEY, "UserID").
     |   option(PRECOMBINE_FIELD_OPT_KEY, "UserID").
     |   option("hoodie.datasource.hive_sync.enable", "true").
     |   option("hoodie.datasource.hive_sync.mode", "hms").
     |   option("hoodie.datasource.hive_sync.database", databaseName).
     |   option("hoodie.datasource.hive_sync.table", tableName).
     |   option("hoodie.datasource.hive_sync.metastore.uris", "thrift://hive-metastore:9083").
     |   option("fs.defaultFS", "s3://huditest/").
     |   mode(Overwrite).
     |   save(basePath)
warning: one deprecation; for details, enable `:setting -deprecation' or `:replay -deprecation'
24/02/19 18:16:18 WARN HoodieSparkSqlWriterInternal: hoodie table at s3a://huditest/hudi_coders already exists. Deleting existing data & overwriting with new data.
24/02/19 18:16:21 WARN S3ABlockOutputStream: Application invoked the Syncable API against stream writing to hudi_coders/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0. This is unsupported
/spark/bin/spark-shell: line 47: 12322 Killed "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
```
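For reference, the same write options can be spelled as plain string config keys instead of the `*_OPT_KEY` constants and `HoodieWriteConfig.TABLE_NAME`, which as far as I know are marked deprecated in recent Hudi releases. This is only a sketch under that assumption: the object name `HudiWriteOptionsSketch` is made up for illustration, the values are copied from the session above, and it is not meant as a fix for the `Killed` process.

```scala
// Sketch only: the options from the write above as plain string keys.
// HudiWriteOptionsSketch is a hypothetical name, not part of Hudi.
object HudiWriteOptionsSketch {
  val databaseName = "hudi_sample"
  val tableName = "hudi_coders_hive"

  val hudiOptions: Map[String, String] = Map(
    // Table name, record key, and precombine field
    "hoodie.table.name" -> tableName,
    "hoodie.datasource.write.recordkey.field" -> "UserID",
    "hoodie.datasource.write.precombine.field" -> "UserID",
    // Hive sync through the metastore (HMS mode)
    "hoodie.datasource.hive_sync.enable" -> "true",
    "hoodie.datasource.hive_sync.mode" -> "hms",
    "hoodie.datasource.hive_sync.database" -> databaseName,
    "hoodie.datasource.hive_sync.table" -> tableName,
    "hoodie.datasource.hive_sync.metastore.uris" -> "thrift://hive-metastore:9083",
    // Passed through unchanged from the original write
    "fs.defaultFS" -> "s3://huditest/"
  )

  // Print the options in a stable order so they can be compared by eye.
  def main(args: Array[String]): Unit =
    hudiOptions.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
}
```

In the spark-shell session this would be used as `df.write.format("hudi").options(HudiWriteOptionsSketch.hudiOptions).mode("overwrite").save(basePath)`, with `df` and `basePath` defined as in the transcript.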