alberttwong opened a new issue, #10696:
URL: https://github.com/apache/hudi/issues/10696
I'm using the NYC taxi dataset. I have no idea why Hudi is asking about a `ts`
column, since the parquet file doesn't have that field.
```
atwong@Albert-CelerData Downloads % parquet-tools inspect
green_tripdata_2023-01.parquet
############ file meta data ############
created_by: parquet-cpp-arrow version 8.0.0
num_columns: 20
num_rows: 68211
num_row_groups: 1
format_version: 1.0
serialized_size: 10705
############ Columns ############
VendorID
lpep_pickup_datetime
lpep_dropoff_datetime
store_and_fwd_flag
RatecodeID
PULocationID
DOLocationID
passenger_count
trip_distance
fare_amount
extra
mta_tax
tip_amount
tolls_amount
ehail_fee
improvement_surcharge
total_amount
payment_type
trip_type
congestion_surcharge
```
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import scala.collection.JavaConversions._
val df = spark.read.parquet("s3://huditest/green_tripdata_2023-01.parquet")
val databaseName = "hudi_sample"
val tableName = "hudi_coders_hive"
val basePath = "s3a://huditest/hudi_coders"
df.write.format("hudi").
  option(org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME, tableName).
  option(RECORDKEY_FIELD_OPT_KEY, "lpep_pickup_datetime").
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").
  option("hoodie.datasource.hive_sync.database", databaseName).
  option("hoodie.datasource.hive_sync.table", tableName).
  option("hoodie.datasource.hive_sync.metastore.uris", "thrift://hive-metastore:9083").
  option("fs.defaultFS", "s3://huditest/").
  mode(Overwrite).
  save(basePath)
```
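For context: Hudi's write path defaults the precombine field (`hoodie.datasource.write.precombine.field`) to a column named `ts`, and the write above never overrides it, which is the likely reason Hudi is looking for `ts`. A minimal sketch of a workaround, assuming an existing timestamp column such as `lpep_dropoff_datetime` is acceptable as the precombine key (column choice is an assumption, not from the original report):

```scala
// Sketch only: point the precombine field at a column that actually exists
// in the NYC green-taxi schema, instead of relying on Hudi's "ts" default.
df.write.format("hudi").
  option(org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME, tableName).
  option(RECORDKEY_FIELD_OPT_KEY, "lpep_pickup_datetime").
  option(PRECOMBINE_FIELD_OPT_KEY, "lpep_dropoff_datetime"). // overrides the "ts" default
  mode(Overwrite).
  save(basePath)
```

This mirrors the option style already used in the snippet (`DataSourceWriteOptions._` keys); the Hive-sync options were omitted here only for brevity.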