nsivabalan edited a comment on issue #4784:
URL: https://github.com/apache/hudi/issues/4784#issuecomment-1036330505
I could not reproduce the partitioning issue you are facing. In my run the
partition path is well formed, and the two original columns which I used to
generate the partition column are still present in the table.
Here is what I ran in a local spark-shell:
```
import java.sql.Timestamp
import spark.implicits._
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
// Columns in order: row (record key), ppath (Int), preComb (String), eventTime (epoch millis).
val df1 = Seq(
  ("row1", 1, "part1", 1578283932000L),
  ("row2", 1, "part1", 1578283942000L)
).toDF("row", "ppath", "preComb", "eventTime")

// Partition path has two parts: preComb taken as-is (simple) and ppath formatted as a date (timestamp).
// ppath is a small integer, so read as epoch milliseconds it formats to 1970-01-01.
df1.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "preComb").
  option(RECORDKEY_FIELD_OPT_KEY, "row").
  option(PARTITIONPATH_FIELD_OPT_KEY, "preComb:simple,ppath:timestamp").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomKeyGenerator").
  option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "EPOCHMILLISECONDS").
  option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy-MM-dd").
  option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
  option(TABLE_NAME, "timestamp_tbl4").
  mode(Overwrite).
  save("/tmp/hudi_timestamp_tbl4")

// Read the table back and inspect the schema and the generated partition path.
val hudiDF4 = spark.read.format("hudi").load("/tmp/hudi_timestamp_tbl4")
// createOrReplaceTempView replaces the deprecated registerTempTable.
hudiDF4.createOrReplaceTempView("tbl4")
spark.sql("describe tbl4").show()
spark.sql("select * from tbl4 limit 3").show()
```
Output
```
spark.sql("select * from tbl4 limit 3").show()
+-------------------+--------------------+------------------+----------------------+--------------------+----+-------------+-------+-----+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| row|    eventTime|preComb|ppath|
+-------------------+--------------------+------------------+----------------------+--------------------+----+-------------+-------+-----+
|  20220211102107283|20220211102107283...|              row1|      part1/1970-01-01|dfc23d4b-8177-4fa...|row1|1578283932000|  part1|    0|
|  20220211102107283|20220211102107283...|              row2|      part1/1970-01-01|dfc23d4b-8177-4fa...|row2|1578283942000|  part1|    0|
+-------------------+--------------------+------------------+----------------------+--------------------+----+-------------+-------+-----+
```
Specifically, the value of _hoodie_partition_path for both rows is
part1/1970-01-01
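To show where the 1970-01-01 comes from, here is a minimal sketch of the epoch-millis to date formatting, using plain java.time with the same "yyyy-MM-dd" pattern and a +8 hour offset as in the write options. This is not Hudi's own code path, and `formatEpochMillis` is a hypothetical helper name used only for illustration.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Stand-in for the "yyyy-MM-dd" output format and GMT+8 timezone set in the write options.
// Assumption: this only mirrors the arithmetic, it is not the Hudi key generator itself.
val partitionFormat: DateTimeFormatter =
  DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.ofHours(8))

// Hypothetical helper: format an epoch-milliseconds value as a date string.
def formatEpochMillis(epochMillis: Long): String =
  partitionFormat.format(Instant.ofEpochMilli(epochMillis))

println(formatEpochMillis(1L))             // the value held by ppath in df1
println(formatEpochMillis(1578283932000L)) // the value held by eventTime in row1
```

The first call prints 1970-01-01 and the second prints 2020-01-06. So if date partitions derived from the event time are what you are after, eventTime is probably the column to mark as timestamp in the partition path config.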
2: If you disable hive style partitioning, you won't see the "fieldname="
prefix in the partition path. If you do want it enabled, I don't think Hudi
allows changing the field name used in the partition path.
3: I am not sure how to leverage partition pruning for tables written with a
custom key generator. @xushiyan @YannByron @bhasudha @codope : do you folks
have any pointers here?
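On point 2, a minimal sketch of the write option that toggles hive style partitioning. The config key is Hudi's datasource write option; the `hiveStyleOptions` name is only illustrative, and I have not re-run the example above with it.

```scala
// Assumed extra write option, to be passed alongside the other options in the write above.
// "true" adds the "fieldname=" prefix to each partition path segment; "false" omits it.
val hiveStyleOptions: Map[String, String] = Map(
  "hoodie.datasource.write.hive_style_partitioning" -> "true"
)
```

In the spark-shell session above this would be added with one more options(hiveStyleOptions) call on the writer.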