t0il3ts0ap opened a new issue #2535:
URL: https://github.com/apache/hudi/issues/2535
I have a `__deleted` column in my dataset, which I rename to
`_hoodie_is_deleted` using a transformer. The renamed column shows up both in
the metastore and in the S3 dataset.
However, I expected a hard delete rather than a soft delete: a row flagged
with `_hoodie_is_deleted = true` should not show up in any query.
Attaching code for reference:
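```java
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.utilities.transform.Transformer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomTransformer implements Transformer {
  @Override
  public Dataset<Row> apply(JavaSparkContext javaSparkContext, SparkSession sparkSession,
                            Dataset<Row> dataset, TypedProperties typedProperties) {
    return dataset
        // rename the source's delete flag to Hudi's delete-marker column
        .withColumnRenamed("__deleted", "_hoodie_is_deleted")
        // drop the remaining source metadata columns
        .drop("__op", "__source_ts_ms");
  }
}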
```
```
scala> val df = spark.read.format("org.apache.hudi").load("s3://***************/delta-streamer-test/tables/accounts-data/default")
df: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 11 more fields]

scala> df.show()
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id| username|   password|              email|      created_on|      last_login|       __lsn|_hoodie_is_deleted|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
|     20210204174033|  20210204174033_0_6|                 1|               default|848f7f69-be2e-498...|  1|some user| new pass 3|  [email protected]|1612193554103104|1612460406978955|614115973352|             false|
|     20210204173646|  20210204173646_0_2|                 8|               default|848f7f69-be2e-498...|  8|         |           |                   |               0|            null|614054424744|              true|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---------+-----------+-------------------+----------------+----------------+------------+------------------+
```
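One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: Hudi's default record payload is understood to treat a record as deleted only when `_hoodie_is_deleted` holds a boolean `true`, so a flag that reaches Hudi as the string `"true"` would be stored as an ordinary column. The pure-Java sketch below models that rule; `DeleteMarkerSketch`, `isDeleteMarker` and `normalizeDeleteFlag` are hypothetical names for illustration, not Hudi API.

```java
/**
 * Hypothetical sketch, not Hudi code: models the delete-marker check that
 * Hudi's default payload is assumed to apply to `_hoodie_is_deleted`.
 */
class DeleteMarkerSketch {

    /** Assumed rule: only a Boolean TRUE marks a delete; the String "true" does not. */
    static boolean isDeleteMarker(Object value) {
        return value instanceof Boolean && (Boolean) value;
    }

    /**
     * Hypothetical conversion: a String becomes Boolean TRUE only if it is "true"
     * (case-insensitive, surrounding whitespace trimmed), otherwise Boolean FALSE.
     * Non-string values (including null) pass through unchanged.
     */
    static Object normalizeDeleteFlag(Object raw) {
        if (raw instanceof String) {
            return Boolean.valueOf(((String) raw).trim());
        }
        return raw;
    }

    public static void main(String[] args) {
        Object flagAsString = "true"; // a delete flag carried as a string column
        System.out.println(isDeleteMarker(flagAsString));                      // false
        System.out.println(isDeleteMarker(normalizeDeleteFlag(flagAsString))); // true
    }
}
```

If the renamed column does turn out to be a string, the analogous step inside the transformer would be a Spark cast, e.g. `.withColumn("_hoodie_is_deleted", functions.col("_hoodie_is_deleted").cast("boolean"))`; Spark's string-to-boolean cast maps `"true"`/`"false"` to the corresponding boolean values.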