mithalee commented on issue #3336:
URL: https://github.com/apache/hudi/issues/3336#issuecomment-888699402
> @mithalee Can you try with the latest master branch? I built the master code and tried to reproduce the scenario in a local Docker environment, and it runs fine. For example, after the first ingest you can see that `_hoodie_is_deleted` is false for both timestamps; after the second ingest (in which I set `_hoodie_is_deleted` to true for one timestamp), only one timestamp remains.
>
> ```
> // after first ingest
> scala> spark.sql("select symbol, ts, _hoodie_is_deleted from stock_ticks_cow WHERE symbol = 'MSFT'").show(100, false)
> +------+-------------------+------------------+
> |symbol|ts |_hoodie_is_deleted|
> +------+-------------------+------------------+
> |MSFT |2018-08-31 09:59:00|false |
> |MSFT |2018-08-31 10:29:00|false |
> +------+-------------------+------------------+
>
> // after second ingest
> scala> spark.sql("select symbol, ts, _hoodie_is_deleted from stock_ticks_cow WHERE symbol = 'MSFT'").show(100, false)
> +------+-------------------+------------------+
> |symbol|ts |_hoodie_is_deleted|
> +------+-------------------+------------------+
> |MSFT |2018-08-31 09:59:00|false |
> +------+-------------------+------------------+
> ```
>
> My schema is similar to [this one](https://github.com/apache/hudi/blob/master/docker/demo/config/schema.avsc), except that I added a `_hoodie_is_deleted` field with a default of false.
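>
> For concreteness, the added field in the Avro record looks roughly like this (a sketch of my addition; the rest of the schema matches the linked `schema.avsc`):
>
> ```json
> { "name": "_hoodie_is_deleted", "type": "boolean", "default": false }
> ```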
>
> FYI, my spark-submit command is the same as the one [mentioned here](https://hudi.apache.org/docs/docker_demo.html#step-2-incrementally-ingest-data-from-kafka-topic).
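
For my own notes, the soft-delete merge behavior described above can be simulated in plain Python (a sketch of the semantics only; `upsert` is a hypothetical helper, not Hudi's actual code path):

```python
# Simulate Hudi soft-delete semantics: during an upsert, an incoming record
# carrying _hoodie_is_deleted=True removes the matching row from the table.

def upsert(table, batch, key_fields=("symbol", "ts")):
    """Merge `batch` into `table`, keyed by `key_fields`.

    A record with _hoodie_is_deleted=True deletes the matching row;
    any other record inserts or replaces the row for its key.
    """
    merged = {tuple(r[k] for k in key_fields): r for r in table}
    for rec in batch:
        key = tuple(rec[k] for k in key_fields)
        if rec.get("_hoodie_is_deleted", False):
            merged.pop(key, None)   # soft delete: drop the row if present
        else:
            merged[key] = rec       # insert or update
    return list(merged.values())

# First ingest: both MSFT timestamps present, _hoodie_is_deleted false.
table = upsert([], [
    {"symbol": "MSFT", "ts": "2018-08-31 09:59:00", "_hoodie_is_deleted": False},
    {"symbol": "MSFT", "ts": "2018-08-31 10:29:00", "_hoodie_is_deleted": False},
])

# Second ingest: flag one timestamp as deleted; only the other survives.
table = upsert(table, [
    {"symbol": "MSFT", "ts": "2018-08-31 10:29:00", "_hoodie_is_deleted": True},
])
```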

Sure. I will try and get back to you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]