codope edited a comment on issue #3336:
URL: https://github.com/apache/hudi/issues/3336#issuecomment-888166578


   @mithalee Can you try with the latest master branch? I built the master code and tried to reproduce the scenario in a local Docker environment, and it runs fine. For example, after the first ingest you can see that `_hoodie_is_deleted` is false for both timestamps, and after the second ingest (in which I set `_hoodie_is_deleted` to true for one timestamp), only the other timestamp remains.
   ```
   // after first ingest
   scala> spark.sql("select symbol, ts, _hoodie_is_deleted from stock_ticks_cow WHERE symbol = 'MSFT'").show(100, false)
   +------+-------------------+------------------+
   |symbol|ts                 |_hoodie_is_deleted|
   +------+-------------------+------------------+
   |MSFT  |2018-08-31 09:59:00|false             |
   |MSFT  |2018-08-31 10:29:00|false             |
   +------+-------------------+------------------+
   
   // after second ingest
   scala> spark.sql("select symbol, ts, _hoodie_is_deleted from stock_ticks_cow WHERE symbol = 'MSFT'").show(100, false)
   +------+-------------------+------------------+
   |symbol|ts                 |_hoodie_is_deleted|
   +------+-------------------+------------------+
   |MSFT  |2018-08-31 09:59:00|false             |
   +------+-------------------+------------------+
   ```
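   
   For context, a second ingest that soft-deletes one record can be sketched in spark-shell roughly as follows. This is a hypothetical sketch, not the exact ingestion path I used above; the base path is an assumption, and the record key field `key` and partition field `date` are taken from the demo schema:
   ```scala
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.functions.lit
   
   // Read back one record and flag it as deleted.
   // basePath is an assumption for illustration only.
   val basePath = "/user/hive/warehouse/stock_ticks_cow"
   val toDelete = spark.read.format("hudi")
     .load(basePath)
     .filter("symbol = 'MSFT' and ts = '2018-08-31 10:29:00'")
     .withColumn("_hoodie_is_deleted", lit(true))
   
   // Upsert it; Hudi drops records whose _hoodie_is_deleted is true.
   toDelete.write.format("hudi")
     .option("hoodie.datasource.write.operation", "upsert")
     .option("hoodie.datasource.write.recordkey.field", "key")
     .option("hoodie.datasource.write.partitionpath.field", "date")
     .option("hoodie.table.name", "stock_ticks_cow")
     .mode(SaveMode.Append)
     .save(basePath)
   ```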
   
   My schema is similar to [this one](https://github.com/apache/hudi/blob/master/docker/demo/config/schema.avsc), except that I added the `_hoodie_is_deleted` field with a default of false.
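   
   Concretely, the added field looks roughly like this in the Avro schema (a sketch; its position among the other record fields does not matter):
   ```json
   { "name": "_hoodie_is_deleted", "type": "boolean", "default": false }
   ```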
   
   FYI, my spark-submit command is the same as the one [mentioned here](https://hudi.apache.org/docs/docker_demo.html#step-2-incrementally-ingest-data-from-kafka-topic).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
