quenlang commented on issue #7297: thetaSketch aggrgator handle null or "" into unexpected value at ingesting
URL: https://github.com/apache/incubator-druid/issues/7297#issuecomment-475315491

@AlexanderSaydakov Hello, I did a deeper verification and I think this is probably a bug. I reproduced it in Druid 0.13.0 and Druid 0.14.0-rc1.

I put some datasets in my git repo (```[email protected]:quenlang/thetaSketch_verification.git```) which you can use to reproduce the problem. It includes four files:
- demo_dataset.json: contains all the raw data
- demo_sub_dataset.json: contains only the raw data with oprt_id=13981, produced by the command ```grep 13981 demo_dataset.json >demo_sub_dataset.json```
- demo_datasource.json: the datasource schema spec file
- query.json: the query body in JSON format, used to perform the query

First, I posted a supervisor task request with ```demo_datasource.json``` and ingested the raw data from ```demo_dataset.json``` by sending it to a Kafka topic. When ingestion finished, I ran the query in ```query.json``` and Druid returned:
```
[ {
  "timestamp" : "2019-03-18T13:30:00.000Z",
  "result" : {
    "prefixSuccessBusiness_no" : 0.0
  }
} ]
```
This result is correct. But after the task handed off its segments to deep storage (either because the task duration elapsed or because I gracefully shut down the supervisor), I ran the same query and Druid returned:
```
[ {
  "timestamp" : "2019-03-18T13:30:00.000Z",
  "result" : {
    "prefixSuccessBusiness_no" : 73.0
  }
} ]
```
I did not send any raw data to the Kafka topic other than ```demo_dataset.json```, yet the result changed after segment handoff. Maybe some unexpected operation occurs during the merge at handoff, but I am not sure.

Then I posted a new supervisor task request with ```demo_datasource.json```, changing only the datasource name, and sent the ```demo_sub_dataset.json``` dataset to a new Kafka topic for that supervisor task to ingest.
Whether or not the task had handed off its segments to deep storage, Druid always returned the same correct result, 0.0, when I ran the same query. ```demo_sub_dataset.json``` contains only the raw data with oprt_id=13981, which is the selector filter in my query.

I am confused. Why is this case correct? The values of column prefix_success_business_no in the raw data matching the condition in my query are all empty strings. Why does the first case give a wrong result after segment handoff? Can you help me find out? Thank you so much!
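
For anyone who wants to confirm the expected value independently of Druid, here is a minimal sketch that counts the distinct non-empty values of prefix_success_business_no among rows with oprt_id=13981. It assumes the dataset file holds one JSON object per line (the ```grep``` command above suggests a line-oriented file, but that is an assumption), and it treats both null and "" as absent, which is the behavior I expect rather than a statement of what Druid does. The function name ```distinct_non_empty``` is only illustrative.

```python
import json


def distinct_non_empty(lines, filter_col, filter_val, value_col):
    """Count distinct non-empty values of value_col among rows where
    filter_col equals filter_val. null and "" are treated as absent
    (an assumption that matches the expected result of 0)."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # Compare as strings so 13981 and "13981" both match the filter.
        if str(row.get(filter_col)) != str(filter_val):
            continue
        value = row.get(value_col)
        if value is None or value == "":
            continue  # null and empty string are not counted
        seen.add(value)
    return len(seen)


# Example usage against the file from the repo (path is an assumption):
# with open("demo_dataset.json", encoding="utf-8") as f:
#     print(distinct_non_empty(f, "oprt_id", 13981, "prefix_success_business_no"))
```

If this prints 0 for both ```demo_dataset.json``` and ```demo_sub_dataset.json```, then the 73.0 returned after handoff cannot come from the raw rows that match the filter.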

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
