quenlang commented on issue #7297: thetaSketch aggregator handles null or "" as 
unexpected value at ingestion 
URL: 
https://github.com/apache/incubator-druid/issues/7297#issuecomment-475315491
 
 
   @AlexanderSaydakov 
   Hello, I did some deeper verification and I think this is probably a bug. I 
reproduced it on Druid 0.13.0 and Druid 0.14.0-rc1.
   I put some datasets in my git repo 
(```[email protected]:quenlang/thetaSketch_verification.git```), which you can use 
to reproduce this problem. It includes four files:
   - demo_dataset.json: contains all raw data
   - demo_sub_dataset.json: contains only the partial raw data with oprt_id=13981, 
extracted with the command ```grep 13981 demo_dataset.json 
>demo_sub_dataset.json```
   - demo_datasource.json: datasource schema spec file 
   - query.json: a query body in JSON format used to perform the query
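
   For context, a query of this general shape would exercise the thetaSketch 
aggregator. This is only my sketch of it: the dimension ```oprt_id```, the metric 
column ```prefix_success_business_no```, and the output name 
```prefixSuccessBusiness_no``` come from this report, while the datasource name, 
granularity, and interval here are placeholders; the actual spec is the 
```query.json``` in the repo.
   ```json
   {
     "queryType": "timeseries",
     "dataSource": "demo",
     "granularity": "thirty_minute",
     "filter": { "type": "selector", "dimension": "oprt_id", "value": "13981" },
     "aggregations": [
       {
         "type": "thetaSketch",
         "name": "prefixSuccessBusiness_no",
         "fieldName": "prefix_success_business_no"
       }
     ],
     "intervals": [ "2019-03-18T13:30:00.000Z/2019-03-18T14:00:00.000Z" ]
   }
   ```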
   
   First, I posted a supervisor task request with 
```demo_datasource.json``` and ingested the raw data from 
```demo_dataset.json``` by sending it to a Kafka topic. At the end of ingestion, I 
performed a query with ```query.json``` and Druid returned:
   ```
   [ {
     "timestamp" : "2019-03-18T13:30:00.000Z",
     "result" : {
       "prefixSuccessBusiness_no" : 0.0
     }
   } ]
   ```
   This result is correct. But after the task handed off segments to deep 
storage (when the task duration was over, or after gracefully shutting down the 
supervisor), I issued the same query and Druid returned:
   ```
   [ {
     "timestamp" : "2019-03-18T13:30:00.000Z",
     "result" : {
       "prefixSuccessBusiness_no" : 73.0
     }
   } ]
   ```
   I did not send any other raw data to the Kafka topic besides 
```demo_dataset.json```, yet the result changed after the segments were handed 
off. Maybe some unexpected operation occurs during the handoff merge, but I am 
not sure.
   
   Then I posted a new supervisor task request with 
```demo_datasource.json``` but with the datasource name changed, and sent the 
```demo_sub_dataset.json``` dataset to a new Kafka topic for the supervisor task 
to ingest. Regardless of whether the task had handed off segments to deep 
storage, Druid always returned the same correct result, 0.0, when I performed 
the same query. ```demo_sub_dataset.json``` contains only the raw data with 
oprt_id=13981, which is the selector filter in my query. I am confused: why is 
this scenario correct?
   
   The values of the column prefix_success_business_no in the raw data matching 
the filter in my query are all empty strings. Why does the first scenario return 
a wrong result after segment handoff? 
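   
   To illustrate what I mean, here is a toy sketch (plain Python, NOT Druid's 
actual code, and not a real theta sketch) of how the distinct-count result 
depends on whether an empty string is dropped like a null or counted as a real 
value. It does not explain where the exact value 73.0 comes from; that may 
relate to how per-row sketches are merged at handoff, which I do not know.
   ```python
   # 73 raw rows, all with an empty string in the metric column,
   # mimicking the filtered rows in demo_dataset.json.
   rows = [{"oprt_id": "13981", "prefix_success_business_no": ""}
           for _ in range(73)]

   # Behavior A: empty strings are skipped (treated like null),
   # so the distinct count is 0 -- matching the pre-handoff result.
   distinct_skip_empty = {r["prefix_success_business_no"] for r in rows
                          if r["prefix_success_business_no"]}

   # Behavior B: empty strings are counted as a real value,
   # so the distinct count becomes nonzero after the same input.
   distinct_keep_empty = {r["prefix_success_business_no"] for r in rows}

   print(len(distinct_skip_empty))  # 0
   print(len(distinct_keep_empty))  # 1
   ```
   If the realtime path follows behavior A and the merged/handed-off segments 
follow behavior B (or vice versa), the same query would return different 
results before and after handoff, which is what I observe.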
   
   Can you help me find out? Thank you so much!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]
